Protein Feature Classification Using Particle Swarm Optimization and Artificial Neural Networks

Bithin Kanti Shee
National Institute of Technology Rourkela, Rourkela, India
Swati Vipsita
National Institute of Technology Rourkela, Rourkela, India

Santanu Ku. Rath
National Institute of Technology Rourkela, Rourkela, India
[email protected] [email protected] [email protected]

ABSTRACT

A protein superfamily consists of proteins which share amino acid sequence homology and are therefore functionally and structurally related. Protein classification focuses on predicting the function or the structure of a new protein by assigning it to a family with previously known characteristics. Artificial neural networks have been successfully applied to problems in pattern classification, function approximation, and associative memories. The traditional Backpropagation (BP) algorithm is generally used to train multilayer feedforward networks, but it is limited to searching for a suitable set of weights within an a priori fixed network topology. This mandates the selection of an appropriately optimized set of synaptic weights for the learning problem at hand. Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique that is very effective in solving real-valued global optimization problems. Thus, a hybrid method combining PSO and BP is implemented in this paper. PSO has the limitation of getting trapped in local minima, so a few particles are mutated according to a probability of mutation, yielding a modified PSO. The main objective of the paper is to develop an efficient classifier using a feedforward neural network. The efficiency is measured in terms of speed, predictive accuracy, sensitivity, and specificity.
Keywords Local Minima, Predictive Accuracy, Sensitivity, Specificity.
1. INTRODUCTION
Classification, or supervised learning, is one of the major data mining processes. Pattern classification aims to build a function that maps the input feature space to an output space of two or more classes. Neural Networks (NN) are an effective tool in the field of pattern recognition [2]. In the training stage (approximation), the neural network extracts the features of the input data. In the recognizing stage (generalization), the network distinguishes
the pattern of the input data by its features, and the result of recognition is greatly influenced by the hidden layer [9]. Every instance in any dataset used by machine learning algorithms is represented by the same set of features. The features may be continuous, real-coded, categorical, or binary. If instances are given with known labels (the corresponding correct outputs) then the learning is called supervised, in contrast to unsupervised learning, where instances are unlabeled. Learning is usually accomplished through adjustment and modification of the connected synaptic weights. The goal of the network is to learn or discover some association between input and output patterns, or to analyze or find the structure of the input patterns. The input data are divided into two sets: the training set and the test set. Once the neural network has been trained on the instances of the training set, it is tested on the instances of the test set to check whether it correctly predicts their patterns. The popular BLAST tool (Altschul et al., 1990) represents the simplest nearest-neighbor approach and exploits pairwise local alignments to measure sequence similarity. Another type of direct modeling method is based on hidden Markov models (HMMs) (Durbin et al., 1998; Karplus et al., 1998). After constructing an HMM for each family, protein queries can easily be scored against all established HMMs by calculating the log-likelihood of each model for the unknown sequence and then selecting the class label of the most likely model. The Motif Alignment and Search Tool (MAST) (Bailey and Gribskov, 1998) is based on the combination of multiple motif-based statistical score values. According to this scheme, groups of probabilistic motifs discovered by the MEME algorithm (Bailey and Elkan, 1994) are used to construct protein profiles for the families of interest. As protein databases grow larger, it becomes increasingly important to develop an intelligent system that classifies proteins with high accuracy. Neural networks have been chosen as the technical tool for the protein sequence classification task because: 1) the extracted features of the protein sequences are distributed in a high-dimensional space with complex characteristics which is difficult to model satisfactorily using parameterized approaches; and 2) the rules produced by decision-tree techniques are complex and difficult to understand because the features are extracted from long character strings [16].
2. APPLICATION OF PROTEIN FEATURE CLASSIFICATION USING ARTIFICIAL NEURAL NETWORKS
Proteins (also known as polypeptides) are organic compounds made of amino acids arranged in a linear chain and folded into a globular form. The amino acids in a polymer are joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. In general, the genetic code specifies 20 standard amino acids. Shortly after or even during synthesis, the residues in a protein are often chemically modified by post-translational modification, which alters the physical and chemical properties, folding, stability, activity, and ultimately the function of the protein. Proteins can also work together to achieve a particular function, and they often associate to form stable complexes. A protein superfamily consists of proteins which share amino acid sequence homology and which may therefore be functionally and structurally related. Traditionally, two protein sequences are classified into the same class if they have high homology, i.e., they have most features in common [6]. The aim of classification is to predict target classes for given input proteins. Protein classification focuses on predicting the function or the structure of new proteins.
2.1 Feature extraction technique
The goal of feature extraction is to characterize an object to be recognized by measurements whose values are very similar for objects in the same category and very different for objects in different categories. This leads to the idea of seeking distinguishing features that are invariant to irrelevant transformations of the input [3]. Often, many of the extracted features are irrelevant or redundant to the target concept, and removal of irrelevant features drastically reduces the running time of the learning algorithm [11]. The real-valued input matrix is constructed by deriving the following feature values from the protein sequences: the isoelectric point, the molecular weight, the atomic composition, and the length of the amino acid sequence. These vectors are given as input and form the training set from which the neural network learns to identify the family membership of an unknown protein sequence.

∙ Atomic composition: the counts of the carbon, hydrogen, nitrogen, oxygen, and sulphur atoms in the sequence.

∙ Molecular weight: the mass of one molecule of the substance, based on 12 as the atomic weight of carbon-12. In practice it is calculated by summing the atomic weights of the atoms making up the substance's molecular formula.

∙ Isoelectric point: the isoelectric point (pI) is the pH at which a particular molecule or surface carries no net electrical charge. The net charge on the molecule is affected by the pH of the surrounding environment and can become more positive or more negative due to the loss or gain of protons (H+). At a pH below its pI a protein carries a net positive charge; above its pI it carries a net negative charge. Proteins can thus be separated according to their isoelectric point (overall charge).

∙ Length of amino acid sequence: there are twenty standard amino acid bases in a protein sequence; summing the individual frequencies of the amino acid bases gives the length of the protein sequence.
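To make this step concrete, a minimal Python sketch of the feature extraction is given below (the simulations reported later in this paper were carried out in MATLAB, so this is only an illustration). It assumes the Biopython package for the molecular weight and isoelectric point; the per-residue atom table is deliberately partial, and a full implementation needs entries for all twenty standard residues.

    # Sketch of the feature extraction step (assumes Biopython is installed).
    from Bio.SeqUtils.ProtParam import ProteinAnalysis

    # Atoms (C, H, N, O, S) per residue as incorporated in the chain (water removed).
    # Only two residues are listed here; fill in the remaining eighteen for real use.
    RESIDUE_ATOMS = {
        "G": (2, 3, 1, 1, 0),   # glycine residue:  C2 H3 N O
        "A": (3, 5, 1, 1, 0),   # alanine residue:  C3 H5 N O
    }

    def extract_features(sequence):
        """Return the four features that form the network's input vector."""
        pa = ProteinAnalysis(sequence)
        length = len(sequence)                      # length of the amino acid sequence
        mol_weight = pa.molecular_weight()          # molecular weight (Da)
        iso_point = pa.isoelectric_point()          # isoelectric point (pI)
        # Atomic composition: sum the per-residue atom counts and add one water
        # molecule (H2O) for the free N- and C-termini of the chain.
        c = h = n = o = s = 0
        for aa in sequence:
            dc, dh, dn, do, ds = RESIDUE_ATOMS.get(aa, (0, 0, 0, 0, 0))
            c, h, n, o, s = c + dc, h + dh, n + dn, o + do, s + ds
        h, o = h + 2, o + 1
        return [mol_weight, (c, h, n, o, s), iso_point, length]

    print(extract_features("GAAG"))                 # toy sequence, for illustration only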
3. PARTICLE SWARM OPTIMIZATION (PSO)
PSO is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995 [7], inspired by the social behavior of bird flocking and fish schooling. PSO has proved to be very effective in solving real-valued global optimization problems. In PSO, the population is called a swarm and the individuals (i.e., the points in the search space) are called particles. Each particle moves with an adaptable velocity within the search space and retains in memory the best position it has ever encountered; this best position is shared with the other particles in the swarm after each iteration. Two variants of the PSO algorithm were developed, one with a global neighborhood and one with a local neighborhood. In the global variant, each particle moves towards its best previous position and towards the best particle of the entire swarm, whereas in the local variant each particle moves towards the best particle of its restricted neighborhood [10]. Generally speaking, the PSO algorithm has a strong ability to find the optimum, but it has the disadvantage of easily getting stuck in a local optimum. By suitably tuning the parameters of the PSO algorithm, the rate of convergence can be sped up and the ability to find the global optimum can be enhanced [19]. In general, the global variant exhibits faster convergence rates, although in some cases it may reduce the swarm's diversity very quickly, thereby getting trapped in local minimizers; the local variant, on the other hand, exhibits superior exploration capabilities at the cost of slower convergence. The basic concept of the PSO technique lies in accelerating each particle towards its pbest and the gbest locations at each time step, with random weights for the acceleration towards both the pbest and gbest locations [4].
Figure 1: Concept of changing particle position in PSO
3.1 PSO Algorithm
1. Initialize a population (array) of particles with random positions and velocities of d dimensions in the problem space.

2. For each particle, evaluate the desired optimization fitness function in d variables.

3. If the fitness value is better than the best fitness value (Pbest) in history, set the current value as the new Pbest.

4. Compare the fitness evaluation with the population's overall previous best. If the current value is better than Gbest, then reset Gbest to the current particle's array index and value.
5. Change the velocity and position of the particle according to equations (1) and (2) respectively. V_{id} and X_{id} represent the velocity and position of the i-th particle in d dimensions, and rand_1 and rand_2 are two uniform random functions.

V_{id} = W \cdot V_{id} + c_1 \cdot rand_1 \cdot (P_{best_{id}} - X_{id}) + c_2 \cdot rand_2 \cdot (G_{best_{id}} - X_{id})    (1)

X_{id} = X_{id} + V_{id}    (2)
6. Repeat from step (2) until a criterion is met, usually a sufficiently good fitness or a maximum number of iterations (epochs).

PSO has several parameters, described as follows. W, called the inertia weight, controls the exploration and exploitation of the search space because it dynamically adjusts the velocity. Local minima are better avoided with a small local neighborhood, but faster convergence is obtained with a larger (global) neighborhood; in general, the global neighborhood is preferred. Synchronous updates are more costly than asynchronous updates. V_max is the maximum allowable velocity of the particles: if the velocity of a particle exceeds V_max, it is clamped to V_max. Thus, the resolution and quality of the search depend on V_max; if V_max is too high, particles may move past good solutions, and if V_max is too low, particles may get trapped in local minima. c1 and c2, termed the cognitive and social components respectively, are the acceleration constants that change the velocity of a particle towards pbest and gbest (the new position generally lies somewhere between pbest and gbest).
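As an illustration, equations (1) and (2) translate directly into code. The sketch below performs one synchronous PSO iteration over the whole swarm for a fitness function that is to be maximized; the hypothetical routine fitness(x) and the parameter defaults are illustrative, not the tuned values reported in Section 7.

    import numpy as np

    def pso_step(X, V, pbest, pbest_fit, gbest, fitness,
                 w=0.7, c1=2.0, c2=2.0, v_max=4.0):
        """One synchronous PSO iteration implementing equations (1) and (2)."""
        n, d = X.shape
        rand1 = np.random.rand(n, d)
        rand2 = np.random.rand(n, d)
        # Equation (1): inertia, cognitive (pbest) and social (gbest) terms.
        V = w * V + c1 * rand1 * (pbest - X) + c2 * rand2 * (gbest - X)
        V = np.clip(V, -v_max, v_max)       # clamp velocities to [-V_max, V_max]
        X = X + V                           # equation (2): position update
        # Update personal bests and the global best (maximization).
        fit = np.array([fitness(x) for x in X])
        improved = fit > pbest_fit
        pbest[improved] = X[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
        return X, V, pbest, pbest_fit, gbest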
3.2 PSO with Mutation (Modified PSO)

In general, the global variant of PSO exhibits faster convergence rates, although in some cases it may reduce the swarm's diversity very quickly, thereby getting trapped in local minima. To overcome this, a few particles undergo mutation according to a probability of mutation. The particles which undergo mutation are chosen randomly, and small changes are made to their values.
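A minimal sketch of this mutation phase is shown below; the default mutation probability of 0.05 matches the value used in Section 4.3, while the Gaussian perturbation of a single randomly chosen dimension is only one reasonable way to realize the small changes described above.

    import numpy as np

    def mutate_swarm(X, p_mutation=0.05, sigma=0.1):
        """Randomly perturb a few particles to restore swarm diversity."""
        n, d = X.shape
        for i in range(n):
            if np.random.rand() < p_mutation:       # particle i chosen for mutation
                j = np.random.randint(d)            # pick one dimension to change
                X[i, j] += sigma * np.random.randn()
        return X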
4. DESCRIPTION OF ALGORITHMS
While implementing the training algorithms there are two choices:

1. Pattern mode training: present a single pattern, compute the gradients, and change the network weights based on these instantaneous local gradient values. Given Q training patterns {(X_i, D_i)}_{i=1}^{Q} and some initial network N_0, pattern mode training generates a sequence of Q neural networks N_1, ..., N_Q over one epoch of training.

2. Batch mode training: alternatively, one can collect the error gradients over an entire epoch and then change the weights of the initial neural network N_0 in one shot. This is true gradient descent: gradient vectors are collected over the entire training set, and the weight change is decided using the single resultant global gradient vector.

In all three algorithms implemented here, the neural network is trained using batch mode training.
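The difference between the two modes lies only in when the weight update is applied. A schematic sketch, assuming a hypothetical routine grad(w, x, d) that returns the error gradient for a single pattern:

    def pattern_mode_epoch(w, patterns, grad, eta=0.1):
        # Weights change after every presented pattern (Q updates per epoch).
        for x, d in patterns:
            w = w - eta * grad(w, x, d)
        return w

    def batch_mode_epoch(w, patterns, grad, eta=0.1):
        # Gradients are accumulated over the whole epoch and applied in one shot.
        g_total = sum(grad(w, x, d) for x, d in patterns)
        return w - eta * g_total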
4.1 Feedforward Neural Network with the Backpropagation (BP) algorithm
Neurons are organized into layers. The input layer is composed not of full neurons, but simply of the values in a data record that constitute the inputs to the next layer of neurons. The next layer is called a hidden layer; there may be several hidden layers. The final layer is the output layer, where there is one node for each class. A single sweep forward through the network results in the assignment of a value to each output node, and the record is assigned to whichever class's node has the highest value.

Figure 2: Architecture of Feedforward Neural Network

Multilayer feedforward networks are trained using the Backpropagation (BP) learning algorithm. A feedforward multilayer neural network trained with the back-propagation algorithm is known as a back-propagation neural network. Functional signals flow in the forward direction and error signals propagate in the backward direction, which is why it is called the error back-propagation, or simply back-propagation, network. An activation function that can be differentiated (such as the sigmoidal activation function) is chosen for the hidden- and output-layer computational neurons. The algorithm is based on the error-correction rule, and the rule for changing the values of the synaptic weights follows the generalized delta rule [12].
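For a network with one hidden layer of sigmoidal units, the generalized delta rule takes the following form in batch mode; this is a minimal numpy sketch (biases omitted) rather than the exact implementation used in the experiments.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_batch(W1, W2, X, D, eta=0.5):
        """One batch-mode BP update for a single-hidden-layer network."""
        H = sigmoid(X @ W1)                    # forward pass: hidden activations
        Y = sigmoid(H @ W2)                    # forward pass: network outputs
        # Backward pass: local gradients (deltas) from the generalized delta rule.
        delta_out = (Y - D) * Y * (1 - Y)      # output-layer deltas
        delta_hid = (delta_out @ W2.T) * H * (1 - H)
        # Gradient-descent weight changes accumulated over the whole batch.
        W2 -= eta * H.T @ delta_out
        W1 -= eta * X.T @ delta_hid
        return W1, W2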
4.2 PSO-BP hybrid classifier
Although backpropagation is the most popular learning method in the neural network community, two drawbacks are often pointed out: its very slow computing speed and the possibility of getting trapped in local minima. While many computational techniques have been proposed to overcome the first problem, relatively little attention has been paid to the second. PSO-BP encodes the parameters of the NN as particles, and the population of particles is referred to as the swarm. The bias neuron is not included in the encoding of the particles. Here, the synaptic weights of the neural network are initialized as particles and PSO is applied to obtain the optimized set of synaptic weights.

Fitness function (objective function): this constitutes the measure of fitness (adaptation) of a given particle in the population. The fitness function is usually an optimized function (strictly speaking, a maximization or minimization function) called the objective function. Here, the objective is to minimize the error function. The primary index used to weigh the BP neural network's performance is the MSE (mean squared error) between the network's factual output and its expected output; a low value of MSE means the network performs well. The fitness function may be defined as follows:

F = \frac{1}{1 + R}, \quad R = \frac{1}{l} \sum_{i=1}^{l} e(i)^2, \quad e(i) = y(i) - y_m(i)

where l denotes the number of training samples, y(i) denotes the network's factual output, y_m(i) denotes the network's expected output, and e(i) denotes the error between the network's factual output and its expected output.
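In code, the fitness of one particle (a candidate weight vector) is just this reciprocal form of the MSE. A minimal sketch, assuming a hypothetical routine forward(weights, X) that decodes the particle into the network's weights and returns the outputs for the training inputs:

    import numpy as np

    def particle_fitness(weights, forward, X, Y_expected):
        """F = 1 / (1 + R), with R the mean squared error over the l training samples."""
        Y_actual = forward(weights, X)      # network's factual output y(i)
        e = Y_actual - Y_expected           # e(i) = y(i) - y_m(i)
        R = np.mean(e ** 2)
        return 1.0 / (1.0 + R)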
Figure 3: Flowchart of PSO-BP algorithm

4.3 MPSO-BP hybrid classifier

The probability of mutation is taken as 0.05, and the randomly chosen particles undergo mutation. The training process is the same as in PSO-BP, but a mutation phase is incorporated just before each generation ends. The algorithm terminates after a fixed number of generations.
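Putting the pieces together, the swarm-optimization part of this training can be sketched as follows. This builds on the pso_step, mutate_swarm, and particle_fitness sketches given earlier, uses the same hypothetical forward(weights, X) routine, and omits any subsequent BP fine-tuning of the returned weights; it is an outline of the procedure of Figure 3, not the exact implementation.

    import numpy as np

    def mpso_train(X_train, D_train, forward, dim, n_particles=30,
                   generations=500, p_mutation=0.05):
        """Outline of MPSO over NN weight vectors encoded as particles."""
        X = np.random.uniform(-1, 1, (n_particles, dim))   # particles = weight vectors
        V = np.zeros_like(X)
        fit = lambda w: particle_fitness(w, forward, X_train, D_train)
        pbest = X.copy()
        pbest_fit = np.array([fit(x) for x in X])
        gbest = pbest[np.argmax(pbest_fit)].copy()
        for _ in range(generations):
            X, V, pbest, pbest_fit, gbest = pso_step(X, V, pbest, pbest_fit, gbest, fit)
            X = mutate_swarm(X, p_mutation)   # mutation phase before the generation ends
        return gbest                          # best weight vector found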
5. MEASURES USED TO ESTIMATE THE PERFORMANCE OF THE NEURAL NETWORK AS A CLASSIFIER
The measures used to evaluate the performance of the NN classifier are:

1. Predictive accuracy = (TP + TN) / (TP + FP + TN + FN) * 100%
2. Sensitivity = TP / (TP + FN) * 100%
3. Specificity = TN / (TN + FP) * 100%

where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
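These measures follow directly from the confusion-matrix counts; a small sketch with an illustrative example:

    def classifier_measures(tp, tn, fp, fn):
        """Predictive accuracy, sensitivity and specificity, in percent."""
        accuracy    = (tp + tn) / (tp + fp + tn + fn) * 100.0
        sensitivity = tp / (tp + fn) * 100.0
        specificity = tn / (tn + fp) * 100.0
        return accuracy, sensitivity, specificity

    # Example: 48 of 50 positives and 47 of 50 negatives classified correctly.
    print(classifier_measures(tp=48, tn=47, fp=3, fn=2))   # (95.0, 96.0, 94.0)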
6. INPUT SOURCE, SIMULATION RESULTS AND GRAPHS

The simulation process is carried out on a computer with an Intel(R) Core(TM) 2 Duo processor running at 3.0 GHz and 3 GB of RAM. The MATLAB version used is 7.5. The simulation was first carried out with the Iris data set, and then the protein data were taken as input.

Protein data set: UniProt (Universal Protein Resource): http://www.uniprot.org/

Three superfamilies are analyzed:
∙ esterase (195)
∙ lipase (155)
∙ cytochrome (140)
∙ Number of instances: 450 (150 in each of the three classes: 100 for the training set and the remaining 50 for the test set)
∙ Number of attributes: 4 (molecular weight, atomic composition, isoelectric point, length of protein sequence)
Figure 4: Accuracy, Sensitivity and Specificity in case of BP
Figure 5: Accuracy and Fitness in PSO-BP
Figure 6: Accuracy and Fitness in MPSO-BP

Algo. used                  BP       PSO-BP    MPSO-BP
No. of epochs/Gen.          10000    500       500
Max. Fitness                –        10.99     8.99
Max. Accr. (%)              92       96        97.33
Avg. Accr. of 10 runs (%)   –        91.73     94.83

Table 1: The performance of BP, PSO-BP and MPSO-BP
Table 2: Sensitivity and Specificity in case of BP, PSO-BP and MPSO-BP
7. RESULTS AND DISCUSSIONS
The controlling parameters for the feedforward BP algorithm are the learning rate (η) and momentum (α). These two parameters are varied within the range 0.1-0.9. The simulations are carried out for all combinations of α and η, and the maximum predictive accuracy is recorded; it was found to be 92% at α = 0.2 and η = 0.9. In the case of PSO-BP and MPSO-BP, the algorithm was run 10 times by varying the inertia weight W from 0.2 to 1.2 and assuming the maximum velocity v_max = 4. It was observed that in both cases the optimum value of the inertia weight W lay between 0.6 and 0.8. The maximum accuracies obtained within this range were 96% and 97.33%, and the maximum fitness values recorded were 10.99 and 12.89 for PSO-BP and MPSO-BP respectively. Keeping W = 0.7 constant and varying v_max from 1 to 10, it was observed that at v_max = 4 the best fitness found was 11.01 and the accuracy 94.66% in the case of PSO-BP. At v_max = 3, MPSO-BP gave a good result, with a fitness of 12.53 and an accuracy of 96%.
8. CONCLUSION AND SCOPE OF FUTURE WORK
The protein superfamily classification problem, which consists of determining the superfamily membership of a given unknown protein sequence, is very important to biologists for many practical reasons, such as drug discovery, prediction of molecular function, and medical diagnosis. The classification is done using a feedforward neural network, and the MPSO-BP training algorithm gives better results in comparison to the BP and PSO-BP algorithms. MPSO outperforms PSO by the addition of a probability of mutation in the swarm of particles, thereby diversifying the search space and preventing the solution from getting trapped in local minima. The work can be further extended by implementing Gaussian PSO and Differential Evolution PSO hybridized with BP, and observing the resulting predictive accuracy, sensitivity, and specificity.
9. REFERENCES
[1] S. Bandyopadhyay. An efficient technique for superfamily classification of amino acid sequences: feature extraction, fuzzy clustering and prototype selection. Fuzzy Sets and Systems, 152(1):5-16, 2005.
[2] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[3] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, 2nd edition, 2001.
[4] V. G. Gudise and G. K. Venayagamoorthy. Comparison of particle swarm optimization and backpropagation as training algorithms for neural networks. In Proceedings of the IEEE Swarm Intelligence Symposium 2003 (SIS 2003), pages 110-117, 2003.
[5] S. Haykin. Neural Networks: A Comprehensive Foundation. Pearson Prentice Hall, 2nd edition, 2009.
[6] I. Jonassen. Methods for Finding Motifs in Sets of Related Biosequences. PhD thesis, Department of Informatics, University of Bergen, 1996.
[7] J. Kennedy and R. Eberhart. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, volume 4, pages 1942-1948, 1995.
[8] S. B. Kotsiantis. Supervised machine learning: A review of classification techniques. Informatica, 31(3):249-268, 2007.
[9] S. Kumar. Neural Networks: A Classroom Approach. Tata McGraw-Hill.
[10] K. E. Parsopoulos and M. N. Vrahatis. Recent approaches to global optimization problems through particle swarm optimization. Natural Computing, 1(2-3):235-306, 2002.
[11] P. V. N. Rao, T. U. Devi, G. R. Sridhar, and A. A. Rao. A probabilistic neural network approach for protein superfamily classification. Journal of Theoretical and Applied Information Technology, pages 101-105, 2005.
[12] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. pages 318-362, 1986.
[13] J. P. M. de Sa. Pattern Recognition: Concepts, Methods and Applications. Springer-Verlag, Berlin Heidelberg, New York, 2001.
[14] F. van den Bergh. Particle swarm weight initialization in multi-layer perceptron artificial neural networks. In Proceedings of the International Conference on Artificial Neural Networks (ICANN-91), pages 365-370. Elsevier Publishers B.V., North-Holland, 1999.
[15] F. van den Bergh and A. P. Engelbrecht. Cooperative learning in neural networks using particle swarm optimizers. South African Computer Journal, 26:84-90, 2000.
[16] D. Wang and G. B. Huang. Protein sequence classification using extreme learning machine. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2005), Montreal, Canada, 2005.
[17] D. Wang, N. K. Lee, T. S. Dillon, and N. J. Hoogenraad. Protein sequences classification using modular RBF neural networks. In AI '02: Proceedings of the 15th Australian Joint Conference on Artificial Intelligence, pages 477-486, London, UK, 2002. Springer-Verlag.
[18] P. J. Werbos. The Roots of Backpropagation. New York: Wiley, 1994.
[19] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu. A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training. Applied Mathematics and Computation, 185:1026-1037, 2007.