Biclustering of DNA Microarray Data Using Artificial Immune System

Waleed Abohamad, Faculty of Computers and Information, Cairo University, Egypt, e-mail: [email protected]
Mohammed Korayem, Faculty of Computers and Information, Fayoum University, Egypt, e-mail: [email protected]
Khaled Moustafa, Faculty of Computers and Information, Cairo University, Egypt, e-mail: [email protected]

Abstract-Advances in DNA microarray technology have motivated the research community to introduce sophisticated techniques for analyzing the resulting large-scale datasets. Biclustering techniques have been widely adopted for analyzing microarray gene expression data because of their ability to extract local patterns, that is, subsets of genes that are similarly expressed over a subset of samples. Most biclustering methods are based on greedy heuristics, which often yield suboptimal solutions. To address this, this paper presents a clonal selection algorithm for biclustering (Bic-CSA) that incorporates these greedy search procedures as local search heuristics in an immune-inspired algorithm. The quality of the biclusters is demonstrated by experiments on a well-known benchmark dataset. Moreover, the performance of Bic-CSA is compared with related local search-based methods and immune-inspired algorithms. The results and the comparative study show that the proposed algorithm yields larger biclusters than the other algorithms while keeping the mean squared residue below the given threshold.

Keywords-Artificial Immune System; Biclustering; Clonal Selection Algorithm; DNA Microarray; Gene Expression Data; Gene Ontology

I. INTRODUCTION

DNA microarray technology allows the expression levels of a great number of genes to be measured simultaneously across a number of different experimental samples [1]. The expression levels of such genes are thought to be correlated with the amount of the corresponding synthesized protein. Consequently, analyzing gene expression datasets may reveal patterns that relate to protein function prediction, disease diagnosis, and drug discovery. Gene expression data is arranged into a matrix in which rows represent genes and columns represent samples. Each element of the matrix is a real number representing the expression level of a gene under a specific condition.

Clustering is the most common analysis technique for gene expression data; its main goal is to group genes according to their functional similarities [2]. Standard clustering methods such as k-means, hierarchical clustering [3], and self-organizing maps [4] assume that related genes have similar expression patterns across all samples and hence divide the set of genes into disjoint groups. As a result, local patterns, in which a subset of genes is similarly expressed over only a subset of samples, are difficult to identify with traditional clustering techniques. To better reflect biological reality, biclustering can be applied, which groups genes and samples simultaneously so that a subset of genes is similarly expressed over a subset of samples.

The term biclustering was first used in gene expression data analysis by Cheng and Church [5], who introduced the residue of an element in a bicluster and the mean squared residue of a submatrix. Getz et al. [6] presented a coupled two-way clustering approach that applies hierarchical clustering separately to each dimension and then combines both results. Yang et al. [7] presented delta clusters and, a year later, improved Cheng and Church's approach with FLOC (Flexible Overlapping Clusters) [8]. Tanay et al. [9] used a graph-based model combined with a greedy approach to identify biclusters. Liu and Wang [10] proposed an exhaustive bicluster enumeration algorithm.

The aforementioned methods are mostly based on greedy heuristics; however, population-based metaheuristics have also been used to find biclusters in DNA microarray data. Bleuler et al. [11] presented a framework that incorporates local search procedures within evolutionary algorithms. Divina and Aguilar-Ruiz [12] proposed a sequential evolutionary biclustering method (SEBI) that searches for biclusters following a sequential covering strategy. Banka and Mitra [13] presented a multi-objective evolutionary algorithm (MOEA) for biclustering gene expression data, which efficiently discovers globally optimal solutions.

Recently, artificial immune systems (AIS) have been introduced to solve the biclustering problem [14]. An immune-inspired biclustering algorithm known as BIC-aiNet (Artificial Immune Network for Biclustering) was developed in [15] for text mining; it groups texts based on their similarity and extracts implicit useful information from them. BIC-aiNet has also been applied to other applications such as collaborative filtering [16] and query expansion [17].
Biclustering has been combined with the multi-population property of aiNet and the concept of multi-objective optimization in [18]. The efficiency of the resulting framework, known as MOM-aiNet (Multi-Objective Multipopulation Artificial Immune Network), has been tested on a collaborative filtering application and a gene expression problem. Based on the immune response principle, Liu et al. [19] proposed a multi-objective immune biclustering (MOIB) algorithm to mine biclusters from DNA microarray data.

The clonal selection algorithm (CSA) is a special kind of AIS algorithm that uses clonal expansion and affinity maturation as the main forces of the evolutionary process [20]. Based on the clonal expansion principle and local search procedures, this paper proposes a hybrid clonal selection algorithm for biclustering (Bic-CSA) for mining gene expression data. The proposed approach seeks biologically significant biclusters of maximum size whose mean squared residue is lower than a given threshold δ.

II. CLONAL SELECTION ALGORITHM FOR BICLUSTERING

A. Notations

Gene expression data is represented as an M × N matrix, with M rows and N columns, where element $x_{ij}$ represents the expression level of gene i under sample j. Such a matrix is defined by its set of rows $G = \{g_1, ..., g_M\}$ and its set of columns $S = \{s_1, ..., s_N\}$. As described in [19], a binary Hamming shape-space is used to encode each antibody representing a bicluster. Hence, a bicluster is represented by a fixed-size binary string of length M + N: a bit string for genes followed by a bit string for samples. A bit set to 1 means that the corresponding gene or sample is included in the bicluster.

B. Clonal Selection Algorithm

Fixed-length binary strings for L individuals are first generated at random to build the initial population P. Each string represents a bicluster, and the value at each position in the string encodes the presence or absence of a particular gene or sample. A bottom-up approach is followed, in which small biclusters are generated initially and then enlarged through successive generations. A top-down approach (i.e., starting with large biclusters) can also be used to initialize the population. However, because local search procedures (described in Section II-D) are incorporated into the CSA framework, the top-down approach can increase the convergence rate of the proposed algorithm.
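The encoding and the bottom-up initialization described above can be illustrated with a minimal Python sketch. The function names (`random_antibody`, `decode`, `init_population`) and the caps `max_genes` and `max_samples` on the initial bicluster size are assumptions made for illustration; the paper does not specify how small the initial biclusters are.

```python
import random


def random_antibody(M, N, max_genes, max_samples, rng):
    """Return a binary list of length M + N encoding a small random bicluster."""
    genes = rng.sample(range(M), rng.randint(2, max_genes))
    samples = rng.sample(range(N), rng.randint(2, max_samples))
    bits = [0] * (M + N)
    for i in genes:
        bits[i] = 1          # first M bits: genes
    for j in samples:
        bits[M + j] = 1      # last N bits: samples
    return bits


def decode(bits, M):
    """Split an antibody into (gene indices, sample indices)."""
    genes = [i for i in range(M) if bits[i] == 1]
    samples = [k - M for k in range(M, len(bits)) if bits[k] == 1]
    return genes, samples


def init_population(L, M, N, max_genes=10, max_samples=4, seed=0):
    """Bottom-up initialization: L antibodies, each a small random bicluster.

    max_genes and max_samples are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    return [random_antibody(M, N, max_genes, max_samples, rng) for _ in range(L)]
```

For example, `init_population(100, 2884, 17)` would produce 100 antibodies of length 2901 for a matrix with 2884 genes and 17 samples, and `decode(antibody, 2884)` recovers the gene and sample index sets of one of them.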
Following initialization, the affinity of each bicluster is calculated (i.e., how well it satisfies the pre-specified evaluation criteria). The clonal selection principle is then applied to the initial population: the concentration of antibodies (individuals) with high affinity is increased in a process known as cloning. The n highest-affinity antibodies in the available antibody repertoire (population) are selected to be cloned (reproduced) independently. The number of clones generated for each of the n antibodies is the same, because several high-affinity biclusters may coexist in the same population (i.e., there may be more than one optimal solution). The total number of clones generated for the n selected antibodies is therefore given by:

$$N_c = n\beta \tag{1}$$

where $N_c$ is the total number of clones generated and β is a multiplying factor that gives the number of clones for each selected antibody. The cloned antibodies are then mutated with a rate that is inversely proportional to their affinity: the higher the affinity, the smaller the mutation rate. Random bits are chosen to be flipped; flipping here means turning on the selected bits (i.e., bits with value 0 become 1). The mutation is done in this manner because the search begins with small biclusters and the objective is to increase their size. This process is called somatic hypermutation, and it allows the algorithm to explore local areas around a specific antibody by making small steps towards an antibody with higher affinity. The affinity of the mutated clones is then calculated, and the n highest-affinity mutated clones are selected and inserted into the new repertoire in place of the n lowest-affinity antibodies. Hypermutation combined with clonal expansion is an adaptive process known as affinity maturation [21].

Maintaining multiple potential solutions is desirable, as multiple antibodies (biclusters) can have high affinity. This can be accomplished by editing similar antibodies (self-reactive receptors) in a process known as receptor editing [22], [23]. In the proposed algorithm, receptor editing is accomplished by first creating a pool of distinct antibodies and then adding entirely new antibodies (newcomers) to this pool to replace low-affinity antibodies. The distinct antibodies in the pool are chosen such that the degree of overlap (i.e., the Hamming distance) between any two antibodies is greater than a threshold $\epsilon$. The Hamming distance between any two antibodies $ab_i$ and $ab_j$ is given by:

$$D(ab_i, ab_j) = \sum_{k=1}^{M+N} \omega_k, \quad \text{where } \omega_k = \begin{cases} 1 & \text{if } ab_{ik} \neq ab_{jk} \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $ab_i$ and $ab_j$ are two antibodies in P, and $ab_{ik}$ and $ab_{jk}$ represent bit k of antibodies i and j, respectively.
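The cloning, hypermutation, and receptor editing steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the mutation schedule (fewer bit flips for antibodies whose affinity is closer to the best one, capped by a hypothetical `max_flips`), the normalization of the Hamming distance by the string length, and the greedy construction of the distinct pool are choices made here, since the text does not fix them. Antibodies are binary lists and affinities are assumed to be non-negative.

```python
import random


def hamming(a, b):
    """Eq. (2): number of bit positions in which two antibodies differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def clone_and_mutate(population, affinities, n, beta, max_flips, rng):
    """Clone the n best antibodies beta times each and switch on a few 0-bits per clone."""
    order = sorted(range(len(population)),
                   key=lambda i: affinities[i], reverse=True)[:n]
    best = affinities[order[0]]
    clones = []
    for idx in order:
        # Assumed schedule: the closer to the best affinity, the fewer flips.
        ratio = affinities[idx] / best if best > 0 else 0.0
        flips = max(1, round(max_flips * (1.0 - ratio)))
        for _ in range(beta):
            clone = list(population[idx])
            zeros = [k for k, bit in enumerate(clone) if bit == 0]
            for k in rng.sample(zeros, min(flips, len(zeros))):
                clone[k] = 1  # mutation only turns bits on (0 becomes 1)
            clones.append(clone)
    return clones  # contains n * beta clones, as in Eq. (1)


def distinct_pool(population, affinities, eps):
    """Greedy receptor-editing filter (an assumed construction).

    Antibodies are visited in order of decreasing affinity and kept only if
    their normalized Hamming distance to every antibody already kept exceeds eps.
    """
    order = sorted(range(len(population)),
                   key=lambda i: affinities[i], reverse=True)
    pool = []
    for i in order:
        ab = population[i]
        if all(hamming(ab, kept) / len(ab) > eps for kept in pool):
            pool.append(ab)
    return pool
```

Newcomer antibodies would then be generated to refill the pool up to the population size L, which is how diversity is reintroduced after near-duplicates have been dropped.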
Receptor editing offers the ability to escape from unsatisfactory local optima. In addition, adding a fraction of newcomer antibodies to the pool maintains the diversity of the population and allows a broader search for the global optimum. Somatic hypermutation and receptor editing together balance the exploitation of the best solutions with the exploration of the search space.

C. Affinity Calculation

A bicluster is defined as a submatrix of size |g| × |s|, denoted (g, s), where g and s are sets of rows (genes) and columns (samples), respectively, and |g| < M and |s| < N. The size of a bicluster (g, s) is defined as the number of cells $x_{ij}$ such that i ∈ g and j ∈ s:

$$f(g, s) = |g| \times |s| \tag{3}$$

Biclusters are required to contain a large number of genes and samples in order to be functional and to allow deeper analysis. However, biclusters must also satisfy a given coherence measure [5], such that the mean squared residue (homogeneity) of the extracted biclusters is minimized. The homogeneity of the bicluster (g, s) is expressed as a mean squared residue score:

$$H(g, s) = \frac{1}{|g| \times |s|} \sum_{i \in g,\, j \in s} (x_{ij} - x_{is} - x_{gj} + x_{gs})^2 \tag{4}$$

where

$$x_{is} = \frac{1}{|s|} \sum_{j \in s} x_{ij} \tag{5}$$

is the mean row expression value for (g, s),

$$x_{gj} = \frac{1}{|g|} \sum_{i \in g} x_{ij} \tag{6}$$

is the mean column expression value for (g, s), and

$$x_{gs} = \frac{1}{|g| \times |s|} \sum_{i \in g,\, j \in s} x_{ij} \tag{7}$$

is the mean expression value over all cells contained in the bicluster (g, s).

Mining biclusters from gene expression data therefore has two conflicting objectives: the size of the biclusters should be maximized while the mean squared residue score is minimized. These objectives conflict because the larger the bicluster, the lower the degree of coherence between its elements. Formally, an aggregate weighting function F is used to combine these two objectives:

$$F = w f_1 + (1 - w) f_2 \tag{8}$$

where $0 \le w \le 1$,

$$f_1 = \begin{cases} H/\delta & \text{if } H \le \delta \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

and

$$f_2 = \frac{|g| \times |s|}{M \times N} \tag{10}$$

Therefore, $f_1$ is maximized as long as the mean squared residue is below a user-defined threshold δ, while $f_2$ is always maximized. For each antibody, F is calculated as its affinity, which incorporates the two objectives.
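As a concrete illustration of (3)-(10), the following Python sketch computes the mean squared residue and the aggregate affinity of a bicluster. The data matrix is assumed to be a list of rows, the bicluster is given as lists of gene and sample indices, and the function names are hypothetical.

```python
def mean_squared_residue(X, genes, samples):
    """H(g, s) from Eq. (4), using the means defined in Eqs. (5)-(7)."""
    row_mean = {i: sum(X[i][j] for j in samples) / len(samples) for i in genes}
    col_mean = {j: sum(X[i][j] for i in genes) / len(genes) for j in samples}
    # Every row has the same number of cells, so the mean of the row means
    # equals the mean over all cells of the bicluster, Eq. (7).
    all_mean = sum(row_mean.values()) / len(genes)
    total = sum((X[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
                for i in genes for j in samples)
    return total / (len(genes) * len(samples))


def affinity(X, genes, samples, delta, w=0.5):
    """Aggregate affinity F = w * f1 + (1 - w) * f2 from Eqs. (8)-(10)."""
    M, N = len(X), len(X[0])
    H = mean_squared_residue(X, genes, samples)
    f1 = H / delta if H <= delta else 0.0            # Eq. (9)
    f2 = (len(genes) * len(samples)) / (M * N)       # Eq. (10)
    return w * f1 + (1 - w) * f2
```

A perfectly additive submatrix such as rows (1, 2, 3), (2, 3, 4), (3, 4, 5) has H = 0, so under (9) it contributes only through its size term $f_2$; a bicluster whose residue exceeds δ also receives $f_1 = 0$.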

D. Local Search Procedures

When a bicluster violates the homogeneity condition, this may be due to the presence of irrelevant genes and/or samples, which should be removed from the bicluster. Furthermore, some genes and/or samples may need to be added to the bicluster to improve it in terms of homogeneity and size. Local search strategies [5] can then be used to add or remove multiple genes and/or samples as follows:

1. Multiple genes/samples deletion.
   a. Calculate $x_{is}$, $x_{gj}$, $x_{gs}$ and the bicluster homogeneity H(g, s) using (4)-(7).
   b. Delete all genes i ∈ g from the bicluster such that:

   $$\frac{1}{|s|} \sum_{j \in s} (x_{ij} - x_{is} - x_{gj} + x_{gs})^2 > \alpha H(g, s) \tag{11}$$

   c. Recalculate $x_{is}$, $x_{gj}$, $x_{gs}$ and H(g, s).
   d. Delete all samples j ∈ s such that:

   $$\frac{1}{|g|} \sum_{i \in g} (x_{ij} - x_{is} - x_{gj} + x_{gs})^2 > \alpha H(g, s) \tag{12}$$

2. Single gene/sample deletion.
   a. Recalculate $x_{is}$, $x_{gj}$, $x_{gs}$ and H(g, s) of the bicluster resulting from the previous step.
   b. Remove the gene with the largest mean squared residue:

   $$d(i) = \frac{1}{|s|} \sum_{j \in s} (x_{ij} - x_{is} - x_{gj} + x_{gs})^2 \tag{13}$$

   c. Remove the sample with the largest mean squared residue:

   $$d(j) = \frac{1}{|g|} \sum_{i \in g} (x_{ij} - x_{is} - x_{gj} + x_{gs})^2 \tag{14}$$

   d. Repeat b and c, one at a time, until H(g, s) ≤ δ.
3. Multiple genes/samples addition.
   a. Recalculate $x_{is}$, $x_{gj}$, $x_{gs}$ and H(g, s) of the bicluster resulting from step 2.
   b. Add all genes i ∉ g such that d(i) ≤ H(g, s).
   c. Recalculate $x_{is}$, $x_{gj}$, $x_{gs}$ and H(g, s).
   d. Add all samples j ∉ s such that d(j) ≤ H(g, s).

Subsequently, an updating strategy is used in which the original bicluster is replaced by the modified one after local search is applied. This process is known as Lamarckian evolution and has been shown to be efficient [11].

E. Bic-CSA Algorithm

1. P ← Generate L individuals (initial population) of different sizes.
2. F ← Evaluate the initial population P.
3. $P_n$ ← Select the n highest-affinity antibodies such that the Hamming distance between any two antibodies is greater than $\epsilon_1$.
4. $CL_c$ ← Reproduce (clone) the antibodies in $P_n$ independently with the same clone number β.
5. $CL_c^*$ ← Apply the affinity maturation process to the antibodies in $CL_c$.
6. $P_n^*$ ← Re-select the n highest-affinity antibodies from $CL_c^*$.

7. $P^*$ ← Replace the n lowest-affinity antibodies in P by $P_n^*$.
8. $P^{**}$ ← Select antibodies from $P^*$ such that the Hamming distance (i.e., overlapping degree) between any two antibodies is greater than $\epsilon_2$ (receptor editing).
9. P ← Generate new antibodies by selecting genes and samples that are not yet included in $P^{**}$. New antibodies are generated such that the total number of antibodies is L.
10. Apply local search to all antibodies in the population whose homogeneity values are greater than the threshold δ.
11. Repeat steps 2-10 until a stopping criterion is met.

III. EXPERIMENTAL RESULTS

A. Dataset

The yeast microarray dataset [24] was selected as the benchmark for the proposed algorithm because the performance of related algorithms on it is well documented in the literature. The yeast gene expression data is a collection of 2884 genes and 17 samples with integer entries ranging from 0 to 600. The dataset contains 34 missing values, which have been replaced by random numbers sampled from a uniform distribution between 0 and 800, following [5].

B. Results

To provide fair comparisons with related algorithms, the value of the threshold δ is set to 300 as in [5], [8], and [13]. The population size L is chosen to be 100, with n = 20 and β = 10. The parameters $\epsilon_1$, $\epsilon_2$, and w have been selected experimentally as 0.005, 0.01, and 0.5, respectively, as the values that led to the best set of biclusters. For different values of α, Table I summarizes the average results obtained by the proposed algorithm over 10 independent runs with 20 generations per run.

C. Comparative Study

The quality of the biclusters generated by the proposed algorithm has been compared with two types of related algorithms: algorithms that incorporate local search procedures in their framework, and other immune-inspired algorithms.
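Because this comparison turns on the local search procedures of Section II-D, it may help to see them concretely. The Python sketch below follows the three stages (multiple deletion, single deletion, multiple addition) for a data matrix stored as a list of rows. Several details are assumptions rather than the authors' implementation: the function names, the guard that never shrinks a bicluster below two genes or two samples, and the rule in stage 2 that removes whichever of the worst gene or worst sample has the larger score, as in Cheng and Church's single node deletion.

```python
def stats(X, genes, samples):
    """Row means, column means, overall mean and H(g, s) of the bicluster."""
    row = {i: sum(X[i][j] for j in samples) / len(samples) for i in genes}
    col = {j: sum(X[i][j] for i in genes) / len(genes) for j in samples}
    mean = sum(row.values()) / len(genes)
    H = sum((X[i][j] - row[i] - col[j] + mean) ** 2
            for i in genes for j in samples) / (len(genes) * len(samples))
    return row, col, mean, H


def d_gene(X, i, samples, col, mean):
    """d(i): mean squared residue of gene i over the bicluster's samples."""
    r = sum(X[i][j] for j in samples) / len(samples)
    return sum((X[i][j] - r - col[j] + mean) ** 2 for j in samples) / len(samples)


def d_sample(X, j, genes, row, mean):
    """d(j): mean squared residue of sample j over the bicluster's genes."""
    c = sum(X[i][j] for i in genes) / len(genes)
    return sum((X[i][j] - row[i] - c + mean) ** 2 for i in genes) / len(genes)


def local_search(X, genes, samples, delta, alpha=1.2):
    genes, samples = list(genes), list(samples)
    row, col, mean, H = stats(X, genes, samples)
    if H > delta:
        # Stage 1: multiple deletion of genes, then samples, scoring above alpha * H.
        keep = [i for i in genes if d_gene(X, i, samples, col, mean) <= alpha * H]
        if len(keep) >= 2:  # assumed guard against degenerate biclusters
            genes = keep
        row, col, mean, H = stats(X, genes, samples)
        keep = [j for j in samples if d_sample(X, j, genes, row, mean) <= alpha * H]
        if len(keep) >= 2:
            samples = keep
        row, col, mean, H = stats(X, genes, samples)
    # Stage 2: single deletion until the homogeneity constraint holds.
    while H > delta and (len(genes) > 2 or len(samples) > 2):
        gi = max(genes, key=lambda i: d_gene(X, i, samples, col, mean))
        sj = max(samples, key=lambda j: d_sample(X, j, genes, row, mean))
        drop_gene = len(genes) > 2 and (
            len(samples) <= 2
            or d_gene(X, gi, samples, col, mean) >= d_sample(X, sj, genes, row, mean))
        if drop_gene:
            genes.remove(gi)
        else:
            samples.remove(sj)
        row, col, mean, H = stats(X, genes, samples)
    # Stage 3: add every outside gene, then sample, whose score does not exceed H.
    genes += [i for i in range(len(X))
              if i not in genes and d_gene(X, i, samples, col, mean) <= H]
    row, col, mean, H = stats(X, genes, samples)
    samples += [j for j in range(len(X[0]))
                if j not in samples and d_sample(X, j, genes, row, mean) <= H]
    return sorted(genes), sorted(samples)
```

In a Lamarckian update, the gene and sample sets returned by `local_search` would be re-encoded as a bit string and written back over the original antibody.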
Table I. The Average Results of Biclusters Obtained by Bic-CSA on Yeast Data.

α      Avg. # of genes      Avg. # of samples    Avg. residue
1.1    1371.00 ± 03.56      12.00 ± 0.00         298.11 ± 0.29
1.2    1343.80 ± 56.54      12.20 ± 0.42         297.45 ± 1.26
1.3    1256.20 ± 42.20      12.90 ± 0.32         298.38 ± 0.41
1.4    1271.20 ± 61.14      12.80 ± 0.42         298.31 ± 0.31
1.5    1346.80 ± 08.77      12.00 ± 0.00         299.79 ± 0.07

First, the performance of the proposed algorithm (Bic-CSA) is compared with three related local search-based algorithms: Cheng and Church (CC) [5], Flexible Overlapping Clusters (FLOC) [8], and the Multi-Objective Evolutionary Algorithm (MOEA) [13]. The values of the parameters δ and α were set to 300 and 1.2, respectively, for all algorithms. For MOEA, the population size was set to 50 and the probabilities of crossover and mutation were 0.75 and 0.03, respectively. The results of CC and FLOC are taken from [8], and the results of MOEA are taken from [13]. The comparison results are shown in Table II in terms of average residue and bicluster size.

Cheng and Church (CC) [5] employ a set of heuristic algorithms (i.e., local search procedures) to find one or more biclusters by iteratively identifying one bicluster at a time. Discovered biclusters are then masked by random values, which results in a random interference phenomenon that hampers the discovery of large biclusters. FLOC, on the other hand, uses a probabilistic approach to discover a set of possibly overlapping biclusters simultaneously. Initial biclusters are chosen randomly, and nodes (i.e., genes or samples) are iteratively added and/or deleted with the goal of minimizing the average residue. Based on evolutionary algorithms, the multi-objective approach (MOEA) was able to generate biclusters of significant size with residues close to the threshold. This is because evolutionary algorithms involve a set of evolutionary operators, such as selection, crossover, and mutation, which are applied to a population of chromosomes (biclusters) in an iterative manner.

However, the results indicate a better performance of Bic-CSA in terms of bicluster size, while still satisfying the mean squared residue constraint. This can be explained by the distinguishing characteristics of the proposed method, and of immune-inspired algorithms in general, compared with the alternatives.
First, several subpopulations evolve in parallel (each antibody is associated with a population of mutated clones), in contrast with MOEA, where a single population evolves. Second, the population size changes dynamically, so the number of potential solutions can be adapted to the characteristics of the underlying problem. Finally, diversity maintenance mechanisms such as receptor editing enhance the search capabilities of the algorithm by escaping from local optima and exploring the search space.

Table II. Comparative Study on Other Local Search Based Algorithms.

Algorithm     Avg. Residue    Avg. Bicluster Size
Bic-CSA       297.45          16373.30
MOEA [13]     237.34          10250.87
FLOC [8]      187.54          1825.78
CC [5]        204.29          1576.98

To provide a fair comparison and to assess the quality of the obtained biclusters, the performance of the proposed algorithm has also been compared with related immune-inspired algorithms: Bic-aiNet [16], MOM-aiNet [18], and MOIB [19]. The results of these three algorithms are taken from [19]. As shown in Table III, the biclusters found by Bic-CSA are significantly larger than those obtained by Bic-aiNet, MOM-aiNet, and MOIB. The average mean squared residue of the biclusters found by Bic-CSA is higher than that of the biclusters extracted by the other algorithms, although it still satisfies the homogeneity threshold δ. This is because the biclusters generated by Bic-CSA are significantly larger. The main advantage of the proposed algorithm over these three algorithms is that Bic-CSA incorporates local search procedures (greedy search) into its framework, which significantly accelerates the search for larger biclusters while maintaining their mean squared residue under the threshold δ. Consequently, the algorithm converges to a global or near-global optimum faster than the other algorithms. Moreover, Bic-aiNet and MOM-aiNet use a fixed mutation probability for cloned antibodies, while Bic-CSA applies a dynamic mutation probability that is inversely proportional to the affinity of the clones. Hence, local search around potential solutions is performed carefully and efficiently.

D. Category Enrichment

This section demonstrates the biological relevance of smaller biclusters in terms of statistically significant annotations in the Gene Ontology (GO) database [25]. The probability of observing the number of genes from a particular Gene Ontology category (e.g., process, component, function) within a bicluster is used to measure the degree of enrichment (i.e., the p-value) of the obtained biclusters:

$$p = 1 - \sum_{i=0}^{k-1} \frac{\binom{f}{i} \binom{g-f}{n-i}}{\binom{g}{n}} \tag{15}$$

where f is the total number of genes within a category and g is the total number of genes within the genome. As is standard for this hypergeometric test, n is the number of genes in the bicluster and k is the number of those genes that belong to the category.
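The p-value in (15) can be evaluated directly with exact integer binomial coefficients. The following Python sketch assumes the standard reading of the symbols (n genes in the bicluster, k of them annotated with the category); the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, f, g):
    """Eq. (15): probability of seeing at least k category genes among n.

    k: genes in the bicluster annotated with the category
    n: genes in the bicluster
    f: genes in the genome annotated with the category
    g: genes in the genome
    """
    # Cumulative hypergeometric probability of observing fewer than k hits.
    below = sum(comb(f, i) * comb(g - f, n - i) for i in range(k))
    return 1.0 - below / comb(g, n)
```

As a small check by hand, with g = 10, f = 4, n = 3 and k = 2 the sum in (15) is (20 + 60)/120, so the p-value is 1/3.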
The p-value is used to determine whether the obtained biclusters are biologically meaningful by measuring their statistical significance using a cumulative hypergeometric distribution. Three biclusters A, B, and C are obtained by setting δ = 20. Bicluster A contains 36 genes and 7 samples, bicluster B is 27 × 9, and bicluster C is 65 × 7. Table IV shows the significant shared GO terms used to describe the sets of genes in biclusters A, B, and C for the process, component, and function ontologies. The values within parentheses after each GO term are in the form (γ, λ), where γ is the number of genes belonging to both the GO term and the bicluster and λ is the associated p-value. A smaller p-value, close to zero, indicates a better match, which in turn indicates the capability of the proposed algorithm to find biologically significant biclusters.

Table III. Comparative Study on Other Algorithms Based on Artificial Immune System.

Algorithm         Avg. Residue      Avg. Bicluster Size
Bic-CSA           297.445 ± 1.26    16373.3 ± 193.56
MOIB [19]         202.32            2638.74
MOM-aiNet [18]    178.28 ± 5.24     1831.80 ± 114.54
Bic-aiNet [16]    194.65 ± 9.25     2556.60 ± 188.92

Table IV. Significant Shared Gene Ontology (GO) Terms of the Selected Biclusters A, B, and C.

GO Category - Process
  Bicluster A: Establishment of Localization (13, 0.0017), Cell Organization & Biogenesis (21, 0.0004), Transport (13, 0.00148), Establishment of Protein Localization (9, 0.000187)
  Bicluster B: Protein Localization (8, 0.00012), Secretion (4, 0.00076), Cellular Localization (9, 0.0001)
  Bicluster C: Cellular Macromolecule Metabolic Process (21, 0.0075), Macromolecule Complex Assembly (10, 0.0017)

GO Category - Component
  Bicluster A: Membrane (16, 0.0073), Membrane Part (13, 0.0184)
  Bicluster B: Organelle Part (15, 0.0094)
  Bicluster C: Non-membrane-bound Organelle (26, 8.812e−6), Ribonucleoprotein Complex (14, 0.0004)

GO Category - Function
  Bicluster A: Heat Shock Protein Binding (2, 0.0077), Hydrolase Activity (6, 0.01)
  Bicluster B: Protein Transporter Activity (2, 0.014), RNA Binding (5, 0.019)
  Bicluster C: Structural Constituent of Ribosome (9, 0.0014)

IV. CONCLUSION

This paper has introduced a clonal selection algorithm combined with local search procedures for biclustering (Bic-CSA) of gene expression data. The clonal selection algorithm is inspired by the natural immune system and takes into account its main aspects: selection and cloning of the most stimulated cells, death of non-stimulated cells, affinity maturation and reselection of the clones with higher affinity, generation and maintenance of diversity, and hypermutation of clones at a rate inversely proportional to their affinity. Moreover, the incorporation of local search procedures has significantly increased the convergence rate of the algorithm by guiding the search process towards more promising solutions. A comparative study was carried out with related local search-based methods and immune-inspired algorithms. The results have shown that the proposed algorithm outperforms the other methods in terms of the size of the obtained biclusters while maintaining their mean squared residue under the threshold δ. Moreover, the obtained biclusters have small p-values, which indicates that these biclusters are biologically significant. Consequently, the proposed algorithm can be considered a powerful tool for discovering genetic regulatory patterns in a noisy environment, because it selects genes and samples with more coherent measurements and drops those representing random noise. Finally, a major advantage of Bic-CSA is that it permits the inclusion of genes and samples in multiple biclusters. Consequently, it allows genes to be associated with more than one functional category.

REFERENCES

[1] P. Baldi and G. W. Hatfield, DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling. Cambridge University Press, 2002.
[2] A. Ben-Dor, R. Shamir, and Z. Yakhini, "Clustering gene expression patterns," J. Comput. Biol., vol. 6, no. 3-4, pp. 281–297, 1999.
[3] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein, "Cluster analysis and display of genome-wide expression patterns," Proc. Natl. Acad. Sci. USA, vol. 95, no. 25, pp. 14863–14868, 1998.
[4] P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. S. Lander, and T. R. Golub, "Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation," Proc. Natl. Acad. Sci. USA, vol. 96, no. 6, pp. 2907–2912, Mar. 1999.

[5] Y. Cheng and G. M. Church, "Biclustering of expression data," in Proc. 8th International Conference on Intelligent System on Molecular Biology, San Diego, California, USA, Aug. 2000, pp. 93–103.
[6] G. Getz, E. Levine, and E. Domany, "Coupled two-way clustering analysis of gene microarray data," Proc. Natl. Acad. Sci. USA, vol. 97, no. 22, pp. 12079–12084, 2000.
[7] J. Yang, W. Wang, H. Wang, and P. Yu, "δ-clusters: Capturing subspace correlation in a large dataset," in Proc. 18th International Conference on Data Engineering, San Jose, California, USA, Feb. 2002, pp. 517–528.
[8] J. Yang, H. Wang, W. Wang, and P. Yu, "Enhanced biclustering on expression data," in Proc. 3rd IEEE Symposium on BioInformatics and BioEngineering (BIBE '03), Washington, DC, USA, Mar. 2003, pp. 321–327.
[9] A. Tanay, R. Sharan, and R. Shamir, "Discovering statistically significant biclusters in gene expression data," Bioinformatics, vol. 18, no. 90001, pp. S136–S144, 2002.
[10] J. Liu and W. Wang, "Op-cluster: Clustering by tendency in high dimensional space," in Proc. 3rd IEEE International Conference on Data Mining, Los Alamitos, CA, USA, Nov. 2003, pp. 187–194.
[11] S. Bleuler, A. Prelic, and E. Zitzler, "An EA framework for biclustering of gene expression data," in Proc. CEC'04, Portland, Oregon, USA, June 2004, pp. 166–173.
[12] F. Divina and J. S. Aguilar-Ruiz, "Biclustering of expression data with evolutionary computation," IEEE Trans. Knowl. Data En., vol. 18, no. 5, pp. 590–602, 2006.
[13] H. Banka and S. Mitra, "Evolutionary biclustering of gene expressions," Ubiquity, vol. 7, no. 42, pp. 1–12, 2006.
[14] L. N. de Castro, J. Timmis, H. Knidel, and F. Zuben, "Artificial Immune Systems: Structure, function, diversity and an application to biclustering," Nat. Comput., vol. 9, no. 3, pp. 575–577, 2010.
[15] P. A. de Castro, F. O. de França, H. M. Ferreira, and F. J. Von Zuben, "Applying biclustering to text mining: An immune-inspired approach," in Proc. 6th International Conference on Artificial Immune Systems (ICARIS'07), Santos, Brazil, Aug. 2007, pp. 83–94.
[16] P. A. de Castro, F. O. de França, H. M. Ferreira, and F. J. Von Zuben, "Applying biclustering to perform collaborative filtering," in Proc. 7th International Conference on Intelligent Systems Design and Applications (ISDA), Rio de Janeiro, Brazil, Oct. 2007, pp. 421–426.
[17] P. A. de Castro, F. O. de França, H. M. Ferreira, G. P. Coelho, and F. J. Von Zuben, "Query expansion using an immune-inspired biclustering algorithm," Nat. Comput., vol. 9, pp. 579–602, 2010.
[18] G. P. Coelho, F. O. França, and F. J. Zuben, "A multiobjective multipopulation approach for biclustering," in Proc. 7th International Conference on Artificial Immune Systems (ICARIS'08), Phuket, Thailand, Aug. 2008, pp. 71–82.
[19] J. Liu, Z. Li, and Y. Chen, "Microarray data biclustering with multi-objective immune algorithm," in Proc. 5th International Conference on Natural Computation (ICNC'09), Tianjin, China, Mar. 2009, pp. 200–204.
[20] L. N. de Castro and F. J. Von Zuben, "Learning and optimization using the clonal selection principle," IEEE Trans. Evol. Comput., vol. 6, no. 3, pp. 239–251, 2002.
[21] C. Berek and M. Ziegner, "The maturation of the immune response," Immunol. Today, vol. 14, no. 8, pp. 400–404, 1993.
[22] A. J. T. George and D. Gray, "Receptor editing during affinity maturation," Immunol. Today, vol. 20, no. 4, pp. 196–196, 1999.
[23] M. C. Nussenzweig, "Immune receptor editing: Revise and select," Cell, vol. 95, no. 7, pp. 875–878, 1998.
[24] R. Cho, M. Campbell, E. Winzeler, L. Steinmetz, A. Conway, L. Wodicka, T. Wolfsberg, A. Gabrielian, D. Landsman, D. Lockhart et al., "A genome-wide transcriptional analysis of the mitotic cell cycle," Mol. Cell, vol. 2, no. 1, pp. 65–73, 1998.
[25] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, and J. M. Cherry, "Gene Ontology: Tool for the unification of biology," Nat. Genet., vol. 25, no. 1, pp. 25–29, 2000.