A Markov Random Field Approach to Microarray ... - Semantic Scholar

Report 5 Downloads 151 Views
A Markov Random Field Approach to Microarray Image Gridding Giuliano Antoniol, Michele Ceccarelli Research Center on Software Technologies University of Sannio Via Traiano, 82100 Benevento, Italy antoniol,ceccarelli@unisannio.it Abstract The paper reports a novel approach for the problem of automatic gridding in Microarray images. The solution is modeled as a Bayesian Random Field with a Gibbs prior possibly containing first order cliques (1-clique). On the contrary of previously published contributions, this paper does not assume second order cliques, instead it relies on a two step procedure to locate microarray spots. First a set of guide spots are used to interpolate a reference grid. The final grid is then produced by an a-posteriori maximization which takes into account the reference rectangular grid and local deformations. The algorithm is completely automatic and no human intervention is required, the only critical parameter being the range of the radius of the guide spots.

which is the process of assigning the coordinates to the spots, (ii) segmentation, it allows the separation between foreground and background pixels, (iii) intensity extraction, it consists in the computation of average foreground and background intensities for each spot of the array. Most of available gridding approaches require human intervention, for example to specify some points in the spot grid or even to register individual spots. Automating this part of the process will allow high throughput analysis. Therefore, this paper focuses on the development of an automated procedure for the problem of automatic gridding.

1. Introduction

The problem of automatic gridding is complicated by the fact that microarray images are usually highly contaminated with noise and artifacts of the wet lab processes. Often rotations, misalignment and local deformations of the ideally rectangular grid can occur. There is an high need of methods for microarray gridding which are robust and flexible at the same time.

DNA microarrays [3] allow the simultaneous monitoring of the expression levels for thousands of genes. This has a large impact in many application areas such as diagnostic human diseases and treatments (determination of risk factors, monitoring disease stage and treatment progress, etc.), agricultural development (plant biotechnology) or quantification of GMOs, drug discovery and design. In cDNA microarrays, a set of genetic DNA probes (from several hundreds to some thousands) are spotted on a slide. Two populations of mRNA, tagged with fluorescent dyes, are then hybridized with the slide spots, and finally the slide is red with a scanner. The outlined process produces two images, one for each mRNA population, each of which varies in intensity according to the level of hybridization represented as the quantity of fluorescent dye contained in each spot. Image analysis is an essential aspect of microarray experiment: measures over the scanned image can substantially affect successive steps such as clustering and identification of differentially expressed genes. Scanned microarray image processing has three main tasks: (i) gridding,

Some efforts to help automatic microarray data processing have recently emerged in literature. However most of them impose different kind of restrictions and are based on strong assumptions. For example, the approaches in [1, 7] require that grid rows and columns are strictly aligned with the and  axes. Other approaches such as [5] and [6] rely on the Bayesian paradigm to deal with uncertainty and noise. In particular the approach presented in [6] describes a second order prior for microarray gridding whereas [5] presents a general approach to the grid matching and image warping problem. Here we adopt a prior, possibly, containing just 1-cliques, this can simplify, from a computational point of view, the search of the maximum a posteriori solution. The adopted prior requires the previous computation of a reference regular grid which is generated by interpolating a set of guide spots located by using the Orientation Matching Transform (OMT) [2]. Interpolating a set of guide spots has also been exploited in [9], here we obtain the final result via a MAximum a Posteriori (MAP) estimate accounting for local deformation of the regular grid.

0-7695-2128-2/04 $20.00 (C) 2004 IEEE

that the point    be the center of a circular object. The local maxima of    which are above a given threshold are considered as centers of the guide spots. In our experiment we assumed a threshold of 0.8.

2.1. The Reference Grid The detected guide spots computed in the previous step are used to generate the rectangular grid,  , which best interpolates their centers. The problem to determine the optimal rectangular grid  was modeled as a coordinate transformation from a given coordinate system    to a new one      . Guide spots coordinates are expressed into the    axes, and produced with respect to the hardware and software acquiring the microarray image. The new coordinate system       attempts to minimize a large fraction of distortions introduced by different microarray fabrication imprecisions and errors. Ideally, in the new coordinate system spots are exactly located in posi              ; this tion      also means that any cost function should take into consideration the displacement of the microarray spots from the         ideal grid (i.e., the points            ).

Figure 1. A microarray image with various geometrical and radiometric distortions.

2. The Prior Y

This paper assumes a notation similar to the one adopted in [5]; let us define the list of node locations     , where is the vector of image coordinates of the -th node. Let        be the reference grid computed as reported in the section 2.1, then the joint distribution of is modeled by a Gaussian Random Field:





 

  

   



Y’

X’ ∆y β ∆x α



 

(x0, y 0 )

(1)

Therefore the grid location of the -th point is assumed to be Gaussian distributed with mean  , the computation of a rectangular reference grid with nodes  is the main part of our approach. This is performed in a two step procedure. In the first step guide spots are located via OMT. The second phase defines a grid interpolating the identified spots via metaheuristic search (i.e., stochastic hill-climbing and genetic algorithms). Guide spots are located by using the Orientation Matching Transform (OMT) for circular object detection. OMT [2]. OMT is an extension of the Hough Transform for circles, and has several advantages: it is a correlation-based transform, it allows to treat in the same manner a set of radius in a wide range, it can be tailored to recognize clear spots on dark background and vice versa or both. Given an image    , its OMT,    represents the evidence

X

Figure 2. A 2x2 distorted data geometrical model.

As shown in Fig. 2 in this paper we assume that       distortion was mainly due to a roto-translation of the   axes. The new origin is located in    , and axes underwent independent rotations. In other words, the new coordinate system is identified by a tuple of six parameters:    (the coordinates of the lower left point of the grid),  and  (i.e., the angles of the grid axes with the direction),  and  (i.e., the grid spacing). Given a set of microarray spots with center coordinates,     for any given grid, defined by a tuple           , we would like to summarize with a cost figure how well the spots are interpolated by the grid    . In our investigation three possipoints 

0-7695-2128-2/04 $20.00 (C) 2004 IEEE

ble criteria were considered: the average (averaged over  ) Euclidean distance from the nearest  ; the minimum Euclidean distance (minimum over  ) from the nearest  ; and the maximum (over  ) Euclidean distance from the nearest  . The latter criterion was adopted in our computation since it is the most severe. Furthermore, by minimizing the largest minimal deviation a reduction of the other two figures is obtained. More formally, the optimal coordinate system      is determined by a tuple            corresponding to:





¼ ¼  

 



 

 

    

(2)

where   is the usual Euclidean norm,  are the vertex of the grid and  the center of the spots. Being such criterion non linear and non differentiable, we applied metaheuristic search. Metaheuristic algorithms, such as genetic algorithms and simulated annealing, have been successfully applied to a number of engineering problems ranging from load balancing in the process industries (pressing of sugar pulp), through an electromagnetic system design, to aircraft control and aerodynamics. We applied an algorithm inspired by stochastic hill climbing and genetic algorithm with elitism. The tuple            was represented as a vector of six real number; by means of an heuristic algorithm suitable ranges for the tuple parameters are defined. Based on these ranges, an initial population is randomly generated; at each generation 40 % of the current population is replaced. The crossover operator randomly selects a crossover point, then it creates two new individuals exchanging the genetic material of the older chromosomes. The mutation operator attempts to improve the given solution mimicking a stochastic hill climbing. It aims to improve a solution by randomly changing one of the six parameter (randomly selected). If no better solution is generated after 1000 attempts a new individual is created. At each generation the top two solutions are retained.

or equivalently the OMT transform should reach its maximum value at each grid vertex. Therefore, we adopt as observation model:

  





      

(3)

Notice that once the OMT transform have been computed at the beginning of the analysis, the computation of (3) requires its values at the vertex points, without recomputing it. By combining (1) and (3) we obtain the Maximum A Posteriori (MAP) grid estimate that maximizes the logposterior:



 

                    

(4)

where  and  are two parameters balancing the weight of the prior over the posterior. The ratio  reflects the tradeoff between the regularity of the reference grid and the local deformation imposed by maximizing the OMT over the grid vertexes. Once again, to optimize the solution we adopted a metaheuristic approach in particular we apply the simulated annealing scheme to maximize (5). In the maximization phase we restrict the grid node positions to correspond to pixel positions and the move, as in [5] among the 4-neighbors.

3. Maximum A Posteriori Grid Estimate To apply the Bayes’ principle, we must consider an observation model for the grid matching problem, in particular, having selected a grid configuration, the observation model should weight the difference between the observed image and the ideal image corresponding to a specific grid configuration [8]. In other words, the likelihood term in the Bayes principle encompasses our knowledge about the ideal image [4]. In our case the likelihood     describes the probability of observing given . Of course, the ideal image should have a spot center at each vertex of the grid ,



Figure 3. The MAP grid.

0-7695-2128-2/04 $20.00 (C) 2004 IEEE

 0 3 6 9 12

MSE 0.74515 3.07025 3.9925 4.13515 6.2344

 1 4 7 10 13

MSE 2.1692 3.4471 4.90835 4.6315 7.7469

 2 5 8 11

MSE 2.8148 3.68055 4.2673 5.1643

Table 2. The mean square error attained on a set of 1400 corrupted images as function of the noise variance.

Figure 4. The MAP grid for a randomly generated spot image affected by different kinds of distrortion.

Solution Interpolated grid ( )

  

  

MSE 8.36 5.49 5.06 4.63

Solution

   

   

MSE 4.43 4.23 4.09 4.17

Table 1. The mean square errors attained by various solutions obtained with our method.

4. Results The outlined process was applied to several images with encouraging results, in this section we report results obtained on a typical noisy sample image. The image of figure 1 consists of a 24x16 grid with several geometrical and radiometric distortions such as global rotation an local warping, varying spot radii and various levels of background. Local maximum above a given threshold produce the guide spots exploited in the interpolating process. The metaheuristic algorithm produces a regular grid which is then locally refined in order to give the final MAP estimate shown in figure 3. As a measure of the performance, we report in table 1 the mean square error (MSE), in pixels, between the solution produced by this method with various values of the ratio  , and the solution produced by the software GridGrinder which has the image of figure 1 as a reference test. The table 1 also shows how the MAP estimate improves the reference grid obtained by just interpo-

lating the guide spots. In order to further evaluate in a quantitative manner the performance of the algorithm, we generated a set of simulated 1400 images containing random distortions of a regular grid. In particular, the spot locations, axis orientation, and spot radius were randomly generated according to a gaussian probability density function having the original grid as average. As a result the mean squared error between the spot locations of the detected grid (see figure 4) and the true location as function of the noise (distortion) variance is reported in table 2.

References [1] J. Angulo, J. Serra (2003), ’Automatic Analysis of DNA Microarray Images Using Mathematical Morphology’, Bioinformatics, vol. 19 (5), pp. 553-562. [2] M. Ceccarelli, A. Petrosino (2001), ’The Orientation Matching Approach to Circular Object Detection’, in Proceedings of IEEE International Conference on Image Processing 2001, pp. 712-715, IEEE Press [3] M. B. Eisen, P. O. Brouwn (1999), ’DNA arrays for analysis of gene expression’, Methods in Enzymology, vol. 303, 1999. [4] G. Geman, D. Geman (1984), ’Stochastic Relaxation, Gibbs Distribution and the Bayesian restoration of Images’, IEEE Transactions on PAMI, vol. 6, pp. 721-742. [5] K. Hartelius, J. M. Cartstensen (2003), ’Bayesian Grid Matching’, IEEE Transactions on PAMI, vol. 25 (2), pp. 162173. [6] M. Katzer, F. Kummert, G. Sagerer (2003), ’A Markov Random Field Model of Microarray Gridding’, Proceed. of SAC 2003, ACM [7] A. N. Jain, T. Tokuyasu, A. Snijders, R. Segraves, D. Albertso, D. Pinkel (2003), ’Fully Automatic Quantification of Microarray Image Data’, Genome Research, vol. 12, pp. 325332. [8] S. Z. Li (2001), Markov Random Field Modeling in Image Analysis, Springer Verlag. 2-nd edition. [9] A. W. Liew, H. Yan, M. Yang (2003), ’Robust Adaptive Spot Segmentation of DNA Microarray Images’, Pattern Recognition, vol. 36, pp. 1251-1254.

0-7695-2128-2/04 $20.00 (C) 2004 IEEE