IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 36, NO. 5, SEPTEMBER 1998
Spatial Information Retrieval from Remote-Sensing Images—Part II: Gibbs–Markov Random Fields

Michael Schröder, Student Member, IEEE, Hubert Rehrauer, Klaus Seidel, and Mihai Datcu
Abstract—We present Gibbs–Markov random field (GMRF) models as a powerful and robust descriptor of spatial information in typical remote-sensing image data. This class of stochastic image models provides an intuitive description of the image data using the parameters of an energy function. For the selection among several nested models and the fit of the model, we proceed in two steps of Bayesian inference. This procedure yields the most plausible model and its most likely parameters, which together describe the image content in an optimal way. Its additional application at multiple scales of the image enables us to capture all structures present in complex remote-sensing images. The calculation of the evidences of various models applied to the resulting quasicontinuous image pyramid automatically detects such structures. We present examples for both synthetic aperture radar (SAR) and optical data.

Index Terms—Bayesian statistics, Gibbs random fields (GRF's), model selection, multiresolution analysis, spatial information, synthetic aperture radar (SAR), texture.
I. INTRODUCTION
Remote-sensing image data typically contain an enormous amount of information. How can this information be extracted? From the fields of remote sensing and computer vision, there are numerous answers to this question [1]–[4]. Preprocessing, e.g., atmospheric correction, and subsequent thematic classification based on multispectral image features is one possibility. Applying radiometric models, principal component analysis, or nonlinear modifications of the image histogram constitutes another approach based solely on one-pixel properties of the image. From the field of computer vision, there are pattern recognition and template matching algorithms as well as edge detection and enhancement procedures, which either assist or determine the extraction of spatial information associated with the content of the image. However, for the purpose of overall image characterization, as desired for content-based query and retrieval in remote-sensing image archives [5], such features show severe limitations. Therefore, Part I [6] presented a model-based approach to provide the necessary robust way of describing spatial information in remote-sensing image data. This approach maps the problem of information extraction to the problem of parameter estimation and is able to incorporate prior information, such as beliefs and propositions, within the Bayesian paradigm. We briefly summarize the main ideas of our concept of spatial information extraction in the following section.

The aim of this paper is to demonstrate these ideas in practice with the application of a nested family of Gibbs–Markov random fields (GMRF's) as a powerful and robust means for spatial information extraction. For the purpose of clarity, we concentrate solely on the simple and direct application of one particular Gibbs model and abstain from combining it with other Gibbs models or classical image features. We first extract information at one fixed scale. Due to the high complexity of remote-sensing images, this analysis at one scale alone is not able to capture everything contained in the image. To succeed in this, we apply several Gibbs models at multiple scales of the image and thus capture all structures possibly present in remote-sensing images, such as agricultural fields, valleys, and geological structures. From the original image, we produce a pyramid of images at quasicontinuous scale. Using the evidences of the different models enables us to automatically detect the various scales at which a certain region of the image exhibits interesting spatial information.

The article is organized as follows. After a sketch of the information theoretical aspects of spatial information extraction in Section II, we explain how we characterize the image using stochastic modeling with GMRF's in Section III. In particular, we present the autobinomial model and the methods of parameter estimation. In Section IV, we directly apply this model to two types of optical data: Landsat Thematic Mapper (TM) and RESURS-01.

Manuscript received December 4, 1997; revised May 9, 1998. This work was supported by the ETH project "Advanced Query and Retrieval Techniques for Remote Sensing Image Databases."
M. Schröder, H. Rehrauer, and K. Seidel are with the Communication Technology Laboratory, Swiss Federal Institute of Technology, CH-8092 Zürich, Switzerland (e-mail: [email protected]).
M. Datcu is with the German Remote Sensing Data Center (DFD), German Aerospace Center (DLR), D-82234 Oberpfaffenhofen, Germany (e-mail: [email protected]).
Publisher Item Identifier S 0196-2892(98)06833-8.
In Section V, we discuss the concepts of resolution and scale and use them for the unsupervised detection of those scales of the image containing interesting structures. We illustrate the concept of evidence at multiple scales with an X-SAR (synthetic aperture radar) example. We conclude in Section VI with a summary of our main results. To keep the focus on the central points, we have placed all lengthy calculations in the Appendix.

II. INFORMATION THEORETICAL PERSPECTIVES: THE BAYESIAN APPROACH

From the perspective of information theory, we can formulate the information retrieval task either as a decision problem or as a problem of parameter estimation. In the classical approach of decision theory, the outcome of a measurement
0196–2892/98$10.00 © 1998 IEEE
is understood as a random variable, and the probability of an event is defined as its frequency of occurrence in a large number of realizations. As opposed to this frequentist approach, the Bayesian approach does not make use of the classical concept of random experiments and instead interprets probabilities based on a system of axioms, describing incomplete information rather than randomness. As explained in Part I, we extract spatial information in the frame of the Bayesian concept by parameter estimation of a family of image models. We decompose the procedure of information extraction into two steps of Bayesian inference: first the model selection and then the model fit.

A. Model Selection

In the first level of inference, the model selection, we find, out of a set of models {H_i}, the most plausible model H_i given the image data {x_s}. We find this model by comparing the probabilities p(H_i | {x_s}), which we obtain by using Bayes' rule

    p(H_i | {x_s}) = p({x_s} | H_i) p(H_i) / p({x_s})    (1)

with the evidence p({x_s} | H_i) as probability of the data, given the model H_i, and the prior probabilities p(H_i) and p({x_s}) of the model and the data, respectively. Since, in general, the model is available as a forward model p({x_s} | θ, H_i) of the data, we obtain the evidence by marginalization over the model parameters θ

    p({x_s} | H_i) = ∫ p({x_s} | θ, H_i) p(θ | H_i) dθ    (2)

where we denote the volume element of the parameter space with dθ and the prior of the parameters with p(θ | H_i).

B. Model Fit

In the second level of Bayesian inference, the model fit, we apply Bayes' rule to the parameters θ of the chosen model H_i and obtain the probability distribution function (pdf) of the model parameter θ given the data {x_s} and the chosen model H_i

    p(θ | {x_s}, H_i) = p({x_s} | θ, H_i) p(θ | H_i) / p({x_s} | H_i)    (3)

with the prior p(θ | H_i) and the likelihood p({x_s} | θ, H_i) of the data, given the parameters and the model. The probability p({x_s} | H_i) is the evidence of the model H_i. With this second level of Bayesian inference, we obtain an estimation θ̂ of the parameters. This estimation constitutes the most probable parameters of the most probable model and therefore provides the best description of the actual data among the family of models.

The marginalization (2) can be performed analytically only for very few models. One example is the general linear model [7] that we use in one of the later sections for parameter estimation. However, if the analytical marginalization is not possible, we can apply an approximation [8], which we illustrate with the following one-dimensional (1-D) example; the extension to a higher dimensional parameter vector is straightforward. We assume that the probability p({x_s} | θ, H_i) is very localized in the vicinity of its maximum θ̂, as depicted in Fig. 1.

Fig. 1. Probability p({x_s} | θ, H_i) of the data {x_s} in dependence on the parameter θ compared with the prior probability p(θ | H_i) of θ. The ratio of the widths of these two functions is the Occam factor and acts as penalty term of the model H_i.

With the assumption of a uniform prior of width Δθ

    p(θ | H_i) = 1/Δθ   if θ lies within the prior range
                 0      otherwise    (4)

and a Gaussian shape of p({x_s} | θ, H_i)

    p({x_s} | θ, H_i) ≈ p({x_s} | θ̂, H_i) exp(−(θ − θ̂)² / (2σ_θ²))    (5)

with variance σ_θ², we obtain an approximate evidence

    p({x_s} | H_i) ≈ p({x_s} | θ̂, H_i) · σ_θ √(2π) / Δθ.    (6)
The last factor of this equation is called the Occam factor, which acts to penalize the model H_i more and more as the variance σ_θ of θ decreases with respect to the width Δθ of its prior. It is the Occam factor that prevents us from selecting a model that is too complex to describe the given data. This interpretation is in the spirit of the Occam razor discussed in Part I.

III. IMAGE CONTENT CHARACTERIZATION

In this section, we first define GMRF's and then expand on parameter estimation and the calculation of the evidence, which is needed for model selection. In particular, we discuss the maximum pseudo-likelihood (MPL) and the conditional least-squares (CLS) estimator. Then we present the autobinomial model, the interpretation of its parameters, and an example of model selection.

A. Gibbs Random Fields (GRF's)

We assume the images to be defined on a rectangular grid of pixel sites s. The set {x_s} of all pixels, that is, the image, is assumed to be a realization of a random field. If the pdf of the pixel x_s of this random field is determined only by the values
Fig. 2. Neighborhood pixels ∂x_s of the central pixel x_s used for the energy function H(x_s, ∂x_s). Via the energy function in (7), the neighboring pixels determine the pdf of x_s. The parameter vector b in the energy function weights the contributions of the different cliques. Later, we use b as characterization of the image content.
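The clique structure of this neighborhood can be made concrete in code. The following sketch assumes the standard 3 × 3 neighborhood with four facing pairs (horizontal, vertical, and the two diagonals) and a toroidal boundary; the assignment of the labels b_11, b_12, b_21, b_22 to the pairs is our illustrative assumption, not fixed by the text.

```python
# Facing neighbor pairs of the 3x3 neighborhood of Fig. 2,
# given as pixel offsets (di, dj). The mapping of parameter
# names to pairs is an assumed indexing for illustration.
CLIQUE_PAIRS = {
    "b11": ((0, -1), (0, 1)),    # horizontal pair
    "b12": ((-1, 0), (1, 0)),    # vertical pair
    "b21": ((-1, -1), (1, 1)),   # one diagonal pair
    "b22": ((-1, 1), (1, -1)),   # the other diagonal pair
}

def neighbor_sums(img, i, j):
    """Sum x + x' of each facing neighbor pair of pixel (i, j),
    with toroidal boundary handling for simplicity."""
    n, m = len(img), len(img[0])
    return {k: img[(i + a) % n][(j + b) % m] + img[(i + c) % n][(j + d) % m]
            for k, ((a, b), (c, d)) in CLIQUE_PAIRS.items()}
```

Each pair sum is the quantity that one component of the parameter vector weights in the energy function.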
of pixels in the local neighborhood ∂x_s, this field is called Markovian. In Fig. 2, we depict a particular neighborhood ∂x_s; an index labels the two-pixel cliques, each consisting of two facing neighbors of x_s. Such a Markov field can be described as a Gibbs field with a local energy function H(x_s, ∂x_s, b), with b being a vector of scalar parameters reflecting the influence of the different cliques. For a single pixel site s, this approach results in the conditioned probability distribution

    p(x_s | ∂x_s, b) = (1/Z) exp(−H(x_s, ∂x_s, b)).    (7)

With ∂x_s, we label the set of pixels in the neighborhood of the central pixel x_s, which determines the pdf of x_s via the energy function H. The variable Z denotes the partition sum, that is, the normalization constant, of the distribution. The parameter vector b in the energy function weights the contributions of the different cliques. Later we use b as characterization of the image content.

Gibbs distributions have a huge scope of applications, ranging from statistical physics [9] via lattice systems [10] up to image reconstruction [11] and segmentation [12]. In particular, detailed degradation models can be incorporated, such as the SAR speckle [13]. In the same way as in the physical systems mentioned above, we assume for our image interpretation an interaction of each pixel with its neighbors. We use the neighborhood depicted in Fig. 2 and point out that a kernel of higher complexity is not necessary, since textures at coarser scale are captured through the multiscale approach presented in Section V. Model selection tells us which complexity of the neighborhood is needed to best describe a given spatial structure.

B. Parameter Estimation

The estimation of the parameters b of the energy function is the key step in extracting spatial information using Gibbs models. Therefore, it is crucial to apply a reliable and robust estimator. There exist several estimators for Gibbs models [14]: maximum likelihood (ML), coding [10], MPL [10], minimum-chi-square [15], and CLS [16].
For the purpose of fast spatial information extraction, only MPL and CLS are feasible. The latter is especially suited for the autobinomial model due to the special properties of this model. In the following, we briefly review these estimators.

To ensure consistency of the estimations of the structural information, we linearly scale the intensities of the whole image to a predefined mean and variance. These values have to be suited to the applied model. This procedure of consistently rescaling the test windows before parameter estimation fits perfectly into the scheme of the family of nested models sketched in Part I. The one-pixel statistics are used in a first step of information extraction, e.g., using multichannel or radiometric models, whereas in a second step, two-pixel characteristics, such as the correlation function or the parameters of the Gibbs energy function, are used to extract spatial information. In this paper, we demonstrate the usefulness of the Gibbs models without taking into account the one-pixel statistics.

1) MPL Estimator: The only feasible way of calculating the overall probability of the image is the pseudo-likelihood approximation. This assumes that the total probability of the image is the product of the probabilities of all individual pixel sites given the current values of their neighbors

    p({x_s} | b) ≈ ∏_s p(x_s | ∂x_s, b).    (8)

With this approximation, the MPL estimation of the parameters is obtained by maximizing the pseudo-probability (8) or, numerically more convenient, by maximizing its logarithm

    b̂_MPL = argmax_b Σ_s log p(x_s | ∂x_s, b).    (9)

This estimator is known to be consistent [14]; that is, in the limit of large images, it provides the correct estimation. Due to the complexity of the pdf, there can be various local maxima. We use simulated annealing [17] to find the global maximum in the high-dimensional parameter space. After obtaining an estimate of the parameters, we can calculate the evidence of the model as described in Section II.
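As an illustration of the pseudo-likelihood principle of (8) and (9), the following sketch estimates a single coupling parameter of a simple binary Gibbs field by maximizing the pseudo-log-likelihood over a grid of candidate values. The binary model, the free boundary, and the grid search (in place of the simulated annealing used in the paper) are simplifications for illustration only.

```python
import math

def neighbors(img, i, j):
    """4-neighborhood values of pixel (i, j) with free boundary."""
    n, m = len(img), len(img[0])
    return [img[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= a < n and 0 <= b < m]

def log_pseudo_likelihood(img, beta):
    """Pseudo-log-likelihood sum_s log p(x_s | dx_s, beta) of a binary
    (0/1) Gibbs field with one coupling parameter beta, cf. eq. (9)."""
    total = 0.0
    for i in range(len(img)):
        for j in range(len(img[0])):
            s = sum(neighbors(img, i, j))
            e1, e0 = beta * s, 0.0             # local energies of x_s = 1 and 0
            z = math.exp(e0) + math.exp(e1)    # local partition sum
            total += (e1 if img[i][j] == 1 else e0) - math.log(z)
    return total

def mpl_estimate(img, grid):
    """MPL estimate: the beta on `grid` maximizing the pseudo-likelihood
    (a crude stand-in for the simulated annealing search of the paper)."""
    return max(grid, key=lambda b: log_pseudo_likelihood(img, b))
```

On a checkerboard image the estimator picks a negative coupling (neighbors disagree), on a constant image a positive one, as expected.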
2) CLS Estimator: The CLS estimator is defined as

    b̂_CLS = argmin_b Σ_s (x_s − ⟨x_s | ∂x_s, b⟩)²    (10)

with the shorthand notation of the expectation value

    ⟨x_s | ∂x_s, b⟩ = Σ_{x_s} x_s p(x_s | ∂x_s, b).    (11)

The quadratic penalty function of (10) is equivalent to the assumption of a localized Gaussian variation of x_s around ⟨x_s | ∂x_s, b⟩. This condition is fulfilled very well in the case of the autobinomial model. Hence, for this model, the estimator is well suited. It turns out that the problems of parameter estimation and model selection reduce to the problem of robust linear regression. We demonstrate this for the autobinomial model in the Appendix, Section B, again using the Bayesian paradigm. As stated in the Appendix, the evidence can be calculated analytically and is thus readily available after the
Fig. 3. Pdf's p(x_s | ∂x_s) of the pixel x_s in the autobinomial model as they result from the energy function (12) for various values of η. This scalar quantity, given by (13), represents the resulting influence of all neighbors ∂x_s, each of which is weighted by one component of b. Altogether, the neighbors determine the location of the distribution, which is very localized and close to a Gaussian.
Fig. 4. Two example realizations of GRF's using the energy function of the autobinomial model (12). The visual perception is fundamentally different due to the different parameter vectors b = (0, 1, 1, −1, −1) (left) and b = (−2, 1, 1, 0.7, 0.7) (right). In both cases, the gray values range from zero to G = 255. We show the result of the parameter estimation with two different estimators in Table I and perform the model selection for the right example in Fig. 5.

TABLE I. PARAMETERS OF THE EXAMPLES IN FIG. 4 AND THEIR ESTIMATIONS
very fast estimation process. The combination of this estimator with the autobinomial model of the next section is particularly suited for the application to large image data sets as they are typical for remote-sensing image databases.

C. Autobinomial Model

In the following, we discuss the "autobinomial model," originally presented in [18] and already proposed for the use of image content characterization in remote-sensing images in [19]. However, due to the high number of gray values in typical remote-sensing images, we propose a different scaling of the parameters. Additionally, we remove a redundancy in the parameters, which enhances the robustness of the parameter estimation and the calculation of the evidence.

In this model, the energy function is defined as

    H(x_s, ∂x_s, b) = −ln C(G, x_s) − η x_s    (12)

where we denote the maximum gray value with G and the binomial coefficients with C(G, x_s) = G! / (x_s! (G − x_s)!). The neighboring pixels have an effective influence via the scalar quantity

    η = b_0 + Σ_{ij} b_ij (x_ij + x_ij') / G.    (13)
We depict the labels of the neighboring pixels in Fig. 2. Altogether, the parameter vector

    b = (b_0, b_11, b_12, b_21, b_22)    (14)

determines the statistical properties of the random variable x_s, that is, the appearance of an image that is a typical realization of such a random field. In Fig. 3, we show the pdf for particular values of η, and in Fig. 4, we show two example realizations of the autobinomial model that have been generated using the sampling algorithm.
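The sampling algorithm mentioned above can be sketched as a pixelwise Gibbs sampler that repeatedly draws each pixel from its conditional binomial pdf (cf. (21) in the Appendix). The toroidal boundary and the assignment of parameter components to neighbor pairs are illustrative assumptions, not taken from the text.

```python
import math, random

G = 255  # maximum gray value

def eta(img, i, j, b):
    """Effective neighbor influence, eq. (13): b0 + sum b_ij (x_ij + x_ij')/G.
    b = (b0, b11, b12, b21, b22); toroidal boundary and pair order assumed."""
    n, m = len(img), len(img[0])
    pairs = [((i, j - 1), (i, j + 1)),          # horizontal pair
             ((i - 1, j), (i + 1, j)),          # vertical pair
             ((i - 1, j - 1), (i + 1, j + 1)),  # one diagonal pair
             ((i - 1, j + 1), (i + 1, j - 1))]  # the other diagonal pair
    s = b[0]
    for bk, (p, q) in zip(b[1:], pairs):
        s += bk * (img[p[0] % n][p[1] % m] + img[q[0] % n][q[1] % m]) / G
    return s

def sample_pixel(img, i, j, b, rng):
    """Draw x_s from the binomial pdf with theta = e^eta / (1 + e^eta)."""
    theta = 1.0 / (1.0 + math.exp(-eta(img, i, j, b)))
    return sum(rng.random() < theta for _ in range(G))  # binomial(G, theta)

def gibbs_sweeps(shape, b, sweeps=20, seed=0):
    """A few raster sweeps of the Gibbs sampler over a random start image."""
    rng = random.Random(seed)
    n, m = shape
    img = [[rng.randrange(G + 1) for _ in range(m)] for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(m):
                img[i][j] = sample_pixel(img, i, j, b, rng)
    return img
```

With all parameters zero, theta = 0.5 and the field fluctuates around the mid gray value G/2.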
D. Interpretation of the Parameters and Model Selection

The interpretation of the parameters of the autobinomial model reflects the huge scope of its applications and of Gibbs models in general. The parameter b_11 weights the correlation between the central pixel x_s and the horizontal neighbors, whereas b_12 weights the interaction with the vertical neighbors. The interpretation of the other parameters b_21 and b_22, which weight the two diagonal directions, is analogous. The parameter b_0 represents a self-interaction. These interactions result in characteristic image structures that have a clear fingerprint in the parameter vector b, such as the examples in Fig. 4. It is possible to extract that fingerprint from a given image by parameter estimation and use the result as characterization of the content of this image. Table I shows the results of the parameter estimation of the examples in Fig. 4 for both estimators, MPL and CLS.

As shown in the Appendix, Section C, the autobinomial model can be approximately mapped to an autoregressive process

    x_s = ⟨x_s | ∂x_s, b⟩ + ν    (15)

with ν as a Gaussian noise term with mean zero and variance σ_ν². Note that the approximation only holds under specific conditions, as explained in the Appendix, and that the autobinomial model represents a much richer class of textures.
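The location of the conditional pdf, and with it the autoregressive approximation, can be checked numerically: for a given effective influence η, the mean of the binomial pdf equals Gθ with θ = e^η / (1 + e^η), cf. (21) and (22) in the Appendix. A minimal sketch:

```python
import math

G = 255  # maximum gray value

def autobinomial_pdf(x, eta):
    """Binomial pdf of a pixel value x given the effective influence eta,
    cf. eq. (21) in the Appendix."""
    theta = 1.0 / (1.0 + math.exp(-eta))
    return math.comb(G, x) * theta ** x * (1.0 - theta) ** (G - x)

def mean_gray(eta):
    """Mean of the conditional pdf; should equal G*theta, cf. eq. (22)."""
    return sum(x * autobinomial_pdf(x, eta) for x in range(G + 1))
```

For η = 0 the distribution is centered at G/2 = 127.5; larger η shifts it toward brighter values, which is what the neighbor interaction expresses.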
Fig. 5. Model selection for the example of Fig. 4 (right). On the abscissa, we specify the activated cliques, that is, the b_ij we use for parameter estimation and calculation of the evidence. With an increasing number of parameters, the evidence also increases. However, at a certain level of complexity, the penalization incurred by the Occam factor outweighs the benefits of a tighter fit of the data, and the evidence begins to decrease. The point of maximum evidence coincides with the actual complexity of the example data in Fig. 4.
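The evidence-based selection illustrated in Fig. 5 can be reproduced on a toy problem. The sketch below compares a model without free parameters against one with a single free mean parameter using the approximate evidence (6); the unit-variance Gaussian toy data and the prior width of 10 are arbitrary illustrative choices, not taken from the paper.

```python
import math

def log_lik(data, theta):
    """Log-likelihood of i.i.d. unit-variance Gaussian data with mean theta."""
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi) for x in data)

def log_evidence_fixed(data):
    """H0: mean fixed at zero. No free parameter, so evidence = likelihood."""
    return log_lik(data, 0.0)

def log_evidence_free(data, prior_width=10.0):
    """H1: mean is a free parameter with a uniform prior of width prior_width.
    Eq. (6): evidence ~ maximum likelihood times the Occam factor
    sigma*sqrt(2*pi)/width, with posterior width sigma = 1/sqrt(N) here."""
    theta_hat = sum(data) / len(data)
    sigma = 1.0 / math.sqrt(len(data))
    occam = math.log(sigma * math.sqrt(2 * math.pi) / prior_width)
    return log_lik(data, theta_hat) + occam
```

Data compatible with the simpler model favors it through the Occam penalty; only when the extra parameter clearly improves the fit does the more complex model win, exactly the behavior of the evidence curve in Fig. 5.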
Since η is defined as a weighted sum of all neighbors (13), each of which can be assumed to fluctuate independently with the same amplitude, we can specify the norm of b

    ||b|| = ( Σ_{ij} b_ij² )^{1/2}    (16)

as a weight of the resulting fluctuation of the pixel x_s. This norm provides a measure of the overall texture complexity. The relative contribution of the neighborhood clique ij, that is, the relative contribution b̃_ij of b_ij, is given by

    b̃_ij = b_ij / ||b||.    (17)

To do the parameter estimation in a robust way, it has to be accompanied by a model selection to obtain those neighborhood pixels that are really necessary to describe the texture and to dismiss the neighbors that only cause superfluous complexity of the description. We present such a model selection for one of the examples in Fig. 5. The result of the model selection constitutes another piece of knowledge about the image content.

IV. APPLICATION TO REAL DATA AT ONE SCALE

In the following, we present two examples demonstrating the application of the autobinomial model to optical data: in Section IV-A to a Landsat TM scene and in Section IV-B to a scaled RESURS-01 scene. These are two examples of information retrieval at one selected scale. In Section V, we show how we analyze information at multiple scales using the evidence.

A. Landsat TM Example

The image under investigation is a cut of a Landsat TM scene shown in Fig. 6 (left). The clearly visible structures are agricultural fields with different orientations. To extract the information about the orientations of these fields, we proceed in three steps: rescaling, model selection, and information extraction.

First, we rescale the image by a factor of two to better match the interesting textures with our model. At full resolution, the desired information may not always be represented by our model in a clear way; instead, it may be more suitable to extract that information at a different scale. This optimum scale can be detected without supervised interaction using the concept of "evidence in scale," which we present in Section V-A. After estimating the parameters and calculating the evidences of the first- and second-order autobinomial models, we select those areas where the more complex model exhibits a significantly higher evidence. These are shown in yellow in Fig. 6 (middle). The estimations in these areas can now be tested for certain spatial criteria. In particular, we test for the two diagonal texture orientations using the thresholds b_21 > 0.5 (green) and b_22 > 0.5 (red), respectively. We show the results in Fig. 6 (right). This extraction of spatial information need not be constrained to such simple thresholding of individual parameters. Instead, we can use the complete parameter vector b as a feature vector in the same way as, e.g., multichannel radiometric information. We demonstrate this with the next example.

B. RESURS-01 Example
In the following, we apply a standard supervised classification technique that is commonly used for image segmentation of multispectral data [4]. Our example image is one channel of a RESURS-01 image shown in Fig. 7. We have reduced this image from the original scale by a factor of 0.1 to a much coarser scale (one pixel is approximately 2 × 2 km). The aim of the following demonstration is to segment different textural regions of the image purely by estimating the parameters of the Gibbs model. Radiometric quantities, such as mean and variance, though very powerful, are not used. Instead, we always adjust these quantities in the small estimation windows to default values, as mentioned earlier.

To learn the characteristic location and shape of the estimated parameters b̂ of the different classes, we estimate the parameters of the second-order autobinomial model at various locations of the three classes "alps," "cumulus," and "cirrus." Then, we assume that the probability distributions of b̂ for these classes can be described by multivariate normal distributions. The parameters of these distributions, the mean vector of the N estimated values

    μ = (1/N) Σ_{n=1}^{N} b̂^(n)    (18)

and the covariance matrix

    Σ = (1/(N − 1)) Σ_{n=1}^{N} (b̂^(n) − μ)(b̂^(n) − μ)^T    (19)

are estimated from the training estimations. Each class is characterized by its mean vector μ and its covariance matrix Σ. Due to the redundancy of one parameter in b (see the Appendix, Section A), we neglect b_0 for this application and use only four components of the parameter vector b. We specify the cluster centers of the three classes in the table in Fig. 7. The shape of each cluster is an ellipsoid with a volume proportional to √(det Σ). To visualize the extension of
Fig. 6. Cut of a Landsat TM image (left) showing different orientations of agricultural fields. At a coarser scale (scale 0.5), we apply the autobinomial model to find complicated structures, which manifest themselves in a high evidence of the complicated model (yellow, middle). In these areas, we test for the two diagonal texture orientations (right) using the thresholds b_21 > 0.5 (green) and b_22 > 0.5 (red), respectively. At each pixel position, we have estimated the parameter vector b in an 8 × 8 window with adjusted mean and variance. In both cases, the lake is shown in blue using a simple intensity model.
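The per-parameter thresholding used for Fig. 6 (right) can be sketched as a simple labeling of the estimated parameter maps. The label names are placeholders for the two diagonal orientations.

```python
def orientation_labels(b21_map, b22_map, thresh=0.5):
    """Label each estimation window according to the dominant diagonal
    texture parameter, mirroring the thresholds b21 > 0.5 / b22 > 0.5
    of Fig. 6. 'diag1'/'diag2' are placeholder names; windows below
    both thresholds stay unlabeled (None)."""
    return [[('diag1' if b21 > thresh else 'diag2' if b22 > thresh else None)
             for b21, b22 in zip(row21, row22)]
            for row21, row22 in zip(b21_map, b22_map)]
```

This is the simplest use of the estimated parameters; the full parameter vector can equally serve as a feature vector, as the RESURS-01 example shows.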
Fig. 7. ML classification of the estimated parameters of the autobinomial model. We use a multivariate Gaussian distribution of the parameters to describe each class in the 4-D parameter space {b_ij} and mark the three classes "alps," "cumulus," and "cirrus" with the colors green, red, and blue, respectively. Classifications below the threshold are not marked. The table (right) provides the clusters of the training estimations and specifies the centers and a measure of their extensions in parameter space. For this example, we have performed the parameter estimations in 32 × 32 windows.

the clusters in parameter space, we also provide (det Σ)^{1/8}, which is proportional to the radius of a sphere with the same volume as the cluster. After obtaining this description, we classify the whole image by estimating the parameters in overlapping windows, calculating the probability of each class in the multivariate model, and selecting the class with ML. Additionally, we apply a threshold to the probability to avoid poor classifications at the tails of the distributions. We show the result of this classification in Fig. 7. The classification according to the structural configurations of the regions is apparent. Note that certain areas of dark patches caused by several lakes are assigned to the same class as the bright patches of snow (green class). This is in complete agreement with our concept of structural information as a model-based description of the image signal. In our example, the model separates regions only according to spatial variation, thereby neglecting one-pixel intensity information. Together with an analysis of mean and variance, that is, a simple intensity model, these two areas of snow and lakes can be perfectly divided into two subclasses.

V. INFORMATION EXTRACTION AT MULTIPLE SCALES
Despite the undeniable capabilities of the direct application of Gibbs models, they come with one important drawback: highly complex textures that have features at different scales require a local energy function of the pdf (7) with a large number of parameters depending on a vast neighborhood. This makes estimation and interpretation of the parameters very difficult. In the following, we first present the concept of evidence in scale to overcome this problem. Then we show the application of this concept to the extraction of spatial information from an X-SAR image.

A. Concept of Evidence in Scale

The approach we present in this section to capture all texture features is to scale the image quasicontinuously. Thereby, we obtain an image pyramid with the original image as the base layer and the scaled versions of the image as the higher layers. Each layer is analyzed with a reasonably sized local energy function, so the same model applied to different layers extracts texture information at different scales. The analysis of the corresponding evidence yields the criteria for the significance of that information. We combine the estimations at all significant scales and obtain the full information about the texture. Parameter estimation and model selection are applied to all scaled images, i.e., to all layers of the pyramid. This is a straightforward extension of the Landsat TM example in Section IV-A, which demonstrates the texture analysis at a fixed scale.

Scaling an N × N pixel image to smaller sampling rates sN, where s < 1, reduces resolution and, in contrast to scaling functions defined in continuous space, is always afflicted with loss of information. However, it is only the
Fig. 8. Texture analysis applied to an X-SAR image at various scales. We display the results at scales 0.354, 0.177, and 0.125 (from left to right) and highlight the regions where the second-order model exhibits higher evidence than the first-order model. In these regions, the complex model detects different diagonal structures (color scheme as in Fig. 6, right). With changing scale, the complex model fits in different regions of the image. For a fixed location of the image, the model detects different characteristics at different scales, as we visualize in Fig. 9 for the regions marked with the white frames using quasicontinuous scaling.
information of the very short range texture that is lost. To retrieve the information of the textures at all ranges with the Gibbs kernel of limited size, we consecutively apply estimations to extract the short-range texture information while downscaling the image to smaller scales. The scaling makes the short-range features that have been evaluated at finer scales vanish and reduces the correlation length of the long-range ones, bringing them into the range of the Gibbs kernel.

B. SAR Example

In the following, we demonstrate this multiscale approach using SAR image data. These types of images present two major challenges regarding the automatic extraction of spatial information: they contain textures at very different scales (agricultural fields, vegetated areas, geological structures), and they are affected by strong noise (multiplicative speckle). The problems we face are that long-range textures with a periodicity much larger than the Gibbs kernel cannot be detected in a robust way and that, additionally, parameter estimation in the presence of noise is very difficult. Scaling the image enhances the signal-to-noise ratio (SNR) [20] and brings the correlation length of long-range texture to the size of the Gibbs kernel. The difference from other methods for detecting long-range texture (dilated Gibbs kernels [21] or multiscale random fields [22]–[24]) is that in this approach we work on quasicontinuous scales and make explicit use of the model evidence to select between them.

In Fig. 8, we present the analysis of a three-look X-SAR image, which is a standard SAR product. We apply scaling with a quasicontinuous series of scale factors s down to a minimum scale factor by lowpass filtering and subsampling. Filtering is performed by cutting the frequency spectrum of the image in the Fourier domain at the Nyquist frequency of the scaled image. For the first- and second-order models shown in Fig. 5, we estimate the parameters and calculate the evidence at each scale s.
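The lowpass filtering and subsampling step described above can be sketched as a Fourier-domain crop at the Nyquist frequency of the target grid. This is a minimal NumPy sketch for square images, not the authors' implementation.

```python
import numpy as np

def rescale(img, s):
    """Scale a square image by a factor 0 < s < 1: lowpass by cropping the
    centered Fourier spectrum to the target grid (so frequencies above the
    Nyquist frequency of the scaled image are removed), then inverse
    transform. The (m/n)**2 factor preserves the image mean."""
    n = img.shape[0]
    m = max(2, int(round(n * s)))          # size of the scaled image
    f = np.fft.fftshift(np.fft.fft2(img))  # spectrum with DC at the center
    lo = (n - m) // 2
    f_cut = f[lo:lo + m, lo:lo + m]        # keep only low frequencies
    return np.fft.ifft2(np.fft.ifftshift(f_cut)).real * (m / n) ** 2
```

Applying this with a decreasing series of scale factors s yields the quasicontinuous image pyramid on whose layers the parameters and evidences are estimated.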
The image contains information in the form of a diagonal texture at a scale s in a certain region if the second-order model has higher evidence there than the first-order model.
In that case, the more complex model is necessary to fully describe the diagonal correlations. In Fig. 8, we show these regions, marked in red and green, at three different scales in the same way as in Fig. 6 (right) for the Landsat image.

In Fig. 9, we analyze the change of the relative evidence and the parameters in scale, exemplarily for the regions marked with the white frames in Fig. 8. At a very fine scale, no diagonal texture is detected because of the strong speckle noise. When going to coarser scales, the texture associated with the red color is detected first. This fine texture is smeared by further scaling, and a long-range texture with another orientation becomes dominant. The transition occurs at scale s ≈ 0.292, at which the relative evidence reaches its minimum. It is apparent that the estimated texture parameters of an image region necessarily depend on the scale at which it is viewed. They represent different features of the texture and carry different information. This property is the basis of our multiscale approach. It allows us to describe texture characteristics extending from short to long range with a Gibbs model of limited order.

VI. SUMMARY AND CONCLUSION
We conclude by emphasizing that the success of the application of GMRF models for spatial information extraction from remote-sensing images relies on the fulfillment of two prerequisites. The first is the decomposition of the information extraction procedure into two steps of Bayesian inference: 1) model selection and 2) model fit. This provides us with the most plausible description of the image signal within our family of models. The second is information extraction at multiple scales, which is necessary to completely capture the complex content of the remote-sensing image. With the setup of a quasicontinuous image pyramid, we can detect the optimum scale for information extraction for certain regions of the image. We have illustrated the successful application of GRF's in this frame with examples ranging from optical data at fixed scale up to SAR data at quasicontinuous scale. Together, the combination of the information theoretical concept with a
¨ SCHRODER et al.: SPATIAL INFORMATION RETRIEVAL FROM REMOTE-SENSING IMAGES—PART II
1453
Fig. 9. Change of the relative Gibbs parameters ~ b21 and ~ b22 of the second-order autobinomial model (upper plot) and the relative evidence of this model (lower plot) with continuously decreasing scale. The image location investigated is marked with white frames in Fig. 8. At scales s = 1:0–0:7, the relative evidence is high since the predominant speckle pattern is better explained by the complex model. At smaller scales, the speckle is smoothed and the amount of relative evidence is determined by the presence of diagonal textures that cannot be captured by the first-order model. At scale s 0:292 (arrow), a transition between two diagonal textures occurs and the relative evidence reaches its minimum.
strong family of models applied at multiple scales promises to be a powerful tool for content-based query and retrieval from future remote-sensing image databases.

APPENDIX
SOME PROPERTIES OF THE AUTOBINOMIAL MODEL

For the autobinomial model, the partition sum $Z$ immediately follows from the energy function (12):
$$Z = \sum_{x_s = 0}^{G} \binom{G}{x_s} e^{\eta x_s} = \left(1 + e^{\eta}\right)^G. \quad (20)$$
Therefore, we can write the pdf (7) of the autobinomial model as
$$p(x_s \mid \partial x_s, \mathbf{b}) = \binom{G}{x_s} \theta^{x_s} (1 - \theta)^{G - x_s} \quad (21)$$
with the quantity $\theta = e^{\eta}/(1 + e^{\eta})$. We find the mean to be
$$\langle x_s \rangle = G\theta \quad (22)$$
and the variance
$$\sigma_{x_s}^2 = G\theta(1 - \theta). \quad (23)$$
For $\mathbf{b} = 0$, we obtain $\langle x_s \rangle = G/2$ and $\sigma_{x_s}^2 = G/4$. We use these two values to ensure consistent parameter estimation of real data and always renormalize the data to these values before estimation. Therefore, the estimated parameters only contain information regarding the spatial properties and not regarding radiometric or intensity properties. As we show later in this appendix, this procedure additionally removes one parameter, thus making the interpretation of the parameter vector easier.

With (22) and (23), we state the expected mean and variance for a given $\eta$, that is, for a certain effective neighborhood. However, it is more interesting to know the mean of the whole image, that is, the quantity $\langle x \rangle_{\mathrm{avg}}$ (with the subscript avg, we denote the average over all sites, as opposed to the expectation value $\langle x_s \rangle$ at one particular site $s$). This question we can answer by applying the "mean field approximation" to the autobinomial model in the next section.

A. Mean Field Theory Applied to the Autobinomial Model

In this approximation [25], we assume that each pixel sees average neighbors $\langle x \rangle_{\mathrm{avg}}$ due to all pixels in the image and, from (13), we obtain
$$\bar\eta = b_0 + \frac{2\langle x \rangle_{\mathrm{avg}}}{G} \sum_{i \ge 1} b_i. \quad (24)$$
Similarly, from (22), we can approximate
$$\langle x \rangle_{\mathrm{avg}} \approx G \frac{e^{\bar\eta}}{1 + e^{\bar\eta}}. \quad (25)$$
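Equations (24) and (25) form a coupled system that can be solved numerically by fixed-point iteration. The following short Python snippet is a sketch of our own (the function name and parameter values are illustrative, not from the paper); it shows that a parameter vector whose components sum to zero keeps the image average at $G/2$:

```python
import math

def mean_field_average(b, G=255, n_iter=200):
    """Fixed-point iteration of the mean-field system:
    eta_bar = b[0] + (2 * x_bar / G) * sum(b[1:])    # cf. (24)
    x_bar   = G * exp(eta_bar) / (1 + exp(eta_bar))  # cf. (25)
    """
    x_bar = G / 2.0  # start at the symmetric point
    for _ in range(n_iter):
        eta_bar = b[0] + (2.0 * x_bar / G) * sum(b[1:])
        x_bar = G / (1.0 + math.exp(-eta_bar))
    return x_bar

# A balanced parameter vector (b0 + b1 + b2 = 0) keeps the mean at G/2:
print(mean_field_average([-0.9, 0.5, 0.4]))  # → 127.5
# An unbalanced vector shifts the image mean away from G/2:
print(mean_field_average([0.2, 0.5, 0.4]))   # larger than 127.5
```

The balanced case illustrates numerically why a constraint on the parameter sum makes one parameter effectively redundant for describing the spatial structure.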
Consequently, we have two equations, one linear and one highly nonlinear, for the two unknown variables $\langle x \rangle_{\mathrm{avg}}$ and $\bar\eta$. Assuming $\langle x \rangle_{\mathrm{avg}} = G/2$ leads via $\bar\eta = 0$ to the constraint
$$b_0 + \sum_{i \ge 1} b_i = 0. \quad (26)$$
This constraint defines a plane in the high-dimensional parameter space, with the pdf being located in the vicinity of that plane. For both the MPL estimation and the calculation of the evidence of the model fit, the knowledge about the shape of the pdf is vital to obtain efficient and robust algorithms. Note that (26) makes one parameter of $\mathbf{b}$ practically redundant if the condition $\langle x \rangle_{\mathrm{avg}} = G/2$ is fulfilled. However, since this constraint was derived using some approximations, we have to keep all parameters for the estimation but can neglect one for the later interpretation and classification.

B. CLS Estimator of the Autobinomial Model

With (22) for $\langle x_s \rangle$ of the autobinomial model, the CLS estimator (10) is defined as
$$\hat{\mathbf{b}} = \arg\min_{\mathbf{b}} \sum_s \left( x_s - G\theta_s \right)^2. \quad (27)$$
Note that the parameter vector $\mathbf{b}$ enters this equation through the definition of $\theta_s$ (13). The quadratic cost function of (27) is the same as the Ansatz
$$x_s = G\theta_s + n_s \quad (28)$$
with $n_s$ as Gaussian noise of mean zero and small variance. As depicted in Fig. 3, this is a very good approximation in the case of the autobinomial model. Solving (28) for $\eta_s$ results in
$$\eta_s = \ln\frac{x_s - n_s}{G - x_s + n_s}. \quad (29)$$
Since
$$|n_s| \ll x_s \quad \text{and} \quad |n_s| \ll G - x_s \quad (30)$$
we can approximate
$$\eta_s \approx \ln\frac{x_s}{G - x_s} + \tilde n_s \quad (31)$$
where $\tilde n_s$ is also a Gaussian noise term of zero mean and small variance. Together with the definition of $\eta_s$ (13), the problem of parameter estimation is reduced to the overdetermined system of linear equations
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{n} \quad (32)$$
with the vector $\mathbf{b}$ of unknown parameters, the vector $\mathbf{y}$ of transformed pixel values
$$y_s = \ln\frac{x_s}{G - x_s} \quad (33)$$
a Gaussian noise vector $\mathbf{n}$, and the matrix $\mathbf{X}$ of neighboring pixel values
$$X_{si} = \begin{cases} (x_{s_i} + x_{s_i'})/G & \text{if } i \ge 1 \\ 1 & \text{otherwise.} \end{cases} \quad (34)$$
Here, the variables $x_{s_i}$ and $x_{s_i'}$ denote the two neighbors of the pixel $s$ being described by the neighborhood clique $i$. The $s$th row specifies all neighbors of the $s$th pixel, and the $i$th column contains the $i$th neighbor of all pixels in the image. The first parameter $b_0$ does not weight any neighbor and is therefore weighted by one.

General linear models of the type of (32) can be solved elegantly in the framework of Bayesian methods [7]. Essential for the complete analytical treatment is the assumption that the noise term $\mathbf{n}$ is Gaussian. Then the ML estimate of $\mathbf{b}$ is given by
$$\hat{\mathbf{b}} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}. \quad (35)$$
Additionally, the evidence for this model can be shown to be
$$p(\mathbf{y} \mid \mathbf{X}) \propto \frac{\Gamma\!\left(\frac{N-k}{2}\right)}{\sqrt{\det\left(\mathbf{X}^T\mathbf{X}\right)}}\left[\left(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}}\right)^T\left(\mathbf{y} - \mathbf{X}\hat{\mathbf{b}}\right)\right]^{-(N-k)/2} \quad (36)$$
with $k$ being the number of unknown parameters, $N$ being the number of pixels in the image, that is, the number of rows of $\mathbf{X}$, and $\hat{\mathbf{b}}$ being the best fit of the data $\mathbf{y}$. The omitted proportionality constants are the normalization constants of the priors of an additional hyperparameter and of the assumed noise term $\mathbf{n}$.

C. Autoregressive Limit of the Autobinomial Model

The expectation value of $x_s$ for a given $\eta$ (22) can be expanded as a Taylor series around $\eta = 0$:
$$\langle x_s \rangle = G\frac{e^{\eta}}{1 + e^{\eta}} \approx \frac{G}{2}\left(1 + \frac{\eta}{2}\right). \quad (37)$$
According to this equation, we can linearize the autobinomial model for $\eta \approx 0$. Similarly, the variance (23) becomes
$$\sigma_{x_s}^2 \approx \frac{G}{4}. \quad (38)$$
Together with the definition of $\eta$ (13), we obtain
$$x_s \approx \frac{G}{2}\left(1 + \frac{b_0}{2}\right) + \frac{1}{4}\sum_{i \ge 1} b_i \left(x_{s_i} + x_{s_i'}\right) + n_s \quad (39)$$
with $n_s$ a zero-mean noise term whose variance is given by (38), which is the definition of an autoregressive (AR) process. This equivalence is correct for $\eta \approx 0$, that is, for slowly varying pixel values and small parameters $b_i$. However, if this condition is not fulfilled, the autobinomial model is fundamentally different from the AR process and represents a much richer class of textures.
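The logit transformation (33), the design matrix (34), and the least-squares estimate (35) translate directly into code. The following NumPy sketch is our own illustration (the function names and the random test image are ours; we assume a first-order model with one horizontal and one vertical clique pair):

```python
import numpy as np

G = 255  # number of gray levels

def build_design(img):
    """Set up the linear system (32) with one row per interior pixel.
    Row for pixel s: [1, (left+right)/G, (up+down)/G], cf. (34); the
    leading 1 belongs to b0, which weights no neighbor.  The vector y
    holds the logit-transformed pixel values of (33)."""
    H, W = img.shape
    X, y = [], []
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            x = img[r, c]
            X.append([1.0,
                      (img[r, c - 1] + img[r, c + 1]) / G,
                      (img[r - 1, c] + img[r + 1, c]) / G])
            y.append(np.log(x / (G - x)))
    return np.asarray(X), np.asarray(y)

def estimate(X, y):
    """ML estimate (35): b_hat = (X^T X)^{-1} X^T y, via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Sanity check: if y is generated exactly from known parameters,
# the estimator recovers them.
rng = np.random.default_rng(0)
img = rng.integers(1, G, size=(16, 16)).astype(float)  # values in 1..G-1
X, _ = build_design(img)
b_true = np.array([0.3, -0.2, 0.1])
b_hat = estimate(X, X @ b_true)
print(np.allclose(b_hat, b_true))  # → True
```

On real data, $\mathbf{y}$ contains the noise term of (31) in addition, so the recovered parameters are estimates rather than exact values; restricting the rows to interior pixels sidesteps the boundary treatment.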
ACKNOWLEDGMENT
The authors thank Prof. H. Künsch and M. Walessa for fruitful discussions on Gibbs random fields and J. Oakley for his help in reviewing and commenting on this work. The Landsat TM and the RESURS-01 scenes were provided by NPOC Switzerland, ETH Zürich, and the X-SAR scene by DFD/DLR, Oberpfaffenhofen, Germany.
REFERENCES [1] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973. [2] R. M. Haralick, “Automatic remote sensor image processing,” in Digital Picture Analysis, A. Rosenfeld, Ed. Berlin, Germany: Springer-Verlag, 1976. [3] R. N. Colwell, Ed., Manual of Remote Sensing, Amer. Soc. Photogramm., 1978, vol. 1. [4] J. A. Richards, Remote Sensing Digital Image Analysis, 2nd ed. Berlin, Germany: Springer-Verlag, 1995. [5] K. Seidel, R. Mastropietro, and M. Datcu, “New architectures for remote sensing image archives,” in Proc. IEEE IGARSS’97, T. I. Stein, Ed. vol. 1, pp. 616–618. [6] M. Datcu, K. Seidel, and M. Walessa, “Spatial information retrieval from remote sensing images—Part I: Information theoretical perspective,” this issue, pp. 1431–1445. [7] J. Ruanaidh and W. Fitzgerald, Numerical Bayesian Methods Applied to Signal Processing. Berlin, Germany: Springer-Verlag, 1996. [8] D. S. Sivia, W. I. F. David, K. S. Knight, and S. F. Gull, “An introduction to Bayesian model selection,” Physica D, vol. 66, pp. 234–242, 1993. [9] L. D. Landau and E. M. Lifschitz, Lehrbuch der Theoretischen Physik, V. Statistische Physik. Berlin, Germany: Akademie-Verlag, 1966. [10] J. Besag, “Spatial interaction and the statistical analysis of lattice systems,” J. R. Stat. Soc., vol. 36, no. 1, pp. 192–236, 1974. [11] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721–741, Nov. 1984. [12] S. Lakshmanan and H. Derin, “Simultaneous parameter estimation and segmentation of Gibbs random fields using simulated annealing,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 799–813, Nov. 1989. [13] M. Walessa and M. Datcu, “Bayesian approach to SAR image reconstruction,” in Proc. IEEE IGARSS’97, T. I. Stein, Ed. vol. 2, pp. 767–769. [14] X. Guyon and H. R. 
Künsch, “Asymptotic comparison of estimators in the Ising model,” in Stochastic Models, Statistical Methods, and Algorithms in Image Analysis, P. Barone, A. Frigessi, and M. Piccioni, Eds. Berlin, Germany: Springer-Verlag, 1996, vol. 74 of Lect. Notes Stat., pp. 177–198. [15] E. Glötzl and B. Rauchenschwandtner, “On the statistics of Gibbsian processes,” in 1st Pannonian Symp. Math. Stat., P. Revesz, L. Schmetterer, and V. M. Zolotarev, Eds. Berlin, Germany: Springer, 1981, vol. 8 of Lect. Notes Stat. [16] S. R. Lele and J. K. Ord, “Conditional least squares estimation for spatial processes: Some asymptotic results,” Dept. Stat., Pennsylvania State Univ., University Park, Tech. Rep. 65, 1986. [17] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[18] G. R. Cross and A. K. Jain, “Markov random field texture models,” IEEE Trans. Pattern Anal. Machine Intell., vol. 5, pp. 25–39, Jan. 1983. [19] M. Schröder, K. Seidel, and M. Datcu, “Gibbs random field models for image content characterization,” in Proc. IEEE IGARSS’97, T. I. Stein, Ed. vol. 1, pp. 258–260. [20] J. W. Goodman, “Statistical properties of laser speckle patterns,” in Laser Speckle and Related Phenomena, J. C. Dainty, Ed. Berlin, Germany: Springer, 1975. [21] J. Mao and A. K. Jain, “Texture classification and segmentation using multiresolution simultaneous autoregressive models,” Pattern Recognit., vol. 25, no. 2, pp. 173–188, 1992. [22] M. R. Luettgen, W. C. Karl, A. S. Willsky, and R. R. Tenney, “Multiscale representations of Markov random fields,” IEEE Trans. Signal Processing, vol. 41, pp. 3377–3396, Dec. 1993. [23] C. A. Bouman and M. Shapiro, “A multiscale random field model for Bayesian image segmentation,” IEEE Trans. Image Processing, vol. 3, pp. 162–177, Feb. 1994. [24] H. Rehrauer, K. Seidel, and M. Datcu, “Multiscale Markov random fields for large image datasets representation,” in Proc. IEEE IGARSS’97, T. I. Stein, Ed. vol. 1, pp. 255–257. [25] K. Huang, Statistical Mechanics, 2nd ed. New York: Wiley, 1987.
Michael Schröder (S’97) received a physics degree from the University of Ulm, Germany, in 1996 and studied physics at the University of Leeds, U.K. He was a member of the Quantum Optics Groups of the University of Ulm and Texas A&M University, College Station, from 1995 to 1996. Since autumn 1996, he has been with the Image Science Group, Institute of Communication Techniques, ETH Zürich, Switzerland. His current research covers all aspects of remote-sensing image databases, in particular, Bayesian image analysis, Gibbs random fields, and user-adaptive clustering.
Hubert Rehrauer received the degree in physics from the University of Würzburg, Germany, in 1996 and studied physics at the University of Grenoble, France. He is currently pursuing the Ph.D. degree in the Image Science Group, Institute of Communication Techniques, ETH Zürich, Switzerland. He was with Daimler Benz, Munich, Germany, from 1995 to 1996, where he worked on physically based algorithms for information extraction in the Microsystems Research Group. His special interest lies in algorithms for digital signal processing and information extraction.
Klaus Seidel, for a photograph and biography, see this issue, p. 1445.
Mihai Datcu, for a photograph and biography, see this issue, p. 1445.