MRF-based texture segmentation using wavelet decomposed images

Hideki Noda ∗†, Mahdad N. Shirazi ‡, Eiji Kawaguchi †
† Kyushu Institute of Technology, Dept. of Electrical, Electronic & Computer Engineering, 1-1 Sensui-cho, Tobata-ku, Kitakyushu, 804-8550 Japan
‡ Communications Research Laboratory, 588-2 Iwaoka, Nishi-ku, Kobe, 651-2401 Japan
Abstract

In recent work on textured image segmentation, Bayesian approaches capitalizing on the computational efficiency of multiresolution representations have received much attention. Most previous research has been based on multiresolution stochastic models which use the Gaussian pyramid image decomposition. In this paper, motivated by the nonredundant directional selectivity and highly discriminative nature of the wavelet representation, we present an unsupervised textured image segmentation algorithm based on multiscale stochastic modeling over the wavelet decomposition of the image. The model, using doubly stochastic Markov random fields, captures intrascale statistical dependencies over the wavelet decomposed image and intrascale and interscale dependencies over the corresponding multiresolution region image.
keywords: image segmentation, texture, MRF, wavelet, multiresolution, unsupervised
∗ Corresponding author: Tel: +81-93-884-3247, Fax: +81-93-871-5835, Email: [email protected]

1 Introduction
An important problem in image processing is the segmentation of an image into disjoint regions which may possess the same average gray level but differ in the spatial distribution of gray levels (texture). Segmentation of images based on textural features is a critical preliminary operation in various image processing applications ranging from computer vision to remote sensing. In the last decade, there has been considerable interest in Bayesian estimation techniques in conjunction with Markov random fields (MRFs) for segmenting textured images. These methods use distinct stochastic processes to model the textures covering each region and typically an MRF with smooth spatial behavior to model the region image. Segmentation of an image is then achieved by finding an approximate maximum a posteriori (MAP) estimate of the unknown region image given the observed image. Recently, motivated by the importance of utilizing information at various scales, a number of authors have proposed multiresolution approaches to textured image segmentation, mainly to capitalize on the computational efficiency of multiresolution models. Multiresolution approaches make it possible to capture and utilize correlations over a set of neighborhoods of increasing size by making use of multiresolution representations for the region image and single-resolution [1, 2] or multiresolution representations [3, 4] for the observed image. Bouman et al. [1] used a Gaussian autoregressive model for the observed image. The MAP estimate of the region image at the coarsest resolution is approximated first, using iterated conditional modes (ICM) [5], and the result is then used as an initial condition for segmentation at the next finer level of resolution; the process is continued until individual pixels are classified at the finest resolution. Krishnamachari et al.
[3] used a Gauss Markov random field (GMRF) to model the observed image at each resolution, with the assumption that the random variables at a given resolution are independent of the random variables at the other levels. It was assumed that the GMRF parameters at the finest resolution are known. The MAP estimate is approximated at each resolution using ICM, first at the coarsest level and then progressively at finer levels. Comer et al. [4] proposed a multiresolution Gaussian autoregressive model for the observed image which takes into account the correlation between adjacent levels of resolution. It was assumed that the parameters of the Gibbs distribution of the region process are known. Segmentation is achieved there as the maximum posterior marginals (MPM) estimate rather than the MAP estimate. Most previous multiresolution approaches, however, have been based on modeling of the region and/or observed image over the lattice structure which corresponds to the Gaussian pyramid decomposition
[6] of the observed image. In this paper, we propose a multiresolution Bayesian approach based on modeling over the lattice structure which corresponds to the wavelet decomposition [7] of the observed image. The wavelet decomposition of an image, as opposed to the Gaussian pyramid decomposition, results in nonredundant and direction-sensitive (horizontal, vertical, diagonal) features at different scales and hence allows more selective feature extraction in the space-frequency domain. It may also be easier to model the nonredundant and direction-sensitive subbands of a wavelet decomposed image than to model each subband of a Gaussian pyramid decomposed image, where the image at a given resolution contains all the information in the lower resolutions. The proposed modeling scheme captures, over the wavelet pyramidal lattice, significant intrascale and interscale statistical dependencies in the region image and intrascale statistical dependencies in the observed image, using doubly stochastic MRFs. To estimate model parameters, a version of the Expectation-Maximization (EM) algorithm [8] is used, where the Baum function is approximated using the mean-field-based decomposition of the a posteriori probability of the region process [9]. The mean-field-based decomposition is also used in finding the MAP estimate of the region process. The paper is organized as follows. In Section 2, a doubly stochastic MRF model, commonly used to model textured images, is described. The proposed multiscale stochastic model of textured images in the wavelet domain is presented in Section 3. The corresponding EM-based parameter estimation and MAP-based image segmentation algorithms are given in Section 4. Simulation results are given in Section 5, followed by concluding remarks in Section 6.
2 Ordinary Textured Image Modeling
An image consisting of different regions of textures is usually modeled by a hierarchical Markov random field (HMRF) which consists of two layers. In the following we give a brief description of MRFs and HMRFs and then introduce a typical specific model for textured images.
2.1 Markov Random Field
Let L = \{(i, j); 1 \le i, j \le N\} denote a finite set of sites of an N \times N rectangular lattice. Let \eta_{ij}^X \subset L denote the (i, j)-pixel's neighborhood of a random field X_L defined on L.¹ Let C_{ij}^X denote the set of cliques C associated with \eta_{ij}^X which contain the (i, j)-pixel, i.e., (i, j) \in C. For example, in the first-order neighborhood, \eta_{ij}^X = \{(i, j-1), (i-1, j), (i, j+1), (i+1, j)\} and C_{ij}^X = \{\{(i, j)\}, \{(i, j), (i, j-1)\}, \{(i, j), (i-1, j)\}, \{(i, j), (i, j+1)\}, \{(i, j), (i+1, j)\}\}, which consists of one singleton and four doubleton cliques. Let the random field X_L = \{X_{ij}; (i, j) \in L\} be a Markov random field defined on L, with the X_{ij} taking values from a common local state space Q_X. It is well known that an MRF is completely described by a Gibbs distribution [10]

    p(x_L) = \frac{1}{Z_X} \exp\{-U(x_L)\},    (1)

where x_L is a realization of X_L from the configuration space \Omega_X = Q_X^{N \times N},

    U(x_L) = \sum_{(i,j) \in L} \sum_{C \in C_{ij}^X} U(x_C)    (2)

is the global energy function, U(x_C) is the clique energy function, and

    Z_X = \sum_{x_L \in \Omega_X} \exp\{-U(x_L)\}    (3)

is the partition function. For details on MRFs and related concepts such as neighborhoods and cliques, see Ref. [11].
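As a concrete illustration of (1)-(3), the following sketch enumerates the configuration space of a tiny binary MRF and computes the Gibbs distribution by brute force. It is not part of the paper's method: it borrows the doubleton energy of Section 2.3 on a first-order neighborhood, counts each clique once (the double sum in (2) visits each doubleton clique twice, which merely rescales U), and exhaustive enumeration is feasible only for toy lattices, which is exactly why approximations are needed later in the paper.

```python
import itertools
import numpy as np

def clique_energy(a, b, beta=0.5):
    """Doubleton clique energy: -beta if labels agree, +beta otherwise."""
    return -beta if a == b else beta

def global_energy(x, beta=0.5):
    """U(x_L): sum of doubleton energies over horizontal and vertical
    first-order pairs of an N x N label array (each clique counted once)."""
    n = x.shape[0]
    u = 0.0
    for i in range(n):
        for j in range(n):
            if j + 1 < n:
                u += clique_energy(x[i, j], x[i, j + 1], beta)
            if i + 1 < n:
                u += clique_energy(x[i, j], x[i + 1, j], beta)
    return u

def gibbs_distribution(n, labels=(0, 1), beta=0.5):
    """Enumerate the whole configuration space to get p(x_L) of eq. (1).
    Z_X of eq. (3) is the sum of the unnormalized weights."""
    configs = list(itertools.product(labels, repeat=n * n))
    energies = np.array([global_energy(np.array(c).reshape(n, n), beta)
                         for c in configs])
    weights = np.exp(-energies)
    return configs, weights / weights.sum()

configs, probs = gibbs_distribution(n=2, beta=0.5)
assert abs(probs.sum() - 1.0) < 1e-12
# with beta > 0, homogeneous labelings are the most probable configurations
assert np.isclose(probs[0], probs.max())
```

For the 2 x 2 lattice there are only 16 configurations; already at N = 16 the sum in (3) has 2^256 terms, which motivates the mean-field machinery of Section 4.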
2.2 A Two-Layered Hierarchical Markov Random Field
An image comprising different textures can be considered as a realization of a collection of two interacting random fields (X_L, Y_L) defined on the lattice L: the region label process X_L = \{X_{ij}; (i, j) \in L\} and the observation process Y_L = \{Y_{ij}; (i, j) \in L\}.² The observation process is assumed to be a function of the region label process. The interacting processes (X_L, Y_L) can be characterized completely by the joint probability p(x_L, y_L), or equivalently by p(x_L) and p(y_L | x_L). We have already specified p(x_L) in (1)-(3); thus, to characterize (X_L, Y_L), it suffices to specify the conditional probability p(y_L | x_L). Assuming that observed images are realizations of MRFs with neighborhoods \eta_{ij}^Y which are functions of the underlying region labels, i.e., \eta_{ij}^Y = \eta_{ij}^Y(x_{ij}), a general expression for p(y_L | x_L) can be given as

    p(y_L | x_L) = \frac{1}{Z_{Y|X}} \exp\{-U(y_L | x_L)\},    (4)

where the global conditional energy function is

    U(y_L | x_L) = \sum_{(i,j) \in L} \sum_{C \in C_{ij}^Y(x_{ij})} U(y_C | x_{ij})    (5)

and the conditional partition function is

    Z_{Y|X} = \sum_{y_L \in \Omega_Y(x_L)} \exp\{-U(y_L | x_L)\}.    (6)

In (6), \Omega_Y(x_L) = \prod_{(i,j) \in L} Q_Y(x_{ij}), where Q_Y(x_{ij}) is the set from which Y_{ij} takes a value.

¹ In this paper, x_A and f(x_A) denote the set \{x_{a_1}, \ldots, x_{a_l}\} and the multivariable function f(x_{a_1}, \ldots, x_{a_l}), respectively, where A = \{a_1, \ldots, a_l\}.
² In order to model region boundaries, another process such as the line process [11] can be added, resulting in a model with three processes. In this paper we prefer not to use such a model, for the sake of simplicity.

2.3 A Specific Model Comprising a Multi-Level Logistic MRF and Gaussian MRFs
Consider an image consisting of M distinct textural regions. Let X_{ij} be an indicator vector taking values from the vector set Q_X = \{e_1, \ldots, e_M\}, where e_m, 1 \le m \le M, is the M-dimensional unit vector whose m-th component is 1 and whose other components are 0. Then x_{ij} takes the value e_m when the region label of the (i, j)-pixel is m. To model the hidden region label process, we adopt a multi-level logistic MRF (LMRF) with the second-order neighborhood system. In this model all clique energies are assumed to be zero except the doubleton clique energies, which are given by

    U(x_{ij}, x_{ij+\tau}) = \begin{cases} -\beta & \text{if } x_{ij} = x_{ij+\tau} \\ \beta & \text{otherwise.} \end{cases}    (7)

In (7), \tau \in N = \{(0,1), (0,-1), (1,0), (-1,0), (1,1), (-1,-1), (-1,1), (1,-1)\}; for example, when \tau = (0,1), x_{ij+\tau} = x_{i,j+1}. The local conditional probability for the hidden region label process is given by

    p(x_{ij} | x_{\eta_{ij}^X}) = \frac{\exp\{-\sum_{\tau \in N} U(x_{ij}, x_{ij+\tau})\}}{\sum_{x_{ij} \in Q_X} \exp\{-\sum_{\tau \in N} U(x_{ij}, x_{ij+\tau})\}}.    (8)
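A minimal sketch of the local conditional probability (8) for the second-order LMRF. It assumes free boundary conditions (neighbors outside the lattice are simply skipped), a choice the paper does not specify:

```python
import numpy as np

# The second-order (8-pixel) neighborhood: the offset set N of eq. (7)
OFFSETS = [(0, 1), (0, -1), (1, 0), (-1, 0),
           (1, 1), (-1, -1), (-1, 1), (1, -1)]

def lmrf_local_conditional(labels, i, j, num_labels, beta=0.5):
    """p(x_ij | x_neighborhood) of eq. (8): for each candidate label m,
    accumulate the doubleton energies -beta (agree) / +beta (disagree)
    over the eight neighbors, then normalize over the label set."""
    n = labels.shape[0]
    energies = np.zeros(num_labels)
    for m in range(num_labels):
        for di, dj in OFFSETS:
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < n:   # skip sites off the lattice
                energies[m] += -beta if labels[ni, nj] == m else beta
    p = np.exp(-energies)
    return p / p.sum()

labels = np.array([[0, 0, 1],
                   [0, 0, 1],
                   [1, 1, 1]])
p = lmrf_local_conditional(labels, 1, 1, num_labels=2, beta=0.5)
# five of the eight neighbors carry label 1, so label 1 is favored
assert p[1] > p[0]
assert abs(p.sum() - 1.0) < 1e-12
```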
The observed textures are modeled by Gaussian MRFs (GMRFs) with second-order neighborhood systems, characterized by the following local conditional probability density functions (pdfs):

    p(y_{ij} | y_{\eta_{ij}}, x_{ij} = e_m) = \frac{1}{(2\pi\sigma_m)^{1/2}} \exp\Big\{-\frac{1}{2\sigma_m} \big[(y_{ij} - \mu_m) - \sum_{\tau \in N} \beta_\tau^m (y_{ij+\tau} - \mu_m)\big]^2\Big\}.    (9)

Here \mu_m, \sigma_m and \beta_\tau^m stand for the mean, the variance and the pair clique's interaction parameter, all depending on the region label m. The interaction parameters (prediction coefficients) are assumed to be symmetric, i.e., \beta_\tau^m = \beta_{-\tau}^m, where -\tau = (-i, -j) if \tau = (i, j).
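Equation (9) can be evaluated directly; the sketch below assumes an interior pixel (all eight neighbors available) and stores the interaction parameters \beta_\tau^m in a dictionary keyed by offset. Note that \sigma_m enters as a variance, matching the (2\pi\sigma_m)^{1/2} normalization in (9). The parameter values here are purely illustrative:

```python
import numpy as np

OFFSETS = [(0, 1), (0, -1), (1, 0), (-1, 0),
           (1, 1), (-1, -1), (-1, 1), (1, -1)]

def gmrf_conditional_pdf(y, i, j, mu, var, beta):
    """Eq. (9): a Gaussian in the residual between the centered pixel and
    its prediction from the centered neighbors.  `beta` maps each offset
    tau to the interaction parameter beta_tau (assumed symmetric in tau)."""
    resid = y[i, j] - mu
    for tau in OFFSETS:
        ni, nj = i + tau[0], j + tau[1]
        resid -= beta[tau] * (y[ni, nj] - mu)
    return np.exp(-resid ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

rng = np.random.default_rng(0)
y = rng.normal(5.0, 1.0, (5, 5))          # a toy "texture" patch
beta = {tau: 0.05 for tau in OFFSETS}     # weak, symmetric interactions
p = gmrf_conditional_pdf(y, 2, 2, mu=5.0, var=1.0, beta=beta)
# a density value: positive and bounded by the Gaussian peak 1/sqrt(2*pi*var)
assert 0.0 < p <= 1.0 / np.sqrt(2.0 * np.pi)
```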
3 Textured Image Modeling in Wavelet Domain

3.1 Image Modeling in Wavelet Domain
Let y_L denote the original image and w_L = W(y_L) the J-level wavelet transformed image. In what follows we model the collection of two interacting random fields (x_L, w_L), which form a doubly stochastic random process, using HMRFs. For the J-level wavelet transform, the lattice L can be decomposed as

    L = \{LL(J), LH(J), HL(J), HH(J), \ldots, LH(1), HL(1), HH(1)\},    (10)

which corresponds to the 3J + 1 subbands of the decomposed image w_L. Here the first L or H refers to a low or high frequency passband in the horizontal direction, and the second L or H refers to a low or high frequency passband in the vertical direction (see Fig. 1). Using these sublattices, x_L and w_L can be partitioned as³

    x_L = \{x_{LL(J)}, x_{LH(J)}, x_{HL(J)}, x_{HH(J)}, \ldots, x_{LH(1)}, x_{HL(1)}, x_{HH(1)}\},    (11)
    w_L = \{w_{LL(J)}, w_{LH(J)}, w_{HL(J)}, w_{HH(J)}, \ldots, w_{LH(1)}, w_{HL(1)}, w_{HH(1)}\}.    (12)

Considering that every hidden label process at the same scale should be identical⁴ and denoting the label process at the s-th scale (level), defined on the lattice L_s = \{(i, j); 1 \le i, j \le N/2^s\}, as x^{(s)}, we get

    x_{LL(J)} = x_{LH(J)} = x_{HL(J)} = x_{HH(J)} = x^{(J)}, \quad x^{(J)} = \{x_{ij}^{(J)}; (i, j) \in L_J\},
    \vdots
    x_{LH(1)} = x_{HL(1)} = x_{HH(1)} = x^{(1)}, \quad x^{(1)} = \{x_{ij}^{(1)}; (i, j) \in L_1\}.

p(x_L) can then be written as

    p(x_L) = p(x_{LL(J)}, x_{LH(J)}, x_{HL(J)}, x_{HH(J)}, \ldots, x_{LH(1)}, x_{HL(1)}, x_{HH(1)})    (13)
           = p(x^{(J)}, \ldots, x^{(1)}).    (14)

³ For convenience's sake, the indices for sublattices are put in the upper position.
⁴ To be exact, this should be true after adjusting the origins of the coordinates of all subimages.
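The subband structure (10) can be reproduced with any separable wavelet; the paper does not fix a particular one, so the sketch below uses the orthonormal Haar transform for self-containedness, recursively splitting the LL band to obtain the 3J + 1 subbands:

```python
import numpy as np

def haar_step(img):
    """One level of the 2-D orthonormal Haar transform: split an
    even-sized image into LL, LH, HL, HH, each of half the side length."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    hl = (a - b + c - d) / 2.0    # high-pass in the horizontal direction
    lh = (a + b - c - d) / 2.0    # high-pass in the vertical direction
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def wavelet_decompose(img, levels):
    """J-level decomposition: recursively split the LL band, collecting
    the 3J + 1 subbands named as in eq. (10)."""
    subbands = {}
    ll = img.astype(float)
    for s in range(1, levels + 1):
        ll, lh, hl, hh = haar_step(ll)
        subbands[f"LH({s})"] = lh
        subbands[f"HL({s})"] = hl
        subbands[f"HH({s})"] = hh
    subbands[f"LL({levels})"] = ll
    return subbands

img = np.arange(64, dtype=float).reshape(8, 8)
bands = wavelet_decompose(img, levels=2)
assert len(bands) == 3 * 2 + 1            # 3J + 1 subbands for J = 2
assert bands["LL(2)"].shape == (2, 2)     # side length N / 2^J
assert bands["LH(1)"].shape == (4, 4)     # sublattice L_1
```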
As for p(w_L | x_L), consider that the decomposed subimages of the same scale can constitute a vector image, i.e., w_{ij}^{(J)} = (w_{ij}^{LL(J)}, w_{ij}^{LH(J)}, w_{ij}^{HL(J)}, w_{ij}^{HH(J)})^T for (i, j) \in L_J and w_{ij}^{(s)} = (w_{ij}^{LH(s)}, w_{ij}^{HL(s)}, w_{ij}^{HH(s)})^T for (i, j) \in L_s at s = 1, \ldots, J-1. Assuming that w^{(s)} = \{w_{ij}^{(s)}; (i, j) \in L_s\} depends only on the corresponding hidden label process x^{(s)}, p(w_L | x_L) can be written as

    p(w_L | x_L) = p(w_{LL(J)}, w_{LH(J)}, w_{HL(J)}, w_{HH(J)}, \ldots, w_{LH(1)}, w_{HL(1)}, w_{HH(1)} | x^{(J)}, \ldots, x^{(1)})    (15)
                 = p(w_{LL(J)}, w_{LH(J)}, w_{HL(J)}, w_{HH(J)} | x^{(J)}) \prod_{s=1}^{J-1} p(w_{LH(s)}, w_{HL(s)}, w_{HH(s)} | x^{(s)})    (16)
                 = \prod_{s=1}^{J} p(w^{(s)} | x^{(s)}).    (17)
3.2 Multi-Level Logistic MRF and Gaussian MRFs in Wavelet Domain
To model the hidden region label process in the wavelet domain, which has a multiscale structure composed of x^{(J)}, \ldots, x^{(1)}, we adopt a multi-level logistic MRF (LMRF) with an interscale neighborhood as well as an intrascale neighborhood. The chosen neighborhood is the second-order neighborhood for the intrascale part and one parent and four children for the interscale part. The neighborhood of the (i, j)-pixel at the s-th scale, \eta_{ij}^{X(s)}, is explicitly described as

    \eta_{ij}^{X(s)} = \{(i, j+1)^{(s)}, (i, j-1)^{(s)}, (i+1, j)^{(s)}, (i-1, j)^{(s)}, (i+1, j+1)^{(s)}, (i-1, j-1)^{(s)}, (i-1, j+1)^{(s)}, (i+1, j-1)^{(s)}, ([(i+1)/2], [(j+1)/2])^{(s+1)}, (2i-1, 2j-1)^{(s-1)}, (2i-1, 2j)^{(s-1)}, (2i, 2j-1)^{(s-1)}, (2i, 2j)^{(s-1)}\},    (18)

where the indices (s), (s+1) and (s-1) denote the scale levels and [\cdot] denotes the floor function (the greatest integer not exceeding its argument). In this model, clique energies are given only for doubleton cliques:

    U(x_{ij}^{(s)}, x_{ij+\tau}^{(s)}) = \begin{cases} -\beta_s & \text{if } x_{ij}^{(s)} = x_{ij+\tau}^{(s)} \\ \beta_s & \text{otherwise,} \end{cases}    (19)

for the intrascale cliques, where \tau \in N = \{(0,1), (0,-1), (1,0), (-1,0), (1,1), (-1,-1), (-1,1), (1,-1)\};

    U(x_{ij}^{(s)}, x_{i_p j_p}^{(s+1)}) = \begin{cases} -\beta_{s,s+1} & \text{if } x_{ij}^{(s)} = x_{i_p j_p}^{(s+1)} \\ \beta_{s,s+1} & \text{otherwise,} \end{cases}    (20)

for the interscale (to parent) clique, where i_p = [(i+1)/2] and j_p = [(j+1)/2]; and

    U(x_{ij}^{(s)}, x_{(2i,2j)+\tau_c}^{(s-1)}) = \begin{cases} -\beta_{s-1,s} & \text{if } x_{ij}^{(s)} = x_{(2i,2j)+\tau_c}^{(s-1)} \\ \beta_{s-1,s} & \text{otherwise,} \end{cases}    (21)

for the interscale (to child) cliques, where \tau_c \in N_c = \{(-1,-1), (-1,0), (0,-1), (0,0)\}. Correlations of region labels are taken into account through these clique energy parameters; for example, a larger value of \beta_s > 0 represents a stronger tendency for the region labels of the members of the relevant intrascale clique to be the same.
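The parent and child index maps used in (18), (20) and (21) are easy to get wrong with 1-indexed coordinates; the following helper functions make the quadtree relation explicit and check that the two maps are mutually consistent:

```python
def parent(i, j):
    """Parent of (i, j) at the next coarser scale, 1-indexed as in (18):
    ([(i+1)/2], [(j+1)/2]), with [.] the floor function."""
    return ((i + 1) // 2, (j + 1) // 2)

def children(i, j):
    """Four children of (i, j) at the next finer scale:
    (2i, 2j) + tau_c for tau_c in {(-1,-1), (-1,0), (0,-1), (0,0)}."""
    return [(2 * i + di, 2 * j + dj)
            for di, dj in [(-1, -1), (-1, 0), (0, -1), (0, 0)]]

assert parent(3, 5) == (2, 3)
assert children(2, 3) == [(3, 5), (3, 6), (4, 5), (4, 6)]
# every child maps back to its parent, so (20) and (21) pair up consistently
assert all(parent(ci, cj) == (2, 3) for ci, cj in children(2, 3))
```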
Using these clique energies, the local conditional probability of x_{ij}^{(s)} in the wavelet domain is given by

    p(x_{ij}^{(s)} | x_{\eta_{ij}^{X(s)}}) = \frac{\exp\{-[\sum_{\tau \in N} U(x_{ij}^{(s)}, x_{ij+\tau}^{(s)}) + U(x_{ij}^{(s)}, x_{i_p j_p}^{(s+1)}) + \sum_{\tau_c \in N_c} U(x_{ij}^{(s)}, x_{(2i,2j)+\tau_c}^{(s-1)})]\}}{\sum_{x_{ij}^{(s)} \in Q_X} \exp\{-[\sum_{\tau \in N} U(x_{ij}^{(s)}, x_{ij+\tau}^{(s)}) + U(x_{ij}^{(s)}, x_{i_p j_p}^{(s+1)}) + \sum_{\tau_c \in N_c} U(x_{ij}^{(s)}, x_{(2i,2j)+\tau_c}^{(s-1)})]\}}.    (22)

Note that the clique energies in (20) are removed at the coarsest (J-th) scale and those in (21) at the first scale. The vector image w^{(s)} at the s-th scale in (17) is modeled by multidimensional GMRFs with second-order neighborhood systems, characterized by the following local conditional pdfs:

    p(w_{ij}^{(s)} | w_{\eta_{ij}^{W(s)}}, x_{ij}^{(s)} = e_m) = \frac{1}{(2\pi)^{K/2} |\Sigma_m^{(s)}|^{1/2}} \exp\{-\frac{1}{2} (w_{ij}^{(s)} - \hat{w}_{ij}^{(s),m})^T (\Sigma_m^{(s)})^{-1} (w_{ij}^{(s)} - \hat{w}_{ij}^{(s),m})\},    (23)

    \hat{w}_{ij}^{(s),m} = \mu_m^{(s)} + \sum_{\tau \in N} B_{m,\tau}^{(s)} (w_{ij+\tau}^{(s)} - \mu_m^{(s)}).    (24)

Here K denotes the dimension of w_{ij}^{(s)} (four for s = J and three for s = 1, \ldots, J-1), \hat{w}_{ij}^{(s),m} denotes the vector predicted from the neighboring vectors w_{\eta_{ij}^{W(s)}}, and \mu_m^{(s)}, \Sigma_m^{(s)} and B_{m,\tau}^{(s)} stand for the mean vector, the covariance matrix of the prediction error vectors (w_{ij}^{(s)} - \hat{w}_{ij}^{(s),m}), and the spatial interaction parameter matrix for pairwise cliques, all depending on the class label m. The correlation between w_{ij}^{(s)} and w_{ij+\tau}^{(s)} is taken into account through B_{m,\tau}^{(s)}. The spatial interaction parameter matrix is reminiscent of the prediction coefficient matrix in linear vector prediction and can therefore simply be referred to as the prediction matrix. The prediction matrices are assumed to be symmetric, i.e., B_{m,\tau}^{(s)} = B_{m,-\tau}^{(s)}.
4 Unsupervised Segmentation Algorithm

We have already proposed an unsupervised textured image segmentation method in which the mean-field-based decomposition of the a posteriori probability is used for parameter estimation and image segmentation [9]. Image segmentation for wavelet transformed images can follow the method for original images, although the procedures for wavelet transformed images become more complex.
4.1 Parameter Estimation
The EM method [8] is an iterative method for maximum likelihood (ML) estimation from incompletely observed data. Here the observed image w_L, the J-level wavelet transformed image, is regarded as incomplete data, and the pair consisting of the observed image w_L and the unobservable region label image x_L (in fact a set of region indicator vectors) as complete data. The EM method consists of the expectation step (E-step, which yields the Baum function) and the maximization step (M-step):

E-step:
    Q(\Lambda | \Lambda^{(p)}) = \sum_{x_L \in \Omega_X} p(x_L | w_L; \Lambda^{(p)}) \log p(x_L, w_L; \Lambda)    (25)

M-step:
    \Lambda^{(p+1)} = \arg\max_\Lambda Q(\Lambda | \Lambda^{(p)})    (26)

Here \Lambda = \{\lambda_X, \lambda_{W(s)}; s = 1, \ldots, J\} represents the set of all parameters to be estimated; \lambda_X = \{\beta_s, \beta_{t,t+1}; s = 1, \ldots, J, t = 1, \ldots, J-1\} is the set of MRF parameters of the region process, and \lambda_{W(s)} = \{\mu_m^{(s)}, \Sigma_m^{(s)}, B_{m,\tau}^{(s)}; \tau \in N, m = 1, \ldots, M\} is the set of GMRF parameters for w^{(s)}. \Lambda^{(p)} is the provisional estimate of \Lambda at the p-th iteration.

The Baum function in (25) involves a sum over all possible configurations of x_L⁵ and is practically impossible to calculate directly. To overcome this problem, we use the mean-field-based decomposition of the a posteriori probability [9]. The a posteriori probability p(x_L | w_L; \Lambda^{(p)}) in (25) can be written as

    p(x_L | w_L; \Lambda^{(p)}) = \frac{p(w_L | x_L; \Lambda^{(p)}) p(x_L; \Lambda^{(p)})}{\sum_{x_L \in \Omega_X} p(w_L | x_L; \Lambda^{(p)}) p(x_L; \Lambda^{(p)})}.    (27)

Using the mean field approximation, p(w_L | x_L; \Lambda^{(p)}) and p(x_L; \Lambda^{(p)}) are decomposed as [12]

    p(w_L | x_L; \Lambda^{(p)}) = \prod_{s=1}^{J} p(w^{(s)} | x^{(s)}; \lambda_{W(s)}^{(p)}) \approx \prod_{s=1}^{J} \prod_{(i,j) \in L_s} p(w_{ij}^{(s)} | \langle w_{\eta_{ij}^{W(s)}} \rangle, x_{ij}^{(s)}; \lambda_{W(s)}^{(p)}),    (28)

    p(x_L; \Lambda^{(p)}) = p(x^{(J)}, \ldots, x^{(1)}; \lambda_X^{(p)}) \approx \prod_{s=1}^{J} \prod_{(i,j) \in L_s} p(x_{ij}^{(s)} | \langle x_{\eta_{ij}^{X(s)}} \rangle; \lambda_X^{(p)}),    (29)

⁵ Since x_L = \{x^{(J)}, \ldots, x^{(1)}\}, \Omega_X in (25) is equal to Q_X^{|L_J| \cdots |L_1|}.
where \langle \cdot \rangle denotes the mean field of the quantity inside the brackets. Substituting (28) and (29) into (27) and replacing \sum_{x_L \in \Omega_X} by \prod_{s=1}^{J} \prod_{(i,j) \in L_s} \sum_{x_{ij}^{(s)} \in Q_X}, we get the following decomposition for p(x_L | w_L; \Lambda^{(p)}):

    p(x_L | w_L; \Lambda^{(p)}) \approx \prod_{s=1}^{J} \prod_{(i,j) \in L_s} p(x_{ij}^{(s)} | w_{ij}^{(s)}, \langle w_{\eta_{ij}^{W(s)}} \rangle, \langle x_{\eta_{ij}^{X(s)}} \rangle; \Lambda^{(p)}),    (30)

where

    p(x_{ij}^{(s)} | w_{ij}^{(s)}, \langle w_{\eta_{ij}^{W(s)}} \rangle, \langle x_{\eta_{ij}^{X(s)}} \rangle; \Lambda^{(p)}) = \frac{p(w_{ij}^{(s)} | \langle w_{\eta_{ij}^{W(s)}} \rangle, x_{ij}^{(s)}; \lambda_{W(s)}^{(p)}) \, p(x_{ij}^{(s)} | \langle x_{\eta_{ij}^{X(s)}} \rangle; \lambda_X^{(p)})}{\sum_{x_{ij}^{(s)} \in Q_X} p(w_{ij}^{(s)} | \langle w_{\eta_{ij}^{W(s)}} \rangle, x_{ij}^{(s)}; \lambda_{W(s)}^{(p)}) \, p(x_{ij}^{(s)} | \langle x_{\eta_{ij}^{X(s)}} \rangle; \lambda_X^{(p)})}.    (31)

p(x_{ij}^{(s)} | w_{ij}^{(s)}, \langle w_{\eta_{ij}^{W(s)}} \rangle, \langle x_{\eta_{ij}^{X(s)}} \rangle; \Lambda^{(p)}) is considered as a local a posteriori probability (LAP) and hereafter we write it as z_{ij}^{(s),(p)}(x_{ij}^{(s)}) for short. The LAPs for all region indicators form a vector (LAP vector), z_{ij}^{(s),(p)} = (z_{ij}^{(s),(p)}(x_{ij}^{(s)} = e_1), \ldots, z_{ij}^{(s),(p)}(x_{ij}^{(s)} = e_M))^T. It is reasonable to use the LAP vector z_{ij}^{(s),(p)} as the mean field \langle x_{ij}^{(s)} \rangle of x_{ij}^{(s)} [9]. Then, using (31), z_{ij}^{(s),(p)}(x_{ij}^{(s)}) is computed by

    z_{ij}^{(s),(p)}(x_{ij}^{(s)}) = \frac{p(w_{ij}^{(s)} | w_{\eta_{ij}^{W(s)}}, x_{ij}^{(s)}; \lambda_{W(s)}^{(p)}) \, p(x_{ij}^{(s)} | z_{\eta_{ij}^X}^{(p)}; \lambda_X^{(p)})}{\sum_{x_{ij}^{(s)} \in Q_X} p(w_{ij}^{(s)} | w_{\eta_{ij}^{W(s)}}, x_{ij}^{(s)}; \lambda_{W(s)}^{(p)}) \, p(x_{ij}^{(s)} | z_{\eta_{ij}^X}^{(p)}; \lambda_X^{(p)})},    (32)

where w_{\eta_{ij}^{W(s)}} is simply used for \langle w_{\eta_{ij}^{W(s)}} \rangle and z_{\eta_{ij}^X}^{(p)} = \{z_{kl}^{(p)}; (k, l) \in \eta_{ij}^{X(s)}\}. Note that in order to calculate z_{ij}^{(s),(p)}, the LAP vector for x_{ij}^{(s)}, we need z_{\eta_{ij}^X}^{(p)}, the LAP vectors for x_{\eta_{ij}^{X(s)}}; the LAP vectors are therefore calculated by the iterative procedures popular in numerical analysis. In p(x_{ij}^{(s)} | z_{\eta_{ij}^X}^{(p)}; \lambda_X^{(p)}), the doubleton clique energies in (19), (20) and (21) are replaced by

    U(x_{ij}^{(s)}, z_{ij+\tau}^{(s)}) = -\beta_s (x_{ij}^{(s)})^T z_{ij+\tau}^{(s)} + \beta_s (1 - (x_{ij}^{(s)})^T z_{ij+\tau}^{(s)}) = \beta_s (1 - 2 (x_{ij}^{(s)})^T z_{ij+\tau}^{(s)}),    (33)

    U(x_{ij}^{(s)}, z_{i_p j_p}^{(s+1)}) = \beta_{s,s+1} (1 - 2 (x_{ij}^{(s)})^T z_{i_p j_p}^{(s+1)}),    (34)

    U(x_{ij}^{(s)}, z_{(2i,2j)+\tau_c}^{(s-1)}) = \beta_{s-1,s} (1 - 2 (x_{ij}^{(s)})^T z_{(2i,2j)+\tau_c}^{(s-1)}).    (35)
Now we can approximately calculate the Baum function using the mean-field-based decomposition of p(x_L | w_L; \Lambda^{(p)}), \prod_{s=1}^{J} \prod_{(k,l) \in L_s} z_{kl}^{(s),(p)}(x_{kl}^{(s)}), together with the same decomposition applied to \log p(x_L, w_L; \Lambda):

    Q(\Lambda | \Lambda^{(p)}) \approx \sum_{x_L \in \Omega_X} \Big\{ \prod_{s=1}^{J} \prod_{(k,l) \in L_s} z_{kl}^{(s),(p)}(x_{kl}^{(s)}) \Big\} \cdot \Big\{ \sum_{s=1}^{J} \sum_{(i,j) \in L_s} \log p(w_{ij}^{(s)} | w_{\eta_{ij}^{W(s)}}, x_{ij}^{(s)}; \lambda_{W(s)}) + \sum_{s=1}^{J} \sum_{(i,j) \in L_s} \log p(x_{ij}^{(s)} | z_{\eta_{ij}^X}^{(p)}; \lambda_X) \Big\}
    = \sum_{s=1}^{J} \sum_{(i,j) \in L_s} \sum_{x_{ij}^{(s)} \in Q_X} z_{ij}^{(s),(p)}(x_{ij}^{(s)}) \log p(w_{ij}^{(s)} | w_{\eta_{ij}^{W(s)}}, x_{ij}^{(s)}; \lambda_{W(s)}) + \sum_{s=1}^{J} \sum_{(i,j) \in L_s} \sum_{x_{ij}^{(s)} \in Q_X} z_{ij}^{(s),(p)}(x_{ij}^{(s)}) \log p(x_{ij}^{(s)} | z_{\eta_{ij}^X}^{(p)}; \lambda_X).    (36)
Once the Baum function is obtained, the M-step is carried out straightforwardly as follows:

    \lambda_{W(s)}^{(p+1)} = \arg\max_{\lambda_{W(s)}} \Big\{ \sum_{(i,j) \in L_s} \sum_{x_{ij}^{(s)} \in Q_X} z_{ij}^{(s),(p)}(x_{ij}^{(s)}) \log p(w_{ij}^{(s)} | w_{\eta_{ij}^{W(s)}}, x_{ij}^{(s)}; \lambda_{W(s)}) \Big\},    (37)

    \lambda_X^{(p+1)} = \arg\max_{\lambda_X} \Big\{ \sum_{s=1}^{J} \sum_{(i,j) \in L_s} \sum_{x_{ij}^{(s)} \in Q_X} z_{ij}^{(s),(p)}(x_{ij}^{(s)}) \log p(x_{ij}^{(s)} | z_{\eta_{ij}^X}^{(p)}; \lambda_X) \Big\}.    (38)
The provisional estimate of the LMRF parameters \lambda_X in (33)-(35), given by (38), cannot be obtained in a mathematically closed form and must be calculated by an appropriate numerical optimization method; in the following experiments we used the Newton method to estimate \lambda_X. The provisional estimates of the GMRF parameters \lambda_{W(s)}, given by (37), can be obtained in a mathematically closed form as follows. The reestimate of the mean vector \mu_m^{(s),(p+1)} is given as

    \mu_m^{(s),(p+1)} = \Big\{ \sum_{(i,j) \in L_s} z_{ij}^{(s),(p)}(m) \, w_{ij}^{(s)} \Big\} \Big/ \Big\{ \sum_{(i,j) \in L_s} z_{ij}^{(s),(p)}(m) \Big\},    (39)

where z_{ij}^{(s),(p)}(m) represents z_{ij}^{(s),(p)}(x_{ij}^{(s)} = e_m). Assuming B_{m,\tau_s}^{(s),(p+1)} = B_{m,-\tau_s}^{(s),(p+1)}, the reestimates of the prediction matrices B_{m,\tau_s}^{(s),(p+1)} can be obtained by solving the following matrix equation:

    (B_{m,\tau_1}^{(s),(p+1)}, B_{m,\tau_2}^{(s),(p+1)}, B_{m,\tau_3}^{(s),(p+1)}, B_{m,\tau_4}^{(s),(p+1)}) \begin{pmatrix} A_{11}^{(s)} & A_{12}^{(s)} & A_{13}^{(s)} & A_{14}^{(s)} \\ A_{21}^{(s)} & A_{22}^{(s)} & A_{23}^{(s)} & A_{24}^{(s)} \\ A_{31}^{(s)} & A_{32}^{(s)} & A_{33}^{(s)} & A_{34}^{(s)} \\ A_{41}^{(s)} & A_{42}^{(s)} & A_{43}^{(s)} & A_{44}^{(s)} \end{pmatrix} = (A_{01}^{(s)}, A_{02}^{(s)}, A_{03}^{(s)}, A_{04}^{(s)}),    (40)

    A_{st}^{(s)} = \Big\{ \sum_{(i,j) \in L_s} z_{ij}^{(s),(p)}(m) (w_{ij+\tau_s}^{(s)} + w_{ij-\tau_s}^{(s)} - 2\mu_m^{(s),(p+1)})(w_{ij+\tau_t}^{(s)} + w_{ij-\tau_t}^{(s)} - 2\mu_m^{(s),(p+1)})^T \Big\} \Big/ \Big\{ \sum_{(i,j) \in L_s} z_{ij}^{(s),(p)}(m) \Big\},    (41)

    A_{0t}^{(s)} = \Big\{ \sum_{(i,j) \in L_s} z_{ij}^{(s),(p)}(m) (w_{ij}^{(s)} - \mu_m^{(s),(p+1)})(w_{ij+\tau_t}^{(s)} + w_{ij-\tau_t}^{(s)} - 2\mu_m^{(s),(p+1)})^T \Big\} \Big/ \Big\{ \sum_{(i,j) \in L_s} z_{ij}^{(s),(p)}(m) \Big\},    (42)

where the subscripts s, t = 1, \ldots, 4 index the four offsets \tau_1 = (0,1), \tau_2 = (1,0), \tau_3 = (1,1), \tau_4 = (-1,1) (these indices are distinct from the scale index in the superscript (s)). Finally, the reestimate of the covariance matrix \Sigma_m^{(s),(p+1)} is given as

    \Sigma_m^{(s),(p+1)} = \frac{\sum_{(i,j) \in L_s} z_{ij}^{(s),(p)}(m) (w_{ij}^{(s)} - \hat{w}_{ij}^{(s),m})(w_{ij}^{(s)} - \hat{w}_{ij}^{(s),m})^T}{\sum_{(i,j) \in L_s} z_{ij}^{(s),(p)}(m)},    (43)

    \hat{w}_{ij}^{(s),m} = \mu_m^{(s),(p+1)} + \sum_{t=1}^{4} B_{m,\tau_t}^{(s),(p+1)} (w_{ij+\tau_t}^{(s)} + w_{ij-\tau_t}^{(s)} - 2\mu_m^{(s),(p+1)}).    (44)
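The closed-form reestimate (39) is a responsibility-weighted average, directly analogous to the M-step of a Gaussian mixture; a minimal sketch:

```python
import numpy as np

def reestimate_mean(w, z_m):
    """Eq. (39): the class-m mean vector as a LAP-weighted average.
    w: (N, N, K) vector image at one scale; z_m: (N, N) LAPs for class m."""
    num = (z_m[..., None] * w).sum(axis=(0, 1))
    return num / z_m.sum()

rng = np.random.default_rng(3)
n, k = 6, 3
w = rng.normal(2.0, 1.0, (n, n, k))
z_m = np.ones((n, n))                   # uniform responsibilities
mu = reestimate_mean(w, z_m)
# with uniform weights, eq. (39) reduces to the plain sample mean
assert np.allclose(mu, w.reshape(-1, k).mean(axis=0))
```

The covariance update (43) has the same weighted-average structure over the outer products of the prediction errors.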
4.2 Image Segmentation
Segmentation of an image into regions of different textures amounts to estimation of the hidden region process x_L. In principle this is carried out with the finally estimated parameters; however, a provisional segmentation is also possible with the provisionally estimated parameters at each iteration of the EM method. In the following we describe segmentation in this situation. Given a wavelet transformed image w_L and an estimated parameter set \Lambda^{(p)}, the provisional estimate of x_L is obtained by maximizing the a posteriori probability p(x_L | w_L; \Lambda^{(p)}) (MAP estimation):⁶

    x_L^{(p)} = \arg\max_{x_L \in \Omega_X} p(x_L | w_L; \Lambda^{(p)}).    (45)

As seen from (30), (31) and (32), this global optimization problem is approximately decomposed into local optimization problems using the LAPs z_{ij}^{(s),(p)}(x_{ij}^{(s)}):

    x_{ij}^{(s),(p)} = \arg\max_{x_{ij}^{(s)} \in Q_X} z_{ij}^{(s),(p)}(x_{ij}^{(s)}).    (46)

Since segmentation at a finer resolution is generally preferable, the estimated region image at the first scale, x_{ij}^{(1),(p)}, can be used as the segmentation result. As a better alternative, the LAP averaged over all scales,

    \bar{z}_{ij}^{(1),(p)}(x_{ij}^{(1)}) = \frac{1}{J} \sum_{s=1}^{J} z_{i_s j_s}^{(s),(p)}(x_{i_s j_s}^{(s)}),    (47)

where (i_1, j_1) = (i, j), i_s = [(i-1)/2^{s-1} + 1] and j_s = [(j-1)/2^{s-1} + 1], can be used to obtain x_{ij}^{(1),(p)}. In the following experiments, the averaged LAP \bar{z}_{ij}^{(1),(p)}(x_{ij}^{(1)}) is utilized.
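The coordinate mapping under (47) and the scale-averaged LAP can be sketched as follows, assuming the per-scale LAP arrays are stored 0-based while the formulas use 1-based indices:

```python
import numpy as np

def coarse_coords(i, j, s):
    """Map a first-scale site (i, j) (1-indexed) to its ancestor at scale s:
    i_s = [(i - 1) / 2^(s-1) + 1], as defined below eq. (47)."""
    return ((i - 1) // 2 ** (s - 1) + 1, (j - 1) // 2 ** (s - 1) + 1)

def averaged_lap(laps, i, j):
    """Eq. (47): average the LAP vectors of (i, j)'s ancestors over all J
    scales.  laps[s-1] is the LAP array at scale s (0-based storage)."""
    J = len(laps)
    acc = None
    for s in range(1, J + 1):
        ii, jj = coarse_coords(i, j, s)
        v = laps[s - 1][ii - 1, jj - 1]       # convert to 0-based storage
        acc = v.copy() if acc is None else acc + v
    return acc / J

# two scales, two classes; class 0 wins at scale 1, class 1 at scale 2
lap1 = np.tile([0.6, 0.4], (4, 4, 1))
lap2 = np.tile([0.2, 0.8], (2, 2, 1))
z_bar = averaged_lap([lap1, lap2], i=3, j=3)
assert coarse_coords(3, 3, 2) == (2, 2)
assert np.allclose(z_bar, [0.4, 0.6])   # (0.6+0.2)/2 and (0.4+0.8)/2
```

The final label at (i, j) is then the argmax of z_bar, as in (46).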
4.3 Initial Parameter Estimation

To start the iterative procedures of the EM method, initial values of the MRF parameters must be given in advance. These initial values are derived as follows. The vector image at each scale, w^{(s)}, is divided into small blocks, for example of 8×8 pixels. Assuming a single texture for each block, a set of texture parameters (composing a vector) is estimated. Assuming that the number of regions is known, the parameter vectors derived from all blocks are classified into the known number of regions by a clustering method. The texture parameters for each region are then reestimated using all blocks classified to the same region and are used as the initial texture parameters \lambda_{W(s)}^{(0)}. Appropriate positive values can be used as the initial region parameters \{\beta_s, \beta_{t,t+1}; s = 1, \ldots, J, t = 1, \ldots, J-1\} in (33)-(35); in the following experiments we use 0.5 for all \beta_s and \beta_{t,t+1}.

⁶ To solve this optimization problem, the stochastic relaxation algorithm known as simulated annealing (SA) [11] could be used; however, we prefer not to use SA since it demands formidable computation.
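The initialization above can be sketched as a block-feature clustering. The paper estimates full GMRF parameter vectors per block; here a crude (mean, standard deviation) feature and a plain k-means with a deterministic spread-out initialization are substituted to keep the example short. Both substitutions are assumptions of this sketch, not the paper's exact procedure:

```python
import numpy as np

def block_features(img, block=8):
    """Split an image into non-overlapping blocks and summarize each block
    by a crude texture feature vector (mean, standard deviation)."""
    n = img.shape[0] // block
    feats = []
    for bi in range(n):
        for bj in range(n):
            patch = img[bi * block:(bi + 1) * block,
                        bj * block:(bj + 1) * block]
            feats.append([patch.mean(), patch.std()])
    return np.array(feats)

def kmeans(feats, centers, iters=20):
    """Plain k-means: classify block feature vectors into M regions."""
    centers = centers.copy()
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for c in range(len(centers)):
            if (assign == c).any():
                centers[c] = feats[assign == c].mean(axis=0)
    return assign

rng = np.random.default_rng(4)
# synthetic two-texture image: smooth left half, noisy right half
img = np.concatenate([rng.normal(0.0, 0.1, (32, 16)),
                      rng.normal(0.0, 2.0, (32, 16))], axis=1)
feats = block_features(img, block=8)
# deterministic initialization: the least and most textured blocks
init = feats[[feats[:, 1].argmin(), feats[:, 1].argmax()]]
assign = kmeans(feats, init).reshape(4, 4)
assert (assign[:, :2] == assign[0, 0]).all()    # smooth half: one region
assert (assign[:, 2:] == assign[0, 3]).all()    # noisy half: the other
assert assign[0, 0] != assign[0, 3]
```

Per-cluster parameters reestimated from the blocks of each cluster then serve as \lambda_{W(s)}^{(0)}.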
5 Simulation Results
To evaluate the performance of the proposed unsupervised segmentation method, we applied it to the seven 256×256 images shown in Figs. 2(a)-8(a). Among them, Figs. 2(a)-6(a) are synthesized textured images, each consisting of three natural textures from the Brodatz album [13]. Figs. ∗(b) are segmentation results obtained using the given original images, ∗(c) those using only the LL(1) subimage w_{LL(1)}, ∗(d) those using the four decomposed subimages of the 1-level wavelet transform, and ∗(e) those using the seven decomposed subimages of the 2-level wavelet transform. For the synthesized images, for which true region label images are available, segmentation error rates can be calculated as the number of misclassified pixels over the total number of pixels; each error rate is written at the bottom of the corresponding figure. We believe that error mainly comes from a lack of capability to model textured images, i.e., if the statistical correlations in textured images are appropriately taken into account in the modeling, segmentation performance should improve.

From the bad segmentation results shown in Figs. ∗(b), it is seen that the resolution of these original images is too fine: at full resolution the textures are too coarse-grained to be modeled accurately by a GMRF with the second-order neighborhood. Using the LL(1) subimage, i.e., the down-sampled image, some textured images such as the one given in Fig. 2(a) can be segmented fairly well, but the general segmentation results are still bad (see Figs. ∗(c)). Comparing the results shown in Figs. ∗(d) with those in Figs. ∗(b) and ∗(c), it is seen that the use of a set of wavelet decomposed images generally improves segmentation performance. Furthermore, comparing the results shown in Figs. ∗(e) with those in Figs. ∗(d), we can see that it is very effective to consider statistical correlations over multiple scales by using multi-level wavelet decomposed images.

Segmentation results for real images are shown in Figs. 7 and 8 for several numbers of regions (M = 2, 3 and 4).⁷ Although performance evaluation is difficult for real images without true region labels, the segmentation results subjectively seem to show a tendency similar to that for the synthesized images. At least we may be able to say that the results using multi-level wavelet decomposed images, shown in Figs. ∗(e), are better than the others from the viewpoint of textured image segmentation.⁸

⁷ Estimation of M is another important problem, but it is beyond the scope of this paper. It might be possible by information-theoretic approaches [15].

6 Conclusions
In this paper, we presented a wavelet-based multiresolution Bayesian approach to the problem of segmenting textured images. The approach makes use of the modeling power of MRF models and the multiscale and highly discriminative nature of the wavelet representation, and underlies a multiscale segmentation algorithm which, as shown by the experimental results, effectively uses the statistical regularities over multiple scales. Incorporating into the model and the corresponding segmentation algorithm the interscale correlations that exist over the wavelet decomposition of the observed image is an issue for further research. Extension to Markov modeling over wavelet packet decompositions [14] is another interesting issue to be addressed.
Acknowledgments

This work was partly supported by the Basic Research 21 for Breakthroughs in Info-communication Project of the Japan Ministry of Posts and Telecommunications.
⁸ In this paper, images are assumed to be composed of multiple textures. Therefore, non-textured or weakly textured regions in real images are likely to be segmented into one region even though they differ in average gray level.

References

[1] C.A. Bouman and B. Liu, Multiple resolution segmentation of textured images, IEEE Trans. Pattern Anal. Machine Intell., 13 (2) (1991) 99-113.
[2] C.A. Bouman and M. Shapiro, A multiscale random field model for Bayesian image segmentation, IEEE Trans. Image Process., 3 (2) (1994) 162-177.
[3] S. Krishnamachari and R. Chellappa, Multiresolution Gauss Markov random field models, Tech. Rep., Univ. Maryland, College Park, 1995.
[4] M.L. Comer and E.J. Delp, Segmentation of textured images using a multiresolution Gaussian autoregressive model, IEEE Trans. Image Process., 8 (3) (1999) 408-418.
[5] J.E. Besag, On the statistical analysis of dirty pictures, J. Roy. Statist. Soc. B, 48 (3) (1986) 259-302.
[6] P.J. Burt and E.H. Adelson, The Laplacian pyramid as a compact image code, IEEE Trans. Commun., COMM-31 (1983) 532-540.
[7] C.K. Chui, An Introduction to Wavelets, Academic Press, Boston, 1992.
[8] A.P. Dempster, N.M. Laird and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. B, 39 (1) (1977) 1-38.
[9] H. Noda, M.N. Shirazi, B. Zhang and E. Kawaguchi, Mean field decomposition of a posteriori probability for MRF-based unsupervised textured image segmentation, Proc. ICASSP'99, Vol. 6, 1999, pp. 3477-3480.
[10] F. Spitzer, Markov random fields and Gibbs ensembles, Amer. Math. Mon., 78 (1971) 142-154.
[11] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Machine Intell., PAMI-6 (1984) 721-741.
[12] H. Noda and M.N. Shirazi, Data-driven segmentation of textured images using hierarchical Markov random fields, Trans. IEICE, J77-D2 (5) (1994) 922-930 (in Japanese; an English translation is available in Systems and Computers in Japan, 26 (5) (1995) 43-53).
[13] P. Brodatz, Textures - A Photographic Album for Artists and Designers, Dover, New York, 1966.
[14] T. Chang and C.-C.J. Kuo, Texture analysis and classification with tree-structured wavelet transform, IEEE Trans. Image Process., 2 (4) (1993) 429-441.
[15] C.S. Won and H. Derin, Unsupervised segmentation of noisy and textured images using Markov random fields, CVGIP: Graphical Models & Image Process., 54 (4) (1992) 308-328.
About the Author–HIDEKI NODA received the B.E. and M.E. degrees in electronics engineering from Kyushu University, Japan, in 1973 and 1975, respectively, and the Dr. Eng. degree in electrical engineering from Kyushu Institute of Technology, Japan, in 1993. He worked at Daini-Seikosha Ltd. from 1975 to 1978, at the National Research Institute of Police Science, Japan National Police Agency, from 1978 to 1989, and then at the Communications Research Laboratory, Japan Ministry of Posts and Telecommunications, from 1989 to 1995. In 1995 he moved to Kyushu Institute of Technology, where he is now an associate professor in the Department of Electrical, Electronic & Computer Engineering. His research interests include speaker and speech recognition, image processing and neural networks.
About the Author–MAHDAD NOURI SHIRAZI was born in 1963 in Iran. He received his M.Sc. and Ph.D. degrees in electrical engineering from Tottori University and Kobe University, Japan, respectively. In 1993 he became a post-doctoral research fellow at the Communications Research Laboratory of the Japan Ministry of Posts and Telecommunications, funded by the Japan Science and Technology Agency. Since 1995 he has been at the same laboratory, currently as a senior research scientist. His research interests include neural networks, pattern recognition, and image processing.
About the Author–EIJI KAWAGUCHI received the Dr. Eng. degree from the Department of Electronics Engineering at Kyushu University, Japan in 1971. Currently, he is a professor of Computer Engineering at Kyushu Institute of Technology. His research interests include speech recognition, pattern understanding, image processing and knowledge engineering as well as natural language understanding and semantic modeling.
[Figure 1 appears here. Panel (b) shows the subband layout LL(2), HL(2), LH(2), HH(2), HL(1), LH(1), HH(1).]
Figure 1: An example of 2-level wavelet transform: (a) original image, (b) wavelet transformed image, (c) sublattices.
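The subband layout of Figure 1(b) can be reproduced in a few lines. The sketch below is illustrative only and is not the paper's implementation (the wavelet filters used in the paper are not specified in this excerpt); it uses the simplest Haar filters, in averaging form, implemented with numpy:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform (averaging form).
    Returns the LL, HL, LH, HH subbands, each half the input size."""
    a = np.asarray(img, dtype=float)
    # filter along columns: low-pass (pair average), high-pass (pair difference)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # filter along rows
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, hl, lh, hh

# 2-level decomposition as in Figure 1: the LL(1) band is decomposed
# again, giving seven subbands in total.
img = np.arange(64.0).reshape(8, 8)      # toy 8x8 "image"
LL1, HL1, LH1, HH1 = haar_dwt2(img)      # level-1 subbands, 4x4 each
LL2, HL2, LH2, HH2 = haar_dwt2(LL1)      # level-2 subbands, 2x2 each
```

Repeating the one-level transform on the LL band yields the nonredundant pyramid of Figure 1(b): the level-1 detail bands HL(1), LH(1), HH(1) together with the four level-2 subbands replace the original image without increasing the number of coefficients.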
[Figure 2 appears here. Segmentation errors: (b) 27.3%, (c) 3.3%, (d) 2.4%, (e) 2.0%.]
Figure 2: Segmentation results: (a) given image, (b) segmented image by using given original image, (c) by only LL(1) image, (d) by 1-level decomposed images, and (e) by 2-level decomposed images.
[Figure 3 appears here. Segmentation errors: (b) 34.3%, (c) 32.0%, (d) 15.9%, (e) 4.3%.]
Figure 3: Segmentation results: (a) given image, (b) segmented image by using given original image, (c) by only LL(1) image, (d) by 1-level decomposed images, and (e) by 2-level decomposed images.
[Figure 4 appears here. Segmentation errors: (b) 33.9%, (c) 19.9%, (d) 14.5%, (e) 1.8%.]
Figure 4: Segmentation results: (a) given image, (b) segmented image by using given original image, (c) by only LL(1) image, (d) by 1-level decomposed images, and (e) by 2-level decomposed images.
[Figure 5 appears here. Segmentation errors: (b) 12.4%, (c) 8.4%, (d) 5.5%, (e) 3.9%.]
Figure 5: Segmentation results: (a) given image, (b) segmented image by using given original image, (c) by only LL(1) image, (d) by 1-level decomposed images, and (e) by 2-level decomposed images.
[Figure 6 appears here. Segmentation errors: (b) 38.8%, (c) 26.1%, (d) 14.4%, (e) 4.7%.]
Figure 6: Segmentation results: (a) given image, (b) segmented image by using given original image, (c) by only LL(1) image, (d) by 1-level decomposed images, and (e) by 2-level decomposed images.
[Figure 7 appears here. Segmented images are shown in rows for M = 2, 3 and 4.]
Figure 7: Segmentation results: (a) given image, (b) segmented images for the number of regions M = 2, 3 and 4 by using given original image, (c) by only LL(1) image, (d) by 1-level decomposed images, and (e) by 2-level decomposed images.
[Figure 8 appears here. Segmented images are shown in rows for M = 2, 3 and 4.]
Figure 8: Segmentation results: (a) given image, (b) segmented images for the number of regions M = 2, 3 and 4 by using given original image, (c) by only LL(1) image, (d) by 1-level decomposed images, and (e) by 2-level decomposed images.