two frontiers in morphological image analysis - CVSP - NTUA

Comment

Report 3 Downloads 76 Views

TWO FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS: DIFFERENTIAL EVOLUTION MODELS AND HYBRID MORPHOLOGICAL/LINEAR NEURAL NETWORKS ∗ ´cio F. C. Pessoa3 Petros Maragos1&2 , M. Akmal Butt1 , and Lu 1

School of E.C.E., Georgia Institute of Technology, Atlanta, GA 30332, USA. [maragos,akmal,lpessoa]@ee.gatech.edu 2 Institute for Language and Speech Processing, Artemidos/Epidavrou Str., Marousi 15125, Greece. 3 Motorola Semiconductor Products Sector, Austin, TX 78721, USA. Abstract. In this paper we brieﬂy describe advancements in two broad areas of morphological image analysis. Part I deals with diﬀerential morphology and curve evolution. The partial diﬀerential equations (PDEs) that model basic morphological operations are ﬁrst presented. The resulting dilation PDE, numerically implemented by curve evolution algorithms, improves the accuracy of morphological multiscale analysis by Euclidean disks and (its anisotropic/heterogeneous version) is the basic ingredient of PDE models that solve image analysis problems such as gridless halftoning and watershed segmentation based on the eikonal PDE. Part II deals with morphology-related systems for pattern recognition. It presents a general class of multilayer feed-forward neural networks where the combination of inputs in every node is formed by hybrid linear and nonlinear (of the morphological/rank type) operations. For its design a methodology is formulated using ideas from the back-propagation algorithm and robust techniques are developed to circumvent the non-diﬀerentiability of rank functions. Experimental results in handwritten character recognition are described and illustrate some of the properties of this new type of neural nets. Keywords: diﬀerential morphology, PDEs, curve evolution, neural nets, character recognition.

1

PART I: DIFFERENTIAL MORPHOLOGY PDEs have the form AND CURVE EVOLUTION ∂Ψ/∂s = ±||∇Ψ||

Morphological image processing has been based traditionally on set and lattice theory. Thus, so far, the two classic approaches to analyze or design deterministic morphological operators have been: (i) geometry by viewing them as image set transformations in Euclidean spaces and (ii) algebra to analyze their properties using set or lattice theory. In parallel to these directions, there is a recently growing part of morphological image processing that uses tools from diﬀerential calculus and dynamical systems to model nonlinear multiscale analysis and distance propagation in images. Recently, the multiscale morphological operators of dilation, erosion [1, 5, 24] and opening, closing [5] were modeled via nonlinear partial diﬀerential equations (PDEs) acting in scale-space. These advancements were inspired by previous work in computer vision where multiscale linear convolutions of an image were modeled via the heat PDE. For multiscale ﬂat dilations and erosions of an image f (x, y) by compact convex symmetric structuring sets B⊆ IR2 at a continuum of scales s ≥ 0, their generating ∗ This work was done mostly at Georgia Tech and was supported by the US NSF under grant MIP–94-21677 and by the ARO under grant DAAH04-96-1-0161. L. Pessoa was also supported by CNPq (Conselho Nacional de Desenvolvimento Cient´ıﬁco e Tecnol´ ogico), Bras´ılia, Brazil, through a Doctoral Fellowship under grant 200.846/92-2. The paper was written while P. Maragos was at I.L.S.P.

B

,

Ψ(x, y, 0) = f (x, y)

(1)

where Ψ(x, y, s) is the dilation ⊕ or erosion of f by sB, +/− corresponds to dilation/erosion respectively, ||(x, y)||B ≡ sup(a,b)∈B (ax + by), ∇Ψ ≡ (Ψx , Ψy ) is the spatial gradient, and Ψx ≡ ∂Ψ/∂x denote partial derivatives. For example, if B is the unit disk, || · ||B is the Euclidean norm || · || and ∂Ψ/∂s = ±||∇Ψ|| = ±

(Ψx )2 + (Ψy )2

(2)

Even if the initial image f is smooth, at ﬁnite scales s > 0 the above dilation or erosion evolution may create discontinuities in the derivatives, like ‘shocks’. Thus, the dilations f ⊕sB or erosions f sB are weak solutions of (1). Ways to deal with these shocks include replacing standard derivatives with morphological derivatives [5, 11] or replacing the PDEs with diﬀerential inclusions [13]. In parallel to the development of the above ideas, there have been some advances in the ﬁeld of diﬀerential geometry for evolving curves or surfaces using level set methods [17, 23]. Consider at time t = 0 an initial simple, smooth, closed planar curve Γ(0) which is propagated for t > 0 along its normal vector ﬁeld with speed c. Let this evolving curve Γ(t) be represented by its position vector X(p, t) = (x(p, t), y(p, t)) and parameterized by p ∈ J so that it has its interior on the left in the direction of increasing p. A general propagation law is ∂ X(p, t) (p, t) , =cN ∂t

Γ(0) = {X(p, 0) : p ∈ J}, (3)

Invited Paper. Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Brazil, Oct. 1998.

11

FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS (p, t) is the instantaneous unit outward normal where N vector at points on the evolving curve, and c = c(x, y, t) is the speed function which generally depends on local geometrical information such as the curvature κ(p, t), global image properties, or other factors independent of the curve. If c = 1 or c = −1, then Γ(t) is the dilation or erosion of the initial curve Γ(0) by a disk of radius t. The speed model c = 1±εκ, |ε| ≤ 1, has been extensively studied in [17, 23] for general evolution of boundaries and interfaces and in [8] for shape analysis in computer vision. To overcome the topological problem of splitting and merging and numerical problems with the Lagrangian formulation (3), an Eulerian formulation was proposed in [17] where the original curve Γ(0) is ﬁrst embedded in the surface of an arbitrary 2D Lipschitz continuous function Φ0 (x, y) as its zero-level curve. (For example, we select Φ0 (x, y) to be equal to the signed (±) distance function from the boundary of Γ(0) where + is for points inside and − is for points outside the curve.) Then, the evolving 2D curve is obtained as the zero-level curve Γ(t) = {(x, y) : Φ(x, y, t) = 0} of a 2D function Φ(x, y, t) that evolves according to the PDE ∂Φ/∂t = c||∇Φ|| ,

Φ(x, y, 0) = Φ0 (x, y)

(4)

This function evolution PDE makes all level sets expand at normal speed c. If c = ±1, it is identical to the ﬂat circular dilation/erosion PDE (2) by equating scale s with time t.

1.1

Multiscale Analysis via Dilation PDE

Many applications of mathematical morphology [22, 12] such as nonlinear smoothing, geometrical feature extraction, skeletonization, size distributions, and segmentation, inherently require or can beneﬁt from performing morphological image operations at multiple scales, which creates a morphological scale-space. For binary images, the distance transform is a compact way to represent their multiscale dilations and erosions by convex structuring elements whose shape depends upon the norm used to measure distances. Thresholding the distance transform at level t > 0 yields the morphological erosion of the image foreground (or the dilation of the background) by the norm-induced ball of radius (scale) t. To obtain isotropic distance propagation, the Euclidean distance transform is desirable because it gives multiscale morphology with the disk as the structuring element. However, it has a signiﬁcant computational complexity. Discrete approaches use various techniques to obtain integer approximations to the Euclidean distance transform at a lower complexity. Notable such examples are the chamfer metrics [3], computed by running recursive min-sum diﬀerence equations over the image and thus propagating local distances within a neighborhood mask. Their associated unit ball is a polygon whose approximation of the disk improves by increasing the size of the mask and optimizing the local distances. In this paper, for optimal chamfer transforms we shall use the local distances found in [6].

The continuous approach uses the dilation PDE (2). This applies to both graylevel and binary images, because ﬂat dilations commute with thresholding and hence, when a graylevel image is dilated, each one of its thresholded versions representing a binary image is simultaneously dilated by the same element and at the same scale. Thus, the dilation PDE plays a fundamental role for modeling morphological scale-space. However, its usefulness is greatly ampliﬁed by the existence of a stable and shock-capturing algorithm [17] for the numerical implementation of (4). Its main steps are : • Let Φn i,j be an estimate of Φ(i∆x, j∆y, n∆t) on a grid. n − n n • Dx+ = (Φn i+1,j − Φi,j )/∆x , Dx = (Φi,j − Φi−1,j )/∆x + n n − n • Dy = (Φi,j+1 − Φi,j )/∆y , Dy = (Φi,j − Φn i,j−1 )/∆y • H 2 = min2 (0, Dx− ) + max2 (0, Dx+ ) + min2 (0, Dy− ) + max2 (0, Dy+ ) n • Φn+1 = Φn , n = 1, 2, ..., (Tmax /∆t) i,j + Ci,j |H|∆t i,j where Tmax is the maximum time (scale) of interest, ∆x, ∆y are the spatial grid spacings, ∆t is the time (scale) n step, and Ci,j = c(i∆x, j∆y, n∆t). Thus, by choosing ﬁne grids (and possibly higher order terms) an arbitrarily low error (between signal values on the continuous plane and the discrete grid) can be achieved in implementing morphological operations involving disks as structuring elements. This is a signiﬁcant advantage of the PDE approach, as observed in [2, 20]. Thus, curve evolution provides a geometrically better implementation of multiscale morphological operations with the disk-shaped structuring element. See Fig. 1 for an example.

1.2

Eikonal PDE

Many tasks for extracting information from visible images have been related to eikonal optics and wave propagation via the eikonal PDE [4] ||∇U (x, y)|| = η(x, y)

(5)

Its solution U (x, y) can provide 3D shape, contour halftoning, or topographic segmentation of an image f (x, y) by choosing the refractive index ﬁeld η(x, y) to be an appropriate function of the image [7, 25, 18, 16]. The eikonal PDE can be seen as a stationary formulation of the function evolution PDE (4) with positive speed c(x, y) = c0 /η(x, y). Namely [17], if T (x, y) is the time at which the zero level of Φ(x, y, t) crosses (x, y), then ||∇T || = 1/c. Setting U = c0 T leads to the eikonal. The solution of the eikonal PDE can be viewed as a weighted distance function [4, 25, 19, 9, 11] between a point (x, y) and the sources along a path of minimal optical length. The optical length of any path is obtained by integrating the refractive index ﬁeld η(x, y) along this path and is proportional to the time required for light to travel this path. Thus, we can view the solution U (x, y) to the eikonal as a gray-weighted distance transform (GWDT) whose values at each pixel give the minimum distance from the light sources weighted by the gray values of the refractive index ﬁeld. Next we outline two ways of solving the eikonal PDE.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

12

P. Maragos, A. Butt, L. Pessoa

(a) (b) (c) (d) Figure 1. Distance transforms (modulo a constant) of a binary image, obtained via: (a) (1,1) chamfer metric; (b) optimal 3 × 3 chamfer metric; (c) optimal 5 × 5 chamfer metric; (d) curve evolution.

1.2.1

GWDT based on Chamfer Metrics

Consider a sampled refractive index ﬁeld η[x, y] (viewed as a positive image) and a set S of sources. A discrete GWDT ﬁnds at each pixel p = [x, y] the smallest sum of values of η over all possible paths connecting p to the sources S. This can also be viewed as a procedure of ﬁnding paths of minimal ‘cost’ among nodes of a weighted graph or as discrete dynamic programming. Such a discrete GWDT (which is an approximation [25, 11] to the solution of the eikonal PDE ||∇U || = η) can be obtained via the following 2D recursive erosion [25] Ui [x, y] = min{Ui [x − 1, y] + aη[x, y], Ui [x, y − 1] + aη[x, y], Ui [x − 1, y − 1] + bη[x, y], Ui [x + 1, y − 1] + bη[x, y], Ui−1 [x, y]} where U0 [x, y] is set equal to 0 if [x, y] belongs to the sources S or +∞ otherwise. The propagation of the local distance steps (a, b) in the 3 × 3 chamfer mask starts at the wave sources and moves at speeds proportional to 1/η[x, y]. The above recursive equation is run over the whole image in forward and backward order, iteratively (i = 1, 2, 3, ...) until stability. At convergence, U∞ is the GWDT of S. The above is a sequential implementation of the GWDT. There are also other faster implementations using queues [25, 14]. To improve the GWDT approximation to the eikonal’s solution, one can optimize (a, b). Using a larger (e.g., 5 × 5) mask can further reduce the approximation error but at the cost of an even slower implementation. However, if larger masks are used with GWDTs, they may give erroneous results since the large masks can bridge over a thin line that separates two segmentation regions.

1.2.2

GWDT based on Surface Evolution

In this approach, at time t = 0 the boundary of each source is modeled as a curve Γ(0) which is then propagated with normal speed c(x, y) ∝ 1/η(x, y). The propagating curve Γ(t) is embedded as the zero-level curve of a function Φ(x, y, t), where Φ(x, y, 0) = Φ0 (x, y) is the signed (positive in the curve interior) distance from Γ(0). The surface Φ evolves according to the PDE (4), which is solved via the numerical algorithm of [17]. The value of the resulting GWDT at any pixel (x, y) of the image is the time it takes for the evolving curve to reach this pixel, i.e. the smallest t such that Φ(x, y, t) ≥ 0. This continuous approach to GWDT can achieve sub-pixel accuracy, as investigated in [9]. To reduce the computational

complexity of the above surface evolution algorithm, we have developed a queue-based implementation of the fast marching level set methods of [23, 10] adapted to computing GWDTs in case of multiple sources where triple points develop at the collision of several wavefronts.

1.2.3

Gridless Halftoning via Eikonal PDE

Inspired by the use in [21] of the eikonal function’s contour lines for visually perceiving an intensity image I(x, y), the work in [25] and especially in [18] attempts to solve the PDE ||∇U (x, y)|| = const − I(x, y) and create a binary gridless halftone version of I(x, y) as the union of the level curves of the eikonal function U (x, y). The larger the intensity value I(x, y), the smaller the local density of these contour lines in the vicinity of (x, y). This eikonal PDE approach to gridless halftoning is indeed very promising and can simulate various artistic effects, as shown in Fig. 2. There we also see that the surface evolution GWDT gives a smoother halftoning of the image than the GWDTs based on chamfer metrics.

1.2.4

Watershed Segmentation via Eikonal

A powerful morphological approach to image segmentation is the watershed [15, 26] which transforms an image f (x, y) to the crest lines separating adjacent catchment basins that surround regional minima or other ‘marker’ sets of feature points. In [14, 16] it has been established that (in the continuous domain and assuming that the image is smooth and has isolated critical points) the continuous watershed is equivalent to ﬁnding a skeleton by inﬂuence zones with respect to a weighted distance function that uses the (one-point) regional minima of the image as sources and ||∇f || as the ﬁeld of indices. (If other markers diﬀerent than the minima are to be used as sources, then the homotopy of the function must be modiﬁed via morphological recosntruction to impose these markers as the only minima.) In our work we solve the above eikonal PDE model of watershed segmentation of an image-related function f by ﬁnding a GWDT via surface evolution (4) where the speed is ∝ 1/||∇f ||. Further we compare the results of this new segmentation to the digital watershed algorithm via ﬂooding [26] and to the eikonal approach solved via a discrete GWDT based on chamfer metrics [25, 14]. In all three approaches, robust features are extracted ﬁrst as markers of the regions, and the original image I is transformed to another function f by smoothing via alternat-

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

13

FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS

(a) (b) (c) (d) Figure 2. Gridless halftoning of the Cameraman image from contour lines of GWDTs obtained via: (a) (1,1) chamfer metric; (b) optimal 3 × 3 chamfer metric; (c) optimal 5 × 5 chamfer metric; (d) curve evolution.

ing open/closing at multiple scales, taking the gradient magnitude of the ﬁltered image, and changing (via morphological reconstruction) the homotopy of the gradient image so that its only minima occur at the markers. The segmentation is done on the ﬁnal outcome f of the above processing. In the standard digital watershed algorithm [26, 15], the ﬂooding at each level is achieved by a planar distance propagation that uses the chess-board metric. This kind of distance propagation is non-isotropic and could give wrong results, particularly for images with large plateaus, as we found experimentally. Eikonal segmentation using chamfer-based GWDTs improves this situation a little but not entirely. In contrast, for images with large plateaus/regions, segmentation via the eikonal PDE and surface evolution GWDT gives results close to ideal. As Fig. 3 shows, compared on a test image that is diﬃcult (because expanding wavefronts meet watershed lines at many angles ranging from from being perpendicular to almost parallel), our continuous segmentation approach based on the eikonal PDE and surface evolution outperforms the discrete segmentation results (using either the digital watershed ﬂooding algorithm or chamfer-based GWDTs). However, some real images, as in Fig. 4, may not contain many plateaus or only large regions, in which cases the digital watershed ﬂooding algorithm may give comparable results (or slightly better for thin elongated regions) than the eikonal PDE approach. Of course, the fact that the eikonal PDE segmentation may not detect part or all of a thin elongated region could be an advantage in applications where such thin regions are noisy or unreliable and hence should not be detected by a robust segmentation scheme.

2

PART II: MRL NEURAL NETS

The perceptron, i.e., a linear combiner followed by a nonlinearity of the logistic type, is the standard node structure used in neural networks (NNs). However, it has been observed that logic operations, which are not well modeled by perceptrons, can be generated by some internal interactions in a neuron [33]. To better represent such internal properties, we propose the MRL-NNs, a general class of NNs where the combination of inputs in every node is formed by hybrid linear and nonlinear (of the

morphological/rank type) operations. The fundamental processing unit of this class of systems is the MRL-ﬁlter [29], which is a linear combination between a morphological/rank ﬁlter and a linear FIR ﬁlter. The MRLNNs have the unifying property that the characteristics of both multilayer perceptrons (MLPs) and morphological/rank neural networks (MRNNs) [30] are observed in the same system. An important special case of MRNNs is the class of min-max classiﬁers [34]. Next, we formulate a simple and systematic training procedure using ideas from the back-propagation algorithm [31] and robust techniques to circumvent the non-diﬀerentiability of rank functions. (Note that, since adaptive ﬁlters and NNs are closely related [28], we have investigated the adaptation of MRL ﬁlters and the training of MRL NNs under the same framework.) Our approach to train the morphological/rank nodes is a theoretically and numerically improved version of the method proposed in [32] to design morphological/rank ﬁlters. Finally, we apply the proposed design methodology in a problem of handwritten character recognition, and provide some experimental evidences showing that not only the MRL-NNs can generate similar or better results when compared with the classical MLPs, but also they usually require smaller processing times for training.

2.1

The Structure of MRL-NNs

In general terms, a (multilayer feed-forward) NN is a layered system composed by similar nodes, with some of them non-observable (hidden), where the node inputs in a given layer depend only on the node outputs from the preceding layer. Every node performs a generic composite operation, where an input to the node is ﬁrst processed by some function h(· , ·) of the input and internal weights, and then transformed by an activation function f (·). The node structure is deﬁned by the function h. In the case of MLPs, h is a linear combination. The activation function f is usually employed for rescaling purposes. A general NN is formally deﬁned by the following set of recursive equations. (l)

(l)

(l)

y (l) ≡ F (z (l) ) = (f (z1 ), f (z2 ), · · · , f (zNl )) , l = 1, 2, · · · , L , (l) zn ≡ h(y (l−1) , w(l) n ) , n = 1, 2, · · · , Nl ,

(6)

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

14

P. Maragos, A. Butt, L. Pessoa

(a) (b) (c) (d) Figure 3. The Test image is the minimum of two potential functions. Its contour plot (thin bright curves) is superimposed on all segmentation results. Markers are the two source points of the potential functions. Segmentation results based on: (a) Digital watershed ﬂooding algorithm. (b) (3×3) chamfer-based GWDT. (c) (5×5) chamfer-based GWDT. (d) Eikonal PDE and surface evolution GWDT. (The thick bright curve shows the correct segmentation.)

(a) (b) (c) (d) Figure 4. (a) Original Cameraman image with markers placed within desired regions. (b) Gradient magnitude of ﬁltered image. (c) Segmentation result (superimposed on original) from digital watershed ﬂooding algorithm. (d) Segmentation from eikonal PDE and surface evolution GWDT.

where l is the layer number, and Nl is the number of nodes in layer l. The weight vectors w(l) n represent the tuning parameters in the system. Besides this, the input and output of the system are y (0) = x = (x1 , x2 , · · · , xN0 ) y (L) = y = (y1 , y2 , · · · , yNL )

(input) (output)

(7)

The MRL-NN is the system deﬁned by (6) and (7) such that (l)

(l)

(l)

(l)

(l)

zn ≡ λn αn + (1 − λn )βn (l) αn = Rr(l) (y (l−1) + a(l) n ) (l) βn

(l)

n

=y

(l−1)

·

(b(l) n )

+

(8)

(l) τn

(l)

every node is then deﬁned by (l) (l) (l) (l) (l) w(l) n ≡ (an , ρn , bn , τn , λn ) ,

(9)

(l) ρn

instead of an integer rank where we use a real variable (l) variable rn because we will need to evaluate derivatives (l) during the design of MRL-NNs. The relation between ρn (l) and zn will be deﬁned later via a diﬀerential equation, (l) (l) and rn is obtained from ρn using 1

rn(l) ≡ Nl−1 −

Nl−1 − 1 (l)

1 + exp(−ρn )

+ 0.5

.

(10)

Two important special cases of MRL-NNs are obtained when f is the identity, called MRL-NN of type I (l) (e.g., MRNN: λn = 1 ∀n, l), and when f is a nonlinearity of the logistic type, called MRL-NN of type II (e.g., (l) MLP: λn = 0 ∀n, l).

(l) Nl−1 ; and ‘ ’ denotes where λn , τn ∈ IR; a(l) n , bn ∈ IR transposition. Rr (t) is the r-th rank function of the vector t ∈ IRn . It is evaluated by sorting the components of t = (t1 , t2 , · · · , tn )2.2 Adaptive Design in decreasing order, t(1) ≥ t(2) ≥ · · · ≥ t(n) , and picking Based on the LMS criterion and using ideas from the the r-th element of the sorted list, i.e., Rr (t) = t(r) , back-propagation algorithm, we propose a steepest der = 1, 2, · · · , n. scent method to optimally design general NNs, and then Observe from (6) and (8) that the underlying funcapply it to MRL-NNs. The design goal is to achieve a set tion h is an MRL-ﬁlter [29] shifted by a threshold (1 − of parameters w(l) n , n = 1, 2, · · · , Nl , l = 1, 2, · · · , L, such (l) (l) (l) (l) λn )τn . The variables τn are important when λn = 0. that some cost function is minimized using a supervised For every MRL-ﬁlter, the vector b(l) procedure. Consider the training set n corresponds to the coeﬃcients of a linear FIR ﬁlter, and the vector a(l) n repre{(x(k), d(k)) , k = 0, 1, · · · , K − 1} , (11) sents the coeﬃcients of a morphological/rank ﬁlter. The (l)

(l)

variables rn and λn are the rank and the mixing parameters, respectively. The resulting weight vector for

1 · denotes the usual truncation operation, so that · + 0.5 is the usual rounding operation.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

15

FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS where d(k) is the desired system output to the training sample x(k). From (11) we generate the training sequence 2 (x([k] mod K ), d([k] mod K )) , k ∈ ZZ ,

(12) where

by making a periodic extension of (11). Every period of (12) is usually called an epoch. A general supervised training algorithm is of the form w(l) n (i

w(l) n (i)

v (l) n (i)

+ 1) = +µ , µ>0, n = 1, 2, · · · , Nl ; l = 1, 2, · · · , L ,

In this way, using the chain rule to evaluate U (l) (k), it can be shown that

(13)

1 M

i

ξ(k) , 1 ≤ M ≤ K ,

ξ(k) ≡

NL

(22)

  

Γ(l) = 

i

(15)

∂ξ(k) (l)

∂wn

.

  

W (l) ≡ 

   

V (l) ≡ 



(l)

v1 (l) v2 .. . (l) v Nl

(l) w1 (l) w2

.. .

(l)

w Nl

     , U (l) ≡   

(l)

u1 (l) u2 .. . (l) u Nl

    , 

(l)

(25)

(l)

(l)

1 Nl−1

∂αn = s(l) n ≡ ∂ρ(l) n

(26)

(l) Q(αn 1 − y (l−1) − a(l) n )·1 .

In (25) and (26), Q(v) ≡ (q(v1 ), q(v2 ), · · · , q(vn )), where q(v) ≡

(19)

(l)

(l)





(24)

Q(αn 1 − y (l−1) − an ) · 1

   , 

   , 

∂αn ∂αn = = c(l) n ≡ (l) ∂y (l−1) ∂an (l) (l−1) − a(l) Q(αn 1 − y n )

If we deﬁne the matrices W (l) , V (l) and U (l) by



(l)

(17)

(18)



(l)

θ1 (l) θ2 .. . (l) θ Nl

Based on this framework, the design of MRL-NNs can easily be derived. The diﬃculty is due to the nondiﬀerentiability of rank functions, but we can circumvent this problem by using pulse functions as follows [29].

(16)

u(l) n (k) ,

(23)

∂zn ∂zn , γ (l) = . (l) n ∂y (l−1) ∂wn

θ(l) n =

1− u(l) n (k) = −



    (l)  , Θ =  

where

k=i−M +1

where



l

(14)

n=1

1 M

γ (l) 1 γ (l) 2 .. . γ (l) N

(l)

e2n (k) .

, l=L , otherwise

The matrices Γ(l) and Θ(l) are deﬁned by

Based on the steepest descent algorithm, it follows from (13) and (15) that v (l) n (i) =

e(k) δ (l+1) (k) · Θ(l+1) (k)



k=i−M +1

where

δ (l) (k) = e(l) (k) F˙ (z (l) (k)) ,

3

e(l) (k) =

and the cost function J(i) ≡

(21)

where the positive constant µ controls the tradeoﬀ between stability and speed of convergence, v (l) n = −∇J, and J is some cost function to be minimized. Let us deﬁne the error signal e(k) = (e1 (k), e2 (k), · · · , eNL (k)) ≡ d([k] mod K ) − y(k) ,

U (l) (k) = 2 diag(δ (l) (k)) · Γ(l) (k) ,

1 0

, if v = 0 , if v ∈ IR \ {0}

(27)

and 1 = (1, 1, · · · , 1). To avoid abrupt changes and achieve numerical robustness, we frequently replace the function q(v) by smoothed impulses qσ (v), σ ≥ 0, such as exp[− 12 (v/σ)2 ] or sech2 (v/σ). The remaining unknown is F˙ (·), that depends on the type of the MRL-NN in use. For the MRL-NN of type I, F (z (l) ) = z (l) , so that F˙ (z (l) ) = 1. For the MRLNN of type II, we will use f (z) = [1 + exp(−η z)]−1 , η ≥ 1, whose derivative is f˙(z) = η f (z)[1 − f (z)], so that F˙ (z (l) ) = ηy (l) [1 − y (l) ].

then the algorithm (13) can be written as

2.3

W (l) (i + 1) = W (l) (i) + µ V (l) (i) , V (l) (i) =

1 M

i

U (l) (k) ,

(20)

k=i−M +1

l = 1, 2, · · · , L 2 [k]

mod K

≡ k − K k/K denotes the index k modulo K.

Application in OCR

Using the design framework discussed in the previous section, we now describe some experimental results in a problem of optical character recognition (OCR). Our 3 We denote F˙ (z) ≡ (f˙(z ), f˙(z ), · · · , f˙(z )). The symbol n 1 2 ‘’ denotes an array (element-by-element) multiplication.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

16

Training Testing Epoch

P. Maragos, A. Butt, L. Pessoa

MRL5 11.8/13.2/10.5 18.7/22.4/15.0 3

MLP5 9.9/8.8/11.1 18.4/19.9/16.9 10

FM / E(θ) / R(θ) MRL10 MLP10 7.4/6.9/7.8 7.4/7.0/7.7 11.0/13.1/8.9 11.1/10.9/11.4 62 96

MRL20 8.4/7.8/9.0 17.4/24.6/10.2 9

MLP20 7.5/6.8/8.2 11.8/12.8/10.9 88

Table 1: Figure of merit / mean error rate / mean rejection rate corresponding to the optimal set of weights of best MRL-NNs vs. best MLPs for µ = 0.05.

approach is to perform a comparative analysis of MRLNNs versus MLPs, illustrating some of the characteristics of both systems. We show that the MRL-NNs are a good alternative to MLPs, usually providing equal or better performance with smaller training times. To do so, we used a large database of handwritten characters provided by the National Institute of Standards and Technology (NIST) [27]. We selected a total of K = 61, 094 samples of handwritten digits to form our data set. In our simulations, we normalized the feature vectors (64 dimensional Karhunen-Lo`eve transforms) so that each x(k) ∈ [0, 1]. The data set was split such that 45, 000 digits were used for training and the remaining 16, 094 digits were used for testing. The ﬁrst 15, 000 elements of the training set were used as a validation set during the training process. The training sequence was ordered such that one instance of every digit is presented to the system in each iteration. After making several tests, we have set a group 12 experiments with 3 diﬀerent network topologies: 64-N10 MRL-NNs and MLPs, N = 5, 10, 20. This notation indicates a system with 64 inputs, N hidden nodes, and 10 outputs. Two diﬀerent step sizes were tested: µ = 0.005, 0.05. Every experiment was repeated 5 times with diﬀerent random initial conditions, and the best result is reported here. Among many possible ways to initialize the systems, and after performing various tests, we initialized the weights randomly in the ranges: a(l) : √ √ n (l) (l) [−0.1, 0.1], rn : [1, Nl−1 ], bn : [−1/ Nl−1 , 1/ Nl−1 ], (l) (l) τn : [−0.1, 0.1], λn : [0.4, 0.6]. Further, in order to estimate gradients, we smoothed impulses with qσ (v) = exp[− 12 (v/σ)2 ], σ = 0.05. Due to the size of the training set, we have used the proposed training algorithm with M = 1 only. We have tested the case M > 1 with a small subset of the training set, but no signaniﬁcant improvements were observed. Both MRL-NNs and MLPs were deﬁned using a sigmoid activation function with η = 1 (MRL-NNs of type II). As usual, the desired system output d = (d0 , d1 , · · · , d9 ) was deﬁned by

dn =

1 0

, x ↔ digit n , otherwise

(28)

In the attempt to compare diﬀerent systems, a ﬁgure of merit (FM) was deﬁned as follows 1 F M (t) = ( E(θ) + R(θ) ) 2

(29)

where θ ∈ [0, 1] is the conﬁdence threshold; t is the epoch; E is the error rate (%), computed, for a given θ, as the ratio of the number of misclassiﬁed digits over the number of digits that were not rejected during the classiﬁcation (in a percentage basis); R is the rejection rate (%), computed, for a given θ, as the ratio of the number of rejected digits over the total number of elements in the set under consideration (also in a percentage basis); and

E(θ) =

9 9 1 i i 1 E( ) , R(θ) = R( ) 10 10 10 10 i=0

i=0

The training process tends to decrease the ﬁgure of merit, and good performance corresponds to small values of FM. A given classiﬁcation is rejected if the desired n-th output is dn = 1 but the actual n-th output has the property max{y} = yn < θ, where y is the system output. An error (misclassiﬁcation) is obtained when dn = 1 but max{y} = yn . The error rate is computed excluding the rejected digits. Using our proposed training algorithm with all the above considerations, we observed that, either for a step size µ = 0.005 or µ = 0.05, the MRL-NNs required a smaller number of iterations than MLPs, and provided similar performances (FMs). Computing the ﬁgures of merit of MLPs with equal number of iterations of MRLNNs, we usually observed better performances of MRLNNs. Table 1 summarizes some of the results. The best training performance was obtained with a 64-10-10 MRLNN (MRL10, FM=7.4%). Similar results were obtained with a 64-10-10 MLP (MLP10, FM=7.4%) and a 64-2010 MLP (MLP20, FM=7.5%), but with a larger number of iterations. These best results were all obtained with µ = 0.05.

References References for Part I [1] L. Alvarez, F. Guichard, P.L. Lions, and J.M. Morel, “Axioms and Fundamental Equations of Image Processing”, Archiv. Rat. Mech., vol. 123 (3), pp. 199– 257, 1993. Also, C. R. Acad. Sci. Paris, pp. 265-268, t.315, Serie I, 1992. [2] A. Arehart, L. Vincent and B. Kimia, “Mathematical Morphology: The Hamilton-Jacobi Connection”, in Proc. ICCV-93, pp. 215–219, 1993.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

17

FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS [3] G. Borgefors, “Distance Transformations in Digital Images”, Comp. Vision, Graphics, Image Process., 34, pp. 344–371, 1986. [4] M. Born and E. Wolf, Principles of Optics, Pergamon Press, Oxford, England, 1959 (1987 edition). [5] R. Brockett and P. Maragos, “Evolution Equations for Continuous-Scale Morphological Filtering”, IEEE Trans. Signal Processing, vol. 42, pp. 33773386, Dec. 1994. Also, in Proc. ICASSP-92, San Francisco, 1992. [6] M. A. Butt and P. Maragos. “Optimal Design of Chamfer Distance Transforms”, IEEE Trans. Image Processing, in press. [7] B.K.P. Horn, Robot Vision, MIT Press, Cambridge, MA, 1986. [8] B. Kimia, A. Tannenbaum, and S. Zucker, “Toward a Computational Theory of Shape: An Overview”, Proc. ECCV-90, France, April 1990. [9] R. Kimmel, N. Kiryati, and A. M. Bruckstein, “SubPixel Distance Maps and Weighted Distance Transforms”, J. Math. Imaging and Vision, 6, pp. 223233, 1996. [10] R. Malladi, J. A. Sethian, and B. C. Vemuri, “A Fast Level Set Based Algorithm for TopologyIndependent Shape Modeling”, J. Math. Imaging and Vision, 6, pp. 269-289, 1996. [11] P. Maragos, “Diﬀerential Morphology and Image Processing” IEEE Trans. Image Processing, vol. 78, pp. 922–937, June 1996. [12] P. Maragos and R. W. Schafer, “Morphological Systems for Multidimensional Signal Processing”, Proc. IEEE, vol. 78, pp. 690-710, Apr. 1990. [13] J. Mattioli, “Diﬀerential Relations of Morphological Operators”, Proc. 1st Int’l Workshop on Math. Morphology and its Application to Signal Processing, Barcelona, Spain, May 1993. [14] F. Meyer, “Topographic Distance and Watershed Lines”, Signal Processing, vol. 38, pp. 113-125, July 1994. [15] F. Meyer and S. Beucher, “Morphological Segmentation”, J. Visual Commun. Image Representation, 1(1):21–45, 1990. [16] L. Najman and M. Schmitt, “Watershed of a Continuous Function”, Signal Processing, vol. 38, pp. 99112, July 1994. [17] S. Osher and J. Sethian, “Fronts Propagating with Curvature-Dependent Speed: Algorithms Based on Hamilton-Jacobi Formulations”, J. Comput. Physics, 79, pp. 12–49, 1988. [18] Y. Pnueli and A. M. Bruckstein, “DigiD u ¨rer - a digital engraving system”, The Visual Computer, 10, pp. 277–292, 1994. [19] E. Rouy and A. Tourin, “A Viscocity Solutions Approach to Shape from Shading”, SIAM J. Numer. Anal., vol. 29 (3), pp. 867-884, June 1992.

[20] G. Sapiro, R. Kimmel, D. Shaked, B. Kimia, and A. Bruckstein, “Implementing Continuous-scale Morphology via Curve Evolution”, Pattern Recognition, 26(9), pp. 1363–1372, 1993. [21] M. Schr¨ oder, “The Eikonal Equation”, Math. Intelligencer, 1, pp. 36–37, 1983. [22] J. Serra, Image Analysis and Mathematical Morphology, Acad. Press, NY, 1982. [23] J. A. Sethian, Level Set Methods, Cambridge Univ. Press, 1996. [24] R. van der Boomgaard and A. Smeulders, “The Morphological Structure of Images: The Diﬀerential Equations of Morphological Scale-Space”, IEEE Trans. Pattern Anal. Mach. Intellig., vol. 16, pp.1101-1113, Nov. 1994. Also, Ph.D. Thesis, Univ. of Amsterdam, 1992. [25] P. Verbeek and B. Verwer, “Shading from shape, the eikonal equation solved by grey-weighted distance transform”, Pattern Recogn. Lett., 11:618–690, 1990. [26] L. Vincent and P. Soille, “Watershed In Digital Spaces: An Eﬃcient Algorithm Based On Immersion Simulations”, IEEE Trans. PAMI, vol. 13, pp. 583–598, June 1991. References for Part II [27] M.D. Garris, J.L. Blue, G.T. Candela, D.L. Dimmick, J. Geist, P.J. Grother, S.A. Janet, and C.L. Wilson, “NIST form-based handprint recognition system”, Tech. Rep. NISTIR 5469, Nat’l Inst. Standards & Technology, July 1994. [28] S. Marcos, O. Macchi, C. Vignat, G. Dreyfus, L. Personnaz, and P. Roussel-Ragot, “A uniﬁed framework for gradient algorithms used for ﬁlter adaptation and neural network training”, Int’l J. Circuit Theory & Applications, 20:159–200, 1992. [29] L. F.C. Pessoa and P. Maragos, “MRL-Filters: A general class of nonlinear systems and their optimal design for image processing,” IEEE Trans. Image Processing, July 1998. [30] L. F.C. Pessoa and P. Maragos, “Morphological/rank neural networks and their adaptive optimal design for image processing”, Proc. ICASSP-96, vol.6, pp.3399–3402, May 1996. [31] D.E. Rumelhart, G.E. Hinton, and R.J. Willians, “ Learning Internal Representations by Error Propagation,” Parallel Distributed Processing, Vol.1, D.E. Rumelhart and J.L. McClelland, eds., pp.318–362, MIT Press, Cambridge, MA, 1986. [32] P. Salembier, “Adaptive rank order based ﬁlters,” Signal Processing, 27:1–25, Apr. 1992. [33] G.M. Shepherd and R.K. Brayton, “Logic operations are properties of computer-simulated interactions between excitable dendritic spines,” Neuroscience, 21(1):151–165, 1987. [34] P.-F. Yang and P. Maragos, “Min-max classiﬁers: Learnability, design and application,” Pattern Recognition, 28(6):879–899, 1995.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

Recommend Documents

Differential Morphology - CVSP - NTUA

Speech Analysis and Feature Extraction using Chaotic ... - CVSP - NTUA

PDE-Based Modeling of Image Segmentation Using ... - CVSP - NTUA