two frontiers in morphological image analysis - CVSP - NTUA

Report 3 Downloads 76 Views
TWO FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS: DIFFERENTIAL EVOLUTION MODELS AND HYBRID MORPHOLOGICAL/LINEAR NEURAL NETWORKS ∗ ´cio F. C. Pessoa3 Petros Maragos1&2 , M. Akmal Butt1 , and Lu 1

School of E.C.E., Georgia Institute of Technology, Atlanta, GA 30332, USA. [maragos,akmal,lpessoa]@ee.gatech.edu 2 Institute for Language and Speech Processing, Artemidos/Epidavrou Str., Marousi 15125, Greece. 3 Motorola Semiconductor Products Sector, Austin, TX 78721, USA. Abstract. In this paper we briefly describe advancements in two broad areas of morphological image analysis. Part I deals with differential morphology and curve evolution. The partial differential equations (PDEs) that model basic morphological operations are first presented. The resulting dilation PDE, numerically implemented by curve evolution algorithms, improves the accuracy of morphological multiscale analysis by Euclidean disks and (its anisotropic/heterogeneous version) is the basic ingredient of PDE models that solve image analysis problems such as gridless halftoning and watershed segmentation based on the eikonal PDE. Part II deals with morphology-related systems for pattern recognition. It presents a general class of multilayer feed-forward neural networks where the combination of inputs in every node is formed by hybrid linear and nonlinear (of the morphological/rank type) operations. For its design a methodology is formulated using ideas from the back-propagation algorithm and robust techniques are developed to circumvent the non-differentiability of rank functions. Experimental results in handwritten character recognition are described and illustrate some of the properties of this new type of neural nets. Keywords: differential morphology, PDEs, curve evolution, neural nets, character recognition.

1

PART I: DIFFERENTIAL MORPHOLOGY PDEs have the form AND CURVE EVOLUTION ∂Ψ/∂s = ±||∇Ψ||

Morphological image processing has been based traditionally on set and lattice theory. Thus, so far, the two classic approaches to analyze or design deterministic morphological operators have been: (i) geometry by viewing them as image set transformations in Euclidean spaces and (ii) algebra to analyze their properties using set or lattice theory. In parallel to these directions, there is a recently growing part of morphological image processing that uses tools from differential calculus and dynamical systems to model nonlinear multiscale analysis and distance propagation in images. Recently, the multiscale morphological operators of dilation, erosion [1, 5, 24] and opening, closing [5] were modeled via nonlinear partial differential equations (PDEs) acting in scale-space. These advancements were inspired by previous work in computer vision where multiscale linear convolutions of an image were modeled via the heat PDE. For multiscale flat dilations and erosions of an image f (x, y) by compact convex symmetric structuring sets B⊆ IR2 at a continuum of scales s ≥ 0, their generating ∗ This work was done mostly at Georgia Tech and was supported by the US NSF under grant MIP–94-21677 and by the ARO under grant DAAH04-96-1-0161. L. Pessoa was also supported by CNPq (Conselho Nacional de Desenvolvimento Cient´ıfico e Tecnol´ ogico), Bras´ılia, Brazil, through a Doctoral Fellowship under grant 200.846/92-2. The paper was written while P. Maragos was at I.L.S.P.

B

,

Ψ(x, y, 0) = f (x, y)

(1)

where Ψ(x, y, s) is the dilation ⊕ or erosion  of f by sB, +/− corresponds to dilation/erosion respectively, ||(x, y)||B ≡ sup(a,b)∈B (ax + by), ∇Ψ ≡ (Ψx , Ψy ) is the spatial gradient, and Ψx ≡ ∂Ψ/∂x denote partial derivatives. For example, if B is the unit disk, || · ||B is the Euclidean norm || · || and ∂Ψ/∂s = ±||∇Ψ|| = ±



(Ψx )2 + (Ψy )2

(2)

Even if the initial image f is smooth, at finite scales s > 0 the above dilation or erosion evolution may create discontinuities in the derivatives, like ‘shocks’. Thus, the dilations f ⊕sB or erosions f sB are weak solutions of (1). Ways to deal with these shocks include replacing standard derivatives with morphological derivatives [5, 11] or replacing the PDEs with differential inclusions [13]. In parallel to the development of the above ideas, there have been some advances in the field of differential geometry for evolving curves or surfaces using level set methods [17, 23]. Consider at time t = 0 an initial simple, smooth, closed planar curve Γ(0) which is propagated for t > 0 along its normal vector field with speed c. Let this evolving curve Γ(t) be represented by its position vector  X(p, t) = (x(p, t), y(p, t)) and parameterized by p ∈ J so that it has its interior on the left in the direction of increasing p. A general propagation law is  ∂ X(p, t)  (p, t) , =cN ∂t

 Γ(0) = {X(p, 0) : p ∈ J}, (3)

Invited Paper. Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Brazil, Oct. 1998.

11

FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS  (p, t) is the instantaneous unit outward normal where N vector at points on the evolving curve, and c = c(x, y, t) is the speed function which generally depends on local geometrical information such as the curvature κ(p, t), global image properties, or other factors independent of the curve. If c = 1 or c = −1, then Γ(t) is the dilation or erosion of the initial curve Γ(0) by a disk of radius t. The speed model c = 1±εκ, |ε| ≤ 1, has been extensively studied in [17, 23] for general evolution of boundaries and interfaces and in [8] for shape analysis in computer vision. To overcome the topological problem of splitting and merging and numerical problems with the Lagrangian formulation (3), an Eulerian formulation was proposed in [17] where the original curve Γ(0) is first embedded in the surface of an arbitrary 2D Lipschitz continuous function Φ0 (x, y) as its zero-level curve. (For example, we select Φ0 (x, y) to be equal to the signed (±) distance function from the boundary of Γ(0) where + is for points inside and − is for points outside the curve.) Then, the evolving 2D curve is obtained as the zero-level curve Γ(t) = {(x, y) : Φ(x, y, t) = 0} of a 2D function Φ(x, y, t) that evolves according to the PDE ∂Φ/∂t = c||∇Φ|| ,

Φ(x, y, 0) = Φ0 (x, y)

(4)

This function evolution PDE makes all level sets expand at normal speed c. If c = ±1, it is identical to the flat circular dilation/erosion PDE (2) by equating scale s with time t.

1.1

Multiscale Analysis via Dilation PDE

Many applications of mathematical morphology [22, 12] such as nonlinear smoothing, geometrical feature extraction, skeletonization, size distributions, and segmentation, inherently require or can benefit from performing morphological image operations at multiple scales, which creates a morphological scale-space. For binary images, the distance transform is a compact way to represent their multiscale dilations and erosions by convex structuring elements whose shape depends upon the norm used to measure distances. Thresholding the distance transform at level t > 0 yields the morphological erosion of the image foreground (or the dilation of the background) by the norm-induced ball of radius (scale) t. To obtain isotropic distance propagation, the Euclidean distance transform is desirable because it gives multiscale morphology with the disk as the structuring element. However, it has a significant computational complexity. Discrete approaches use various techniques to obtain integer approximations to the Euclidean distance transform at a lower complexity. Notable such examples are the chamfer metrics [3], computed by running recursive min-sum difference equations over the image and thus propagating local distances within a neighborhood mask. Their associated unit ball is a polygon whose approximation of the disk improves by increasing the size of the mask and optimizing the local distances. In this paper, for optimal chamfer transforms we shall use the local distances found in [6].

The continuous approach uses the dilation PDE (2). This applies to both graylevel and binary images, because flat dilations commute with thresholding and hence, when a graylevel image is dilated, each one of its thresholded versions representing a binary image is simultaneously dilated by the same element and at the same scale. Thus, the dilation PDE plays a fundamental role for modeling morphological scale-space. However, its usefulness is greatly amplified by the existence of a stable and shock-capturing algorithm [17] for the numerical implementation of (4). Its main steps are : • Let Φn i,j be an estimate of Φ(i∆x, j∆y, n∆t) on a grid. n − n n • Dx+ = (Φn i+1,j − Φi,j )/∆x , Dx = (Φi,j − Φi−1,j )/∆x + n n − n • Dy = (Φi,j+1 − Φi,j )/∆y , Dy = (Φi,j − Φn i,j−1 )/∆y • H 2 = min2 (0, Dx− ) + max2 (0, Dx+ ) + min2 (0, Dy− ) + max2 (0, Dy+ ) n • Φn+1 = Φn , n = 1, 2, ..., (Tmax /∆t) i,j + Ci,j |H|∆t i,j where Tmax is the maximum time (scale) of interest, ∆x, ∆y are the spatial grid spacings, ∆t is the time (scale) n step, and Ci,j = c(i∆x, j∆y, n∆t). Thus, by choosing fine grids (and possibly higher order terms) an arbitrarily low error (between signal values on the continuous plane and the discrete grid) can be achieved in implementing morphological operations involving disks as structuring elements. This is a significant advantage of the PDE approach, as observed in [2, 20]. Thus, curve evolution provides a geometrically better implementation of multiscale morphological operations with the disk-shaped structuring element. See Fig. 1 for an example.

1.2

Eikonal PDE

Many tasks for extracting information from visible images have been related to eikonal optics and wave propagation via the eikonal PDE [4] ||∇U (x, y)|| = η(x, y)

(5)

Its solution U (x, y) can provide 3D shape, contour halftoning, or topographic segmentation of an image f (x, y) by choosing the refractive index field η(x, y) to be an appropriate function of the image [7, 25, 18, 16]. The eikonal PDE can be seen as a stationary formulation of the function evolution PDE (4) with positive speed c(x, y) = c0 /η(x, y). Namely [17], if T (x, y) is the time at which the zero level of Φ(x, y, t) crosses (x, y), then ||∇T || = 1/c. Setting U = c0 T leads to the eikonal. The solution of the eikonal PDE can be viewed as a weighted distance function [4, 25, 19, 9, 11] between a point (x, y) and the sources along a path of minimal optical length. The optical length of any path is obtained by integrating the refractive index field η(x, y) along this path and is proportional to the time required for light to travel this path. Thus, we can view the solution U (x, y) to the eikonal as a gray-weighted distance transform (GWDT) whose values at each pixel give the minimum distance from the light sources weighted by the gray values of the refractive index field. Next we outline two ways of solving the eikonal PDE.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

12

P. Maragos, A. Butt, L. Pessoa

(a) (b) (c) (d) Figure 1. Distance transforms (modulo a constant) of a binary image, obtained via: (a) (1,1) chamfer metric; (b) optimal 3 × 3 chamfer metric; (c) optimal 5 × 5 chamfer metric; (d) curve evolution.

1.2.1

GWDT based on Chamfer Metrics

Consider a sampled refractive index field η[x, y] (viewed as a positive image) and a set S of sources. A discrete GWDT finds at each pixel p = [x, y] the smallest sum of values of η over all possible paths connecting p to the sources S. This can also be viewed as a procedure of finding paths of minimal ‘cost’ among nodes of a weighted graph or as discrete dynamic programming. Such a discrete GWDT (which is an approximation [25, 11] to the solution of the eikonal PDE ||∇U || = η) can be obtained via the following 2D recursive erosion [25] Ui [x, y] = min{Ui [x − 1, y] + aη[x, y], Ui [x, y − 1] + aη[x, y], Ui [x − 1, y − 1] + bη[x, y], Ui [x + 1, y − 1] + bη[x, y], Ui−1 [x, y]} where U0 [x, y] is set equal to 0 if [x, y] belongs to the sources S or +∞ otherwise. The propagation of the local distance steps (a, b) in the 3 × 3 chamfer mask starts at the wave sources and moves at speeds proportional to 1/η[x, y]. The above recursive equation is run over the whole image in forward and backward order, iteratively (i = 1, 2, 3, ...) until stability. At convergence, U∞ is the GWDT of S. The above is a sequential implementation of the GWDT. There are also other faster implementations using queues [25, 14]. To improve the GWDT approximation to the eikonal’s solution, one can optimize (a, b). Using a larger (e.g., 5 × 5) mask can further reduce the approximation error but at the cost of an even slower implementation. However, if larger masks are used with GWDTs, they may give erroneous results since the large masks can bridge over a thin line that separates two segmentation regions.

1.2.2

GWDT based on Surface Evolution

In this approach, at time t = 0 the boundary of each source is modeled as a curve Γ(0) which is then propagated with normal speed c(x, y) ∝ 1/η(x, y). The propagating curve Γ(t) is embedded as the zero-level curve of a function Φ(x, y, t), where Φ(x, y, 0) = Φ0 (x, y) is the signed (positive in the curve interior) distance from Γ(0). The surface Φ evolves according to the PDE (4), which is solved via the numerical algorithm of [17]. The value of the resulting GWDT at any pixel (x, y) of the image is the time it takes for the evolving curve to reach this pixel, i.e. the smallest t such that Φ(x, y, t) ≥ 0. This continuous approach to GWDT can achieve sub-pixel accuracy, as investigated in [9]. To reduce the computational

complexity of the above surface evolution algorithm, we have developed a queue-based implementation of the fast marching level set methods of [23, 10] adapted to computing GWDTs in case of multiple sources where triple points develop at the collision of several wavefronts.

1.2.3

Gridless Halftoning via Eikonal PDE

Inspired by the use in [21] of the eikonal function’s contour lines for visually perceiving an intensity image I(x, y), the work in [25] and especially in [18] attempts to solve the PDE ||∇U (x, y)|| = const − I(x, y) and create a binary gridless halftone version of I(x, y) as the union of the level curves of the eikonal function U (x, y). The larger the intensity value I(x, y), the smaller the local density of these contour lines in the vicinity of (x, y). This eikonal PDE approach to gridless halftoning is indeed very promising and can simulate various artistic effects, as shown in Fig. 2. There we also see that the surface evolution GWDT gives a smoother halftoning of the image than the GWDTs based on chamfer metrics.

1.2.4

Watershed Segmentation via Eikonal

A powerful morphological approach to image segmentation is the watershed [15, 26] which transforms an image f (x, y) to the crest lines separating adjacent catchment basins that surround regional minima or other ‘marker’ sets of feature points. In [14, 16] it has been established that (in the continuous domain and assuming that the image is smooth and has isolated critical points) the continuous watershed is equivalent to finding a skeleton by influence zones with respect to a weighted distance function that uses the (one-point) regional minima of the image as sources and ||∇f || as the field of indices. (If other markers different than the minima are to be used as sources, then the homotopy of the function must be modified via morphological recosntruction to impose these markers as the only minima.) In our work we solve the above eikonal PDE model of watershed segmentation of an image-related function f by finding a GWDT via surface evolution (4) where the speed is ∝ 1/||∇f ||. Further we compare the results of this new segmentation to the digital watershed algorithm via flooding [26] and to the eikonal approach solved via a discrete GWDT based on chamfer metrics [25, 14]. In all three approaches, robust features are extracted first as markers of the regions, and the original image I is transformed to another function f by smoothing via alternat-

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

13

FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS

(a) (b) (c) (d) Figure 2. Gridless halftoning of the Cameraman image from contour lines of GWDTs obtained via: (a) (1,1) chamfer metric; (b) optimal 3 × 3 chamfer metric; (c) optimal 5 × 5 chamfer metric; (d) curve evolution.

ing open/closing at multiple scales, taking the gradient magnitude of the filtered image, and changing (via morphological reconstruction) the homotopy of the gradient image so that its only minima occur at the markers. The segmentation is done on the final outcome f of the above processing. In the standard digital watershed algorithm [26, 15], the flooding at each level is achieved by a planar distance propagation that uses the chess-board metric. This kind of distance propagation is non-isotropic and could give wrong results, particularly for images with large plateaus, as we found experimentally. Eikonal segmentation using chamfer-based GWDTs improves this situation a little but not entirely. In contrast, for images with large plateaus/regions, segmentation via the eikonal PDE and surface evolution GWDT gives results close to ideal. As Fig. 3 shows, compared on a test image that is difficult (because expanding wavefronts meet watershed lines at many angles ranging from from being perpendicular to almost parallel), our continuous segmentation approach based on the eikonal PDE and surface evolution outperforms the discrete segmentation results (using either the digital watershed flooding algorithm or chamfer-based GWDTs). However, some real images, as in Fig. 4, may not contain many plateaus or only large regions, in which cases the digital watershed flooding algorithm may give comparable results (or slightly better for thin elongated regions) than the eikonal PDE approach. Of course, the fact that the eikonal PDE segmentation may not detect part or all of a thin elongated region could be an advantage in applications where such thin regions are noisy or unreliable and hence should not be detected by a robust segmentation scheme.

2

PART II: MRL NEURAL NETS

The perceptron, i.e., a linear combiner followed by a nonlinearity of the logistic type, is the standard node structure used in neural networks (NNs). However, it has been observed that logic operations, which are not well modeled by perceptrons, can be generated by some internal interactions in a neuron [33]. To better represent such internal properties, we propose the MRL-NNs, a general class of NNs where the combination of inputs in every node is formed by hybrid linear and nonlinear (of the

morphological/rank type) operations. The fundamental processing unit of this class of systems is the MRL-filter [29], which is a linear combination between a morphological/rank filter and a linear FIR filter. The MRLNNs have the unifying property that the characteristics of both multilayer perceptrons (MLPs) and morphological/rank neural networks (MRNNs) [30] are observed in the same system. An important special case of MRNNs is the class of min-max classifiers [34]. Next, we formulate a simple and systematic training procedure using ideas from the back-propagation algorithm [31] and robust techniques to circumvent the non-differentiability of rank functions. (Note that, since adaptive filters and NNs are closely related [28], we have investigated the adaptation of MRL filters and the training of MRL NNs under the same framework.) Our approach to train the morphological/rank nodes is a theoretically and numerically improved version of the method proposed in [32] to design morphological/rank filters. Finally, we apply the proposed design methodology in a problem of handwritten character recognition, and provide some experimental evidences showing that not only the MRL-NNs can generate similar or better results when compared with the classical MLPs, but also they usually require smaller processing times for training.

2.1

The Structure of MRL-NNs

In general terms, a (multilayer feed-forward) NN is a layered system composed by similar nodes, with some of them non-observable (hidden), where the node inputs in a given layer depend only on the node outputs from the preceding layer. Every node performs a generic composite operation, where an input to the node is first processed by some function h(· , ·) of the input and internal weights, and then transformed by an activation function f (·). The node structure is defined by the function h. In the case of MLPs, h is a linear combination. The activation function f is usually employed for rescaling purposes. A general NN is formally defined by the following set of recursive equations. (l)

(l)

(l)

y (l) ≡ F (z (l) ) = (f (z1 ), f (z2 ), · · · , f (zNl )) , l = 1, 2, · · · , L , (l) zn ≡ h(y (l−1) , w(l) n ) , n = 1, 2, · · · , Nl ,

(6)

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

14

P. Maragos, A. Butt, L. Pessoa

(a) (b) (c) (d) Figure 3. The Test image is the minimum of two potential functions. Its contour plot (thin bright curves) is superimposed on all segmentation results. Markers are the two source points of the potential functions. Segmentation results based on: (a) Digital watershed flooding algorithm. (b) (3×3) chamfer-based GWDT. (c) (5×5) chamfer-based GWDT. (d) Eikonal PDE and surface evolution GWDT. (The thick bright curve shows the correct segmentation.)

(a) (b) (c) (d) Figure 4. (a) Original Cameraman image with markers placed within desired regions. (b) Gradient magnitude of filtered image. (c) Segmentation result (superimposed on original) from digital watershed flooding algorithm. (d) Segmentation from eikonal PDE and surface evolution GWDT.

where l is the layer number, and Nl is the number of nodes in layer l. The weight vectors w(l) n represent the tuning parameters in the system. Besides this, the input and output of the system are y (0) = x = (x1 , x2 , · · · , xN0 ) y (L) = y = (y1 , y2 , · · · , yNL )

(input) (output)

(7)

The MRL-NN is the system defined by (6) and (7) such that (l)

(l)

(l)

(l)

(l)

zn ≡ λn αn + (1 − λn )βn (l) αn = Rr(l) (y (l−1) + a(l) n ) (l) βn

(l)

n

=y

(l−1)

·

 (b(l) n )

+

(8)

(l) τn

(l)

every node is then defined by (l) (l) (l) (l) (l) w(l) n ≡ (an , ρn , bn , τn , λn ) ,

(9)

(l) ρn

instead of an integer rank where we use a real variable (l) variable rn because we will need to evaluate derivatives (l) during the design of MRL-NNs. The relation between ρn (l) and zn will be defined later via a differential equation, (l) (l) and rn is obtained from ρn using 1



rn(l) ≡ Nl−1 −

Nl−1 − 1 (l)

1 + exp(−ρn )



+ 0.5

.

(10)

Two important special cases of MRL-NNs are obtained when f is the identity, called MRL-NN of type I (l) (e.g., MRNN: λn = 1 ∀n, l), and when f is a nonlinearity of the logistic type, called MRL-NN of type II (e.g., (l) MLP: λn = 0 ∀n, l).

(l) Nl−1 ; and ‘ ’ denotes where λn , τn ∈ IR; a(l) n , bn ∈ IR transposition. Rr (t) is the r-th rank function of the vector t ∈ IRn . It is evaluated by sorting the components of t = (t1 , t2 , · · · , tn )2.2 Adaptive Design in decreasing order, t(1) ≥ t(2) ≥ · · · ≥ t(n) , and picking Based on the LMS criterion and using ideas from the the r-th element of the sorted list, i.e., Rr (t) = t(r) , back-propagation algorithm, we propose a steepest der = 1, 2, · · · , n. scent method to optimally design general NNs, and then Observe from (6) and (8) that the underlying funcapply it to MRL-NNs. The design goal is to achieve a set tion h is an MRL-filter [29] shifted by a threshold (1 − of parameters w(l) n , n = 1, 2, · · · , Nl , l = 1, 2, · · · , L, such (l) (l) (l) (l) λn )τn . The variables τn are important when λn = 0. that some cost function is minimized using a supervised For every MRL-filter, the vector b(l) procedure. Consider the training set n corresponds to the coefficients of a linear FIR filter, and the vector a(l) n repre{(x(k), d(k)) , k = 0, 1, · · · , K − 1} , (11) sents the coefficients of a morphological/rank filter. The (l)

(l)

variables rn and λn are the rank and the mixing parameters, respectively. The resulting weight vector for

1 · denotes the usual truncation operation, so that · + 0.5 is the usual rounding operation.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

15

FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS where d(k) is the desired system output to the training sample x(k). From (11) we generate the training sequence 2 (x([k] mod K ), d([k] mod K )) , k ∈ ZZ ,

(12) where

by making a periodic extension of (11). Every period of (12) is usually called an epoch. A general supervised training algorithm is of the form w(l) n (i

w(l) n (i)

v (l) n (i)

+ 1) = +µ , µ>0, n = 1, 2, · · · , Nl ; l = 1, 2, · · · , L ,

In this way, using the chain rule to evaluate U (l) (k), it can be shown that

(13)

1 M

i 

ξ(k) , 1 ≤ M ≤ K ,

ξ(k) ≡

NL 

(22)

  

Γ(l) = 

i 

(15)

∂ξ(k) (l)

∂wn

.

  

W (l) ≡ 

   

V (l) ≡ 



(l)

v1 (l) v2 .. . (l) v Nl

(l) w1 (l) w2

.. .

(l)

w Nl

     , U (l) ≡   

(l)

u1 (l) u2 .. . (l) u Nl

    , 

(l)

(25)

(l)

(l)

1 Nl−1

∂αn = s(l) n ≡ ∂ρ(l) n

(26)

(l)  Q(αn 1 − y (l−1) − a(l) n )·1 .

In (25) and (26), Q(v) ≡ (q(v1 ), q(v2 ), · · · , q(vn )), where q(v) ≡

(19)

(l)

(l)





(24)

Q(αn 1 − y (l−1) − an ) · 1



   , 

   , 

∂αn ∂αn = = c(l) n ≡ (l) ∂y (l−1) ∂an (l) (l−1) − a(l) Q(αn 1 − y n )

If we define the matrices W (l) , V (l) and U (l) by



(l)

(17)

(18)



(l)

θ1 (l) θ2 .. . (l) θ Nl

Based on this framework, the design of MRL-NNs can easily be derived. The difficulty is due to the nondifferentiability of rank functions, but we can circumvent this problem by using pulse functions as follows [29].

(16)

u(l) n (k) ,

(23)

∂zn ∂zn , γ (l) = . (l) n ∂y (l−1) ∂wn

θ(l) n =

1− u(l) n (k) = −



    (l)  , Θ =  

where

k=i−M +1

where



l

(14)

n=1

1 M

γ (l) 1 γ (l) 2 .. . γ (l) N

(l)

e2n (k) .

, l=L , otherwise

The matrices Γ(l) and Θ(l) are defined by

Based on the steepest descent algorithm, it follows from (13) and (15) that v (l) n (i) =

e(k) δ (l+1) (k) · Θ(l+1) (k)



k=i−M +1

where

δ (l) (k) = e(l) (k) F˙ (z (l) (k)) ,

3

e(l) (k) =

and the cost function J(i) ≡

(21)



where the positive constant µ controls the tradeoff between stability and speed of convergence, v (l) n = −∇J, and J is some cost function to be minimized. Let us define the error signal e(k) = (e1 (k), e2 (k), · · · , eNL (k)) ≡ d([k] mod K ) − y(k) ,

U (l) (k) = 2 diag(δ (l) (k)) · Γ(l) (k) ,

1 0

, if v = 0 , if v ∈ IR \ {0}

(27)

and 1 = (1, 1, · · · , 1). To avoid abrupt changes and achieve numerical robustness, we frequently replace the function q(v) by smoothed impulses qσ (v), σ ≥ 0, such as exp[− 12 (v/σ)2 ] or sech2 (v/σ). The remaining unknown is F˙ (·), that depends on the type of the MRL-NN in use. For the MRL-NN of type I, F (z (l) ) = z (l) , so that F˙ (z (l) ) = 1. For the MRLNN of type II, we will use f (z) = [1 + exp(−η z)]−1 , η ≥ 1, whose derivative is f˙(z) = η f (z)[1 − f (z)], so that F˙ (z (l) ) = ηy (l) [1 − y (l) ].

then the algorithm (13) can be written as

2.3

W (l) (i + 1) = W (l) (i) + µ V (l) (i) , V (l) (i) =

1 M

 i

U (l) (k) ,

(20)

k=i−M +1

l = 1, 2, · · · , L 2 [k]

mod K

≡ k − K k/K denotes the index k modulo K.

Application in OCR

Using the design framework discussed in the previous section, we now describe some experimental results in a problem of optical character recognition (OCR). Our 3 We denote F˙ (z) ≡ (f˙(z ), f˙(z ), · · · , f˙(z )). The symbol n 1 2 ‘’ denotes an array (element-by-element) multiplication.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

16

Training Testing Epoch

P. Maragos, A. Butt, L. Pessoa

MRL5 11.8/13.2/10.5 18.7/22.4/15.0 3

MLP5 9.9/8.8/11.1 18.4/19.9/16.9 10

FM / E(θ) / R(θ) MRL10 MLP10 7.4/6.9/7.8 7.4/7.0/7.7 11.0/13.1/8.9 11.1/10.9/11.4 62 96

MRL20 8.4/7.8/9.0 17.4/24.6/10.2 9

MLP20 7.5/6.8/8.2 11.8/12.8/10.9 88

Table 1: Figure of merit / mean error rate / mean rejection rate corresponding to the optimal set of weights of best MRL-NNs vs. best MLPs for µ = 0.05.

approach is to perform a comparative analysis of MRLNNs versus MLPs, illustrating some of the characteristics of both systems. We show that the MRL-NNs are a good alternative to MLPs, usually providing equal or better performance with smaller training times. To do so, we used a large database of handwritten characters provided by the National Institute of Standards and Technology (NIST) [27]. We selected a total of K = 61, 094 samples of handwritten digits to form our data set. In our simulations, we normalized the feature vectors (64 dimensional Karhunen-Lo`eve transforms) so that each x(k) ∈ [0, 1]. The data set was split such that 45, 000 digits were used for training and the remaining 16, 094 digits were used for testing. The first 15, 000 elements of the training set were used as a validation set during the training process. The training sequence was ordered such that one instance of every digit is presented to the system in each iteration. After making several tests, we have set a group 12 experiments with 3 different network topologies: 64-N10 MRL-NNs and MLPs, N = 5, 10, 20. This notation indicates a system with 64 inputs, N hidden nodes, and 10 outputs. Two different step sizes were tested: µ = 0.005, 0.05. Every experiment was repeated 5 times with different random initial conditions, and the best result is reported here. Among many possible ways to initialize the systems, and after performing various tests, we initialized the weights randomly in the ranges: a(l) : √ √ n (l) (l) [−0.1, 0.1], rn : [1, Nl−1 ], bn : [−1/ Nl−1 , 1/ Nl−1 ], (l) (l) τn : [−0.1, 0.1], λn : [0.4, 0.6]. Further, in order to estimate gradients, we smoothed impulses with qσ (v) = exp[− 12 (v/σ)2 ], σ = 0.05. Due to the size of the training set, we have used the proposed training algorithm with M = 1 only. We have tested the case M > 1 with a small subset of the training set, but no signanificant improvements were observed. Both MRL-NNs and MLPs were defined using a sigmoid activation function with η = 1 (MRL-NNs of type II). As usual, the desired system output d = (d0 , d1 , · · · , d9 ) was defined by

dn =

1 0

, x ↔ digit n , otherwise

(28)

In the attempt to compare different systems, a figure of merit (FM) was defined as follows 1 F M (t) = ( E(θ) + R(θ) ) 2

(29)

where θ ∈ [0, 1] is the confidence threshold; t is the epoch; E is the error rate (%), computed, for a given θ, as the ratio of the number of misclassified digits over the number of digits that were not rejected during the classification (in a percentage basis); R is the rejection rate (%), computed, for a given θ, as the ratio of the number of rejected digits over the total number of elements in the set under consideration (also in a percentage basis); and

E(θ) =

9 9 1  i i 1  E( ) , R(θ) = R( ) 10 10 10 10 i=0

i=0

The training process tends to decrease the figure of merit, and good performance corresponds to small values of FM. A given classification is rejected if the desired n-th output is dn = 1 but the actual n-th output has the property max{y} = yn < θ, where y is the system output. An error (misclassification) is obtained when dn = 1 but max{y} = yn . The error rate is computed excluding the rejected digits. Using our proposed training algorithm with all the above considerations, we observed that, either for a step size µ = 0.005 or µ = 0.05, the MRL-NNs required a smaller number of iterations than MLPs, and provided similar performances (FMs). Computing the figures of merit of MLPs with equal number of iterations of MRLNNs, we usually observed better performances of MRLNNs. Table 1 summarizes some of the results. The best training performance was obtained with a 64-10-10 MRLNN (MRL10, FM=7.4%). Similar results were obtained with a 64-10-10 MLP (MLP10, FM=7.4%) and a 64-2010 MLP (MLP20, FM=7.5%), but with a larger number of iterations. These best results were all obtained with µ = 0.05.

References References for Part I [1] L. Alvarez, F. Guichard, P.L. Lions, and J.M. Morel, “Axioms and Fundamental Equations of Image Processing”, Archiv. Rat. Mech., vol. 123 (3), pp. 199– 257, 1993. Also, C. R. Acad. Sci. Paris, pp. 265-268, t.315, Serie I, 1992. [2] A. Arehart, L. Vincent and B. Kimia, “Mathematical Morphology: The Hamilton-Jacobi Connection”, in Proc. ICCV-93, pp. 215–219, 1993.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.

17

FRONTIERS IN MORPHOLOGICAL IMAGE ANALYSIS [3] G. Borgefors, “Distance Transformations in Digital Images”, Comp. Vision, Graphics, Image Process., 34, pp. 344–371, 1986. [4] M. Born and E. Wolf, Principles of Optics, Pergamon Press, Oxford, England, 1959 (1987 edition). [5] R. Brockett and P. Maragos, “Evolution Equations for Continuous-Scale Morphological Filtering”, IEEE Trans. Signal Processing, vol. 42, pp. 33773386, Dec. 1994. Also, in Proc. ICASSP-92, San Francisco, 1992. [6] M. A. Butt and P. Maragos. “Optimal Design of Chamfer Distance Transforms”, IEEE Trans. Image Processing, in press. [7] B.K.P. Horn, Robot Vision, MIT Press, Cambridge, MA, 1986. [8] B. Kimia, A. Tannenbaum, and S. Zucker, “Toward a Computational Theory of Shape: An Overview”, Proc. ECCV-90, France, April 1990. [9] R. Kimmel, N. Kiryati, and A. M. Bruckstein, “SubPixel Distance Maps and Weighted Distance Transforms”, J. Math. Imaging and Vision, 6, pp. 223233, 1996. [10] R. Malladi, J. A. Sethian, and B. C. Vemuri, “A Fast Level Set Based Algorithm for TopologyIndependent Shape Modeling”, J. Math. Imaging and Vision, 6, pp. 269-289, 1996. [11] P. Maragos, “Differential Morphology and Image Processing” IEEE Trans. Image Processing, vol. 78, pp. 922–937, June 1996. [12] P. Maragos and R. W. Schafer, “Morphological Systems for Multidimensional Signal Processing”, Proc. IEEE, vol. 78, pp. 690-710, Apr. 1990. [13] J. Mattioli, “Differential Relations of Morphological Operators”, Proc. 1st Int’l Workshop on Math. Morphology and its Application to Signal Processing, Barcelona, Spain, May 1993. [14] F. Meyer, “Topographic Distance and Watershed Lines”, Signal Processing, vol. 38, pp. 113-125, July 1994. [15] F. Meyer and S. Beucher, “Morphological Segmentation”, J. Visual Commun. Image Representation, 1(1):21–45, 1990. [16] L. Najman and M. Schmitt, “Watershed of a Continuous Function”, Signal Processing, vol. 38, pp. 99112, July 1994. [17] S. Osher and J. Sethian, “Fronts Propagating with Curvature-Dependent Speed: Algorithms Based on Hamilton-Jacobi Formulations”, J. Comput. Physics, 79, pp. 12–49, 1988. [18] Y. Pnueli and A. M. Bruckstein, “DigiD u ¨rer - a digital engraving system”, The Visual Computer, 10, pp. 277–292, 1994. [19] E. Rouy and A. Tourin, “A Viscocity Solutions Approach to Shape from Shading”, SIAM J. Numer. Anal., vol. 29 (3), pp. 867-884, June 1992.

[20] G. Sapiro, R. Kimmel, D. Shaked, B. Kimia, and A. Bruckstein, “Implementing Continuous-scale Morphology via Curve Evolution”, Pattern Recognition, 26(9), pp. 1363–1372, 1993. [21] M. Schr¨ oder, “The Eikonal Equation”, Math. Intelligencer, 1, pp. 36–37, 1983. [22] J. Serra, Image Analysis and Mathematical Morphology, Acad. Press, NY, 1982. [23] J. A. Sethian, Level Set Methods, Cambridge Univ. Press, 1996. [24] R. van der Boomgaard and A. Smeulders, “The Morphological Structure of Images: The Differential Equations of Morphological Scale-Space”, IEEE Trans. Pattern Anal. Mach. Intellig., vol. 16, pp.1101-1113, Nov. 1994. Also, Ph.D. Thesis, Univ. of Amsterdam, 1992. [25] P. Verbeek and B. Verwer, “Shading from shape, the eikonal equation solved by grey-weighted distance transform”, Pattern Recogn. Lett., 11:618–690, 1990. [26] L. Vincent and P. Soille, “Watershed In Digital Spaces: An Efficient Algorithm Based On Immersion Simulations”, IEEE Trans. PAMI, vol. 13, pp. 583–598, June 1991. References for Part II [27] M.D. Garris, J.L. Blue, G.T. Candela, D.L. Dimmick, J. Geist, P.J. Grother, S.A. Janet, and C.L. Wilson, “NIST form-based handprint recognition system”, Tech. Rep. NISTIR 5469, Nat’l Inst. Standards & Technology, July 1994. [28] S. Marcos, O. Macchi, C. Vignat, G. Dreyfus, L. Personnaz, and P. Roussel-Ragot, “A unified framework for gradient algorithms used for filter adaptation and neural network training”, Int’l J. Circuit Theory & Applications, 20:159–200, 1992. [29] L. F.C. Pessoa and P. Maragos, “MRL-Filters: A general class of nonlinear systems and their optimal design for image processing,” IEEE Trans. Image Processing, July 1998. [30] L. F.C. Pessoa and P. Maragos, “Morphological/rank neural networks and their adaptive optimal design for image processing”, Proc. ICASSP-96, vol.6, pp.3399–3402, May 1996. [31] D.E. Rumelhart, G.E. Hinton, and R.J. Willians, “ Learning Internal Representations by Error Propagation,” Parallel Distributed Processing, Vol.1, D.E. Rumelhart and J.L. McClelland, eds., pp.318–362, MIT Press, Cambridge, MA, 1986. [32] P. Salembier, “Adaptive rank order based filters,” Signal Processing, 27:1–25, Apr. 1992. [33] G.M. Shepherd and R.K. Brayton, “Logic operations are properties of computer-simulated interactions between excitable dendritic spines,” Neuroscience, 21(1):151–165, 1987. [34] P.-F. Yang and P. Maragos, “Min-max classifiers: Learnability, design and application,” Pattern Recognition, 28(6):879–899, 1995.

Proceedings of Int’l Symposium on Computer Graphics, Image Processing & Vision (SIBGRAPI-98), Rio de Janeiro, Brazil, Oct. 1998.