Pattern Recognition Letters 20 (1999) 1279–1286

www.elsevier.nl/locate/patrec

Image coding using neighbourhood relations

I.J. Tsang *, I.R. Tsang, D. Van Dyck

Vision Lab, Department of Physics, University of Antwerp, RUCA, Groenenborgerlaan 171, Antwerp B-2020, Belgium

Abstract

A novel coding algorithm for binary images based on neighbourhood relations is used for the problem of handwritten numeral recognition. Each pixel of an image is transformed into a set of representative vectors by coding it according to the number of neighbours in the four directions (north, east, south, west). These neighbourhood vectors are transformed into a set of codes satisfying the boundary condition imposed by the size of the image in which the shape is embedded. A code reduction function is used for the purpose of information reduction and generalization of the shape images. Using the digits of the NIST handwritten segmented characters set, we show an application of the neighbourhood coding for pattern recognition. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Image coding; Shape description; Character recognition

1. Introduction

Shape description and recognition is a fundamental issue in pattern recognition. According to Pavlidis (1978), algorithms for shape analysis can be classified according to whether they examine only the shape boundary or the whole area, and whether they describe the image in scalar measurements or through structural descriptions. Shape analysis has been a field of intense study in image processing. Different methods have been proposed using a morphological function (Sinha and Giardina, 1990), a gradient propagation method (Ben-Arie et al., 1995) or a special weighted graph for shape similarity (Kupeev and Wolfson, 1996). Diverse techniques like Fourier descriptors, template matching or invariant moments are also used for shape description and recognition.

* Corresponding author. E-mail address: [email protected] (I.J. Tsang)

Character recognition is one of the fields where shape analysis is applied. In this paper, we focus on the recognition of unconstrained handwritten numerals. Many studies, using different approaches, have been carried out in this field. Suen et al. (1992) classify these approaches into two categories, global analysis and structural analysis, which are used in conjunction with statistical classification methods or a syntactical classification approach. Lee (1996) achieved impressive results using a multilayer cluster neural network combined with a genetic algorithm and a Kirsch edge detector method for feature extraction. In view of the many possible practical applications, robust and fast methods that match human performance are required and are the subject of intense research.

Many of the low-level image analysis operations can be performed using neighbourhood operators (Haralick and Shapiro, 1992). Since these operators act locally, we have looked for methods that better describe the global structure of the image using neighbourhood relations. In this paper, we

0167-8655/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0167-8655(99)00096-3


I.J. Tsang et al. / Pattern Recognition Letters 20 (1999) 1279–1286

present a novel method for coding binary shapes. Each pixel is coded according to the number of neighbours in the four directions (north, east, south, west). In this way, information about the pattern structure is maintained under translation invariance, while the shape is examined taking into consideration the whole area instead of just the borders. We present an application of this coding scheme using a simple code reduction technique for the problem of handwritten numeral recognition. For classification we used a back-propagation neural network algorithm.

The structure of this paper is as follows. Section 2 describes the shape structure, the pixel to neighbourhood vector representation and the invariance of this representation. Section 3 describes in detail the neighbourhood coding scheme and a simple code reduction technique. In Section 4 we present the experiments and results. Finally, in Section 5 the results are summarized and discussed.

2. Shape structure

Handwritten characters can be decomposed into one or more connected components, as for example the letters "s" and "i". Numerals consist mainly of a single component. We decompose the patterns using the 4-connected neighbourhood. Each pixel is coded according to the number of neighbours in the four directions. That is, if a pixel is in the inner part of the image, the number of neighbours in one of the directions is the number of pixels separating that pixel from the border. The structural information of the image is maintained in the same way that pieces of a jigsaw puzzle put together form a distinct pattern. These vector codes are translation invariant, and rotation can be accomplished at the code level, so no further operation on the image is necessary.

2.1. Component description

Using a connected component algorithm, each pixel of the image is transformed into a vector containing the number of neighbours in the four directions $V = (n, e, s, w)$, in which each letter respectively represents the direction north, east, south and west (see Fig. 1). The set of all these neighbourhood vectors describes the image (see Fig. 2).

Fig. 1. Neighbourhood vector $V = (n, e, s, w)$.

Fig. 2. Neighbourhood vector codes.
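The pixel-to-vector step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a plain scan instead of a connected component algorithm, and it assumes that "number of neighbours" in a direction means the run of consecutive foreground pixels starting next to the pixel and ending at the first background pixel or image edge. The function name `neighbourhood_vectors` is hypothetical.

```python
def neighbourhood_vectors(img):
    """Map each foreground pixel (row, col) of a binary image to (n, e, s, w).

    `img` is a list of equal-length rows holding 0/1 (or False/True) values.
    Assumption: a direction's count is the number of consecutive foreground
    pixels met when walking away from the pixel in that direction.
    """
    rows, cols = len(img), len(img[0])
    # Unit steps for north, east, south, west (row index grows downwards).
    steps = ((-1, 0), (0, 1), (1, 0), (0, -1))
    vectors = {}
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue  # background pixels are not coded
            counts = []
            for dr, dc in steps:
                k, rr, cc = 0, r + dr, c + dc
                # Walk while we stay inside the image and on the shape.
                while 0 <= rr < rows and 0 <= cc < cols and img[rr][cc]:
                    k += 1
                    rr, cc = rr + dr, cc + dc
                counts.append(k)
            vectors[(r, c)] = tuple(counts)
    return vectors
```

Under this assumption the multiset of vectors does not change when the shape is moved inside a larger image, which is the translation invariance mentioned in the text.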


We used the Hoshen–Kopelman (HK) algorithm (Hoshen and Kopelman, 1976) for the implementation of the proposed coding scheme. Even though it is sequential, the algorithm is very efficient for connected component analysis, since it needs just one pass over the image to discriminate the different components. It is straightforward to implement the changes needed to obtain the neighbourhood vectors for each pixel of the image. These changes do not greatly affect the general performance of the HK algorithm.

2.2. Invariance

Two types of invariance are addressed here: translation and rotation invariance. Translation invariance comes as a natural consequence of this representation scheme, which is an excellent property for shape analysis and recognition. Rotation invariance must be implemented, which means additional computational effort. However, certain applications do require a rotational variance property, for example to distinguish between the characters "6" and "9". In this paper, only 90°, 180° and 270° rotations are taken into consideration.

A 90°, 180° or 270° rotation of the image means a shift of coordinates in the neighbourhood vectors. If we fix on a specific pixel and rotate the image, the rotation changes which coordinate of the neighbourhood vector holds each count. As a result, no further manipulation of the original image is necessary for these operations; it is only necessary to permute the components of the neighbourhood vectors. Table 1 shows the changes of coordinates due to clockwise rotation operations.

Table 1
Changes of coordinates due to clockwise rotations and reflections (the first row shows the original coordinate)

0°      North   East    South   West
90°     East    South   West    North
180°    South   West    North   East
270°    West    North   East    South
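Rotation at the code level can be sketched as a cyclic shift of the vector components. The sketch below assumes one reading of Table 1: the count stored under the direction in the first row moves to the direction listed in the row of the chosen rotation (for 90° clockwise, north becomes east, and so on). The function name `rotate_vector` is hypothetical.

```python
def rotate_vector(v, degrees):
    """Rotate a neighbourhood vector (n, e, s, w) clockwise by a multiple of 90 degrees.

    Assumed reading of Table 1: after one clockwise quarter turn the old
    north count becomes the east count, east becomes south, south becomes
    west and west becomes north.
    """
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    k = (degrees // 90) % 4  # number of clockwise quarter turns
    v = tuple(v)
    if k == 0:
        return v
    # A cyclic right shift by k positions: (n, e, s, w) -> (w, n, e, s) for k = 1.
    return v[-k:] + v[:-k]
```

Because only the tuple is permuted, the image itself never has to be resampled, and four quarter turns return the original vector.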


3. Neighbourhood coding

When each pixel of the pattern is decomposed into the neighbourhood vectors $V = (n, e, s, w)$, a transformation function is necessary to map the vector into a scalar, represented by the code $C$. This transformation depends on the size of the image and in practice can yield a large number of codes. Consequently, a code reduction technique is necessary. At present, we use a simple linear and a logarithmic function for the code reduction. Controllable information reduction and learning are very closely related. Therefore, the use of better code reduction techniques is possible and may improve the generalization capability of this coding scheme.

3.1. Coding according to neighbourhood

The transformation function for the vector $V = (n, e, s, w)$ into a code $C = F(n, e, s, w)$ depends on the size $L$ of the image. We assume a square image; the equations for a rectangular image are straightforward. The greatest value that $n$, $e$, $s$ or $w$ can assume is $L - 1$, so the transformation function has the following boundary conditions:

$$n + s < L, \tag{1}$$

$$e + w < L. \tag{2}$$

If we do not take into account the boundary conditions, the variables $a$ and $b$ generated by the related pairs $(n, s)$ and $(e, w)$, respectively, are given by

$$a = nL + s, \tag{3}$$

$$b = eL + w. \tag{4}$$

Considering the case where $L = 2$ (see Table 2), one can verify that the related pairs $(n, s)$ and $(e, w)$ which do not obey properties (1) and (2) can be eliminated. By induction, the pairs that do not satisfy the boundary conditions follow an arithmetic series. So, when the boundary conditions are satisfied, Eqs. (3) and (4) become


Table 2
Example of all possible codes for $L = 2$ (the empty entries in $C$ are the codes that do not satisfy the boundary condition)

(nesw)  a   b   C
0000    0   0   0
0001    0   1   1
0100    0   2   2
0101    0   3
0010    1   0   3
0011    1   1   4
0110    1   2   5
0111    1   3
1000    2   0   6
1001    2   1   7
1100    2   2   8
1101    2   3
1010    3   0
1011    3   1
1110    3   2
1111    3   3

$$a = nL + s - \frac{n(n-1)}{2}, \tag{5}$$

$$b = eL + w - \frac{e(e-1)}{2}. \tag{6}$$
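The correction term in Eqs. (5) and (6) can be made explicit with a short counting argument. The following is a sketch of one way to see it, written for the pair $(n, s)$; the same reasoning applies to $(e, w)$. The check for $L = 3$ is an added illustration, not taken from the paper.

```latex
% Row n of the unconstrained numbering a = nL + s holds L values (s = 0, ..., L-1),
% but only s = 0, ..., L-n-1 satisfy n + s < L, so row n contains n inadmissible values.
\begin{align*}
\text{values skipped before row } n
  &= \sum_{k=0}^{n-1} k = \frac{n(n-1)}{2}, \\
a &= nL + s - \frac{n(n-1)}{2}.
\end{align*}
% Check for L = 3: the admissible pairs (n, s) = (0,0), (0,1), (0,2), (1,0), (1,1), (2,0)
% map to a = 0, 1, 2, 3, 4, 5 with no gaps.
```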

After coding for $a$ and $b$, a unique representation for the vector $V$ follows,

$$C = aK + b, \tag{7}$$

where $K$ is the maximum value that $b$ can assume plus one. $b$ is maximal when $e = L - 1$ and $w = 0$, so we have

$$b_{\max} = \frac{L^2}{2} + \frac{L}{2} - 1, \tag{8}$$

leading to

$$K = b_{\max} + 1 = \frac{L^2 + L}{2}. \tag{9}$$

Substituting $K$ in Eq. (7), the transformation function for the vector $V$ to the code $C$ is given by

$$C = a\,\frac{L^2 + L}{2} + b. \tag{10}$$
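Eqs. (5), (6) and (10) translate directly into a few lines of code. The sketch below is an illustration under the paper's square-image assumption; the names `pair_index` and `encode` are hypothetical and do not come from the paper.

```python
def pair_index(p, q, L):
    """Index of an admissible pair (p, q) with p + q < L, following Eqs. (5) and (6)."""
    if p < 0 or q < 0 or p + q >= L:
        raise ValueError("pair violates the boundary condition p + q < L")
    # p * (p - 1) is always even, so the integer division is exact.
    return p * L + q - p * (p - 1) // 2


def encode(n, e, s, w, L):
    """Code C of the neighbourhood vector (n, e, s, w) for an L x L image, Eq. (10)."""
    K = (L * L + L) // 2       # Eq. (9): largest value of b plus one
    a = pair_index(n, s, L)    # Eq. (5), built from the north/south pair
    b = pair_index(e, w, L)    # Eq. (6), built from the east/west pair
    return a * K + b
```

For $L = 2$ this reproduces the filled entries of Table 2, for example `encode(1, 1, 0, 0, 2)` gives 8 for the vector written 1100 there.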

To calculate the total number of codes generated by Eq. (10), set $a$ and $b$ to their maximum. This yields the inequality

$$0 \le C \le \frac{(L^2 + L)^2}{4} - 1, \tag{11}$$

since $n$, $e$, $s$ and $w$ are non-negative integers. The total number of codes is then given by

$$n = \frac{(L^2 + L)^2}{4}. \tag{12}$$
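Eq. (12) can be checked by brute force for small images. The sketch below enumerates every vector that satisfies conditions (1) and (2), codes it with Eqs. (5), (6) and (10), and compares the result with the closed form. The helper names `code_count` and `enumerate_codes` are hypothetical.

```python
from itertools import product


def code_count(L):
    """Closed-form number of codes for an L x L image, Eq. (12)."""
    # L*L + L is even, so its square is divisible by 4.
    return (L * L + L) ** 2 // 4


def enumerate_codes(L):
    """List the code C of every vector (n, e, s, w) with n + s < L and e + w < L."""
    K = (L * L + L) // 2
    codes = []
    for n, e, s, w in product(range(L), repeat=4):
        if n + s < L and e + w < L:
            a = n * L + s - n * (n - 1) // 2
            b = e * L + w - e * (e - 1) // 2
            codes.append(a * K + b)
    return codes
```

Comparing `sorted(enumerate_codes(L))` with `list(range(code_count(L)))` for a few small values of `L` confirms that the codes are distinct and fill the range given by Eq. (11) without gaps.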

Eq. (12) tells us that if a square image has the size $L = 128$, the number of possible codes is $n = 68\,161\,536$. In practice, most of the patterns used in any recognition system can be scaled down


to a smaller image size and still be representative. As a result, if a pattern is scaled down to an image of size $L = 16$, the total number of codes would be $n = 18\,496$, which can be reduced even further using different code reduction techniques.

3.2. Code reduction

Information reduction is necessary for generalization of the shape patterns. Several different code reduction techniques are possible, and optimal ways to perform this task for a specific pattern recognition problem can be devised. Likewise, it is necessary to reduce the number of codes so that the problem can be tackled in a practical sense. As the number of codes increases, more computer resources are needed, both in memory and in processing power.

The neighbourhood coding method yields a large number of possible codes. Consequently, we present a simple and efficient approach to reduce this number. It consists in re-scaling the components of the vector $V = (n, e, s, w)$ into $V' = (n', e', s', w')$. This can be done by linear or logarithmic re-scaling. Suppose that our image size is $L$ and we want to re-scale so that $L \to L'$; in this way the maximum value that $n'$, $e'$, $s'$ or $w'$ can assume is $L' - 1$. Choosing an $L'$ that yields a reasonable number of codes, and a re-scaling function that preserves the interesting structural information of the image, is problem dependent. In addition, we must fulfil the new boundary conditions:

$$n' + s' < L', \tag{13}$$

0

…14†

0

0

e ‡w