Signal Processing 88 (2008) 844–857 www.elsevier.com/locate/sigpro

Recognition of writer-independent off-line handwritten Arabic (Indian) numerals using hidden Markov models

Sabri Mahmoud
King Fahd University of Petroleum and Minerals, P.O. Box 1378, Dhahran 31261, Saudi Arabia

Received 11 July 2007; received in revised form 30 September 2007; accepted 1 October 2007. Available online 9 October 2007.

Abstract

This paper describes a technique for off-line recognition of handwritten Arabic (Indian) numerals using hidden Markov models (HMMs). The success of HMMs in speech recognition encouraged researchers to apply them to text recognition. In this work we did not follow the general trend of generating features with sliding windows along the writing line; instead, features are generated from the digit as a whole. Angle-, distance-, horizontal-, and vertical-span features are extracted from Arabic (Indian) numerals and used to train and test the HMM. These features proved to be simple and effective. In addition to the HMM, a nearest neighbor classifier is used, and the results of the two classifiers are compared.

Several experiments were conducted to estimate a suitable number of HMM states; the best results were achieved with a 10-state model. We also experimented with different numbers of features, and the best results were obtained with a 120-feature vector per digit. A database was collected from 44 writers, each writing 48 samples of each digit, giving 21,120 samples in total. The data were size-normalized to make the technique size invariant, and the center of gravity of each digit is used in feature extraction to make the technique translation invariant. A randomization technique was used to generate Arabic (Indian) numbers for training and testing the HMM classifier; the randomization covered both the number of digits per number and the digit sequence. In this way 2171 Arabic (Indian) numbers were generated, totaling 21,120 digits: 1700 numbers (16,657 digits) were used for training the HMM and 471 numbers (4463 digits) for testing it. The samples of the first 24 writers were used to train the nearest neighbor classifier and the remaining 20 writers' samples were used for testing. The achieved average recognition rates are 97.99% and 94.35% for the HMM and the nearest neighbor classifier, respectively.

The classification errors were analyzed and attributed to bad data, deformation and unbalanced proportions of digit segments, different writing styles of some digits, confusions between specific digit pairs (which were identified and analyzed), and genuine errors. The rate of genuine misclassification was nearly 1% in the case of the HMM. This demonstrates the effectiveness of the presented technique for writer-independent off-line Arabic (Indian) handwritten digit recognition; the technique is writer independent because the classifiers were trained on data from one set of writers and tested on data from different writers.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Arabic (Indian) numeral recognition; OCR; HMM; Handwritten digit recognition; Writer-independent recognition; Normalization

doi:10.1016/j.sigpro.2007.10.002. © 2007 Elsevier B.V. All rights reserved.


1. Introduction

Machine simulation of human reading (i.e. optical character recognition) has been the subject of extensive research for more than five decades. The convenience of paper, its widespread use for communication and archiving, and the amount of information already on paper press for quick and accurate methods to automatically read that information and convert it into electronic form [1]. The potential application areas of automatic reading machines are numerous. One of the earliest and most successful applications is sorting checks in banks, as the volume of checks circulating daily has proven too enormous for manual entry [2,3]. Handwritten digit recognition is a vital component in many applications; office automation, check verification, a large variety of banking and business applications, postal address reading, sorting and reading handwritten and printed postal codes, and data entry are a few examples.

The recognition of handwritten text (characters and numeral digits) is a more difficult task due to the different handwriting styles of writers, which are subject to inter- and intra-writer variations. Arabic handwriting, unlike Latin, has many handwriting styles such as Naskh, Kufi, and others, and writers often mix these styles. This makes the recognition problem more difficult and hence requires more sophisticated feature extraction and recognition techniques.

Arabic text recognition (ATR) has not been researched as thoroughly as Latin, Japanese, or Chinese text recognition. The lag of research on ATR compared with other languages may be attributed, in part, to the lack of adequate support in terms of human resources, journals, and books; the lack of general supporting utilities such as Arabic text databases, dictionaries, and programming tools; and the special characteristics of the Arabic language. The calligraphic nature of the Arabic script distinguishes it from other languages in several ways. Arabic text is written from right to left, and Arabic has 28 basic characters, of which 16 have from one to three dots; those dots differentiate between otherwise similar characters. Within a word, some characters connect to the preceding and/or following characters and some do not connect.


The shape of an Arabic character depends on its position in the word; a character might have up to four different shapes depending on whether it is isolated, connected from the right (beginning form), connected from the left (ending form), or connected from both sides (middle form). Characters in a word may overlap vertically, even without touching, and Arabic characters do not have a fixed size (height and width). Arabic (Indian) numerals, on the other hand, are not cursive. Fig. 1(a) shows the Arabic (Indian) numerals. Indian numerals are used in Arabic writing while Arabic numerals are used in Latin-based languages; hence, when the term "Arabic numerals" is used here it refers to the Indian numerals used in Arabic.

Although Arabic text is written right to left, Arabic (Indian) numbers are written left to right, with the most significant digit being the left-most one and the least significant digit the right-most one. However, the order in which the digits are stored in memory (most significant digit first) is contrary to the order in which they are encountered when the scanned image is read in the Arabic text direction. For example, Fig. 1(b) shows the Arabic (Indian) number 9876. Digit 9 is written first, then digits 8 and 7, and digit 6 is written last; digit 9 is the most significant digit and digit 6 is the least significant. Scanning this number and saving the image gives 6 as the right-most digit, then 7 and 8, with 9 as the left-most digit; the scanned image thus presents the digits in the reverse order of the truth value stored in the text file. Hence, care must be taken when automating the generation of ground-truth values for the scanned images.

Various methods have been proposed and high recognition rates reported for the recognition of English handwritten digits [4–8]. In recent years many researchers have addressed the recognition of Arabic text, including Arabic (Indian) numerals [9–17]. Surveys on Arabic optical text recognition may be found in [1,18,19]. Bazzi et al. [20,21] presented a system for bilingual (English/Arabic) text recognition. In addition, several researchers have reported the recognition of Persian (Arabic) handwritten digits, although the reported recognition rates need further improvement to be practical [22–26].

Fig. 1. (a) Arabic (Indian) numerals, (b) Arabic (Indian) number.



Al-Omari [9] presented a recognition system for Indian numeral digits using an average template-matching approach. Freehand sketches of online numeric digits placed on an image template were processed to extract a key feature vector representing significant boundary-point distances from the digit center of gravity (COG). A model for each numeric digit was formed by processing 30 handwritten digit samples, and classification was made using the Euclidean distance between the feature vector of the test sample and the models. In another work, Al-Omari and Al-Jarrah [10] presented a recognition system for online handwritten Indian numerals one to nine. The system skeletonizes the digits and then extracts geometrical features of the skeletons; probabilistic neural networks (PNNs) are used for classification. The developed system is translation, rotation, and scaling invariant, and the authors state that it may be extended to address Arabic characters [10]. The work in [11] presented an algorithm based on structural techniques for extracting local features from the geometric and topological properties of online Arabic characters using fuzzy logic. Salah et al. [12] developed a serial model for visual digit classification based on the primitive selective attention mechanism; the technique scans a down-sampled image in parallel to find interesting locations through a saliency map and extracts key features at those locations at high resolution.

Shahrezea et al. [22] used the shadow coding method for the recognition of Persian handwritten digits: a segment mask is overlaid on the digit image and the features are calculated by projecting the image pixels onto these segments. In [23] the Persian digit images are represented by line segments that are used to model and recognize the digits; additional features and classifiers are needed to discriminate the digit pairs "0–5", "7–8", and "4–6". Said et al. [24] fed the pixels of the normalized digit image directly into a neural network for classification, where the number of hidden units of the neural network is determined dynamically. Sadri et al. [25] used a feature vector of length 16 estimated from the derivatives of the horizontal and vertical profiles of the image. The work in [26] used normalized image profiles calculated at multiple orientations as the main features for the recognition of Persian handwritten digits, with crossing counts and projection histograms calculated at multiple orientations as complementary features. The authors indicated that most of the system errors occurred in discriminating the digits "2", "3", "4" and "0", "5".

Hence, discriminating these digits requires additional features and may require additional classifiers.

It is worth mentioning that there is no generally accepted, freely available database for Arabic text/numeral recognition. Different researchers therefore use different data, and the recognition rates of the different techniques may not be comparable. To help tackle this problem for Arabic (Indian) numerals, the author will make his data freely available to interested researchers.

In this paper, we present a simple, effective, and scalable technique for the recognition of writer-independent off-line handwritten Indian numerals (0, 1, ..., 9) as used in Arabic writing. The presented technique was implemented using HMMs and a nearest neighbor classifier, and the results of the two classifiers were analyzed and compared. Although we address Arabic (Indian) numeral recognition in this paper, we aim to extend the work to ATR in the future using the same technique. Using this technique for text recognition requires segmenting Arabic text, or estimating possible segmentation points and using this information in the feature extraction and in the recognition engine.

An Arabic (Indian) number may consist of an arbitrary number of digits. The recognition system classifies each digit independently, preserving its relative position with respect to the other digits in order to obtain the actual value of the number after recognition. The subsequent stages of the developed recognition system have enough flexibility to handle variations in line thickness, writing size, and translation of the handwritten string. The left-to-right position order of each digit is preserved to account for the digit weight after individual digit classification.

This paper is organized as follows. Feature extraction is addressed in Section 2, where four types of features are used. Section 3 addresses data preparation and normalization. Hidden Markov models (HMMs) are addressed in Section 4; training, recognition, and experimental results are addressed in Section 5; and finally the conclusions are presented in Section 6.

2. Feature extraction


To use HMMs, several researchers computed the feature vectors as a function of an independent variable. Normally this independent variable is time in the case of speech recognition. The same approach is used in off-line text recognition, with sliding frames/windows along the text line direction, which simulates the use of HMMs in speech recognition and allows a speech-recognition HMM engine to be used for text recognition [20,21]. In this paper we use a different technique: the features of an Arabic (Indian) numeral are extracted from the numeral as a whole, not from a sliding window, while the same HMM classifier is used without modification. We believe that using sliding windows limits the types of features that may be extracted for a numeral; with our technique, many types of features used for off-line text recognition with other classifiers may also be used with the HMM classifier. The following sections present the features that are extracted for each Arabic (Indian) numeral.

2.1. Angular span features

To make the presented technique translation invariant, the digit COG is used in estimating the angular span features. The COG (x_c, y_c) of the digit image is estimated using Eq. (1) and is used as the center of the numeral image:

x_c = \frac{\sum_{j=1}^{m}\sum_{i=1}^{n} i\, I[i,j]}{\sum_{j=1}^{m}\sum_{i=1}^{n} I[i,j]}, \qquad y_c = \frac{\sum_{j=1}^{m}\sum_{i=1}^{n} j\, I[i,j]}{\sum_{j=1}^{m}\sum_{i=1}^{n} I[i,j]},    (1)

where I is a binary image of dimension m x n, and x_c and y_c are the x- and y-coordinates of the digit COG. The image of the Arabic (Indian) numeral is sliced by angular lines passing through the COG (x_c, y_c), with an angle of α degrees between consecutive lines. Fig. 2 shows the slicing of Arabic (Indian) numerals 4 and 9. The number of black pixels in each slice is computed using the line equation at the different angles,

y = mx + b,    (2)

where m is the slope of the line, b is the y-intercept, m = tan(θ), and θ is the line inclination angle. Since each line passes through the COG, substituting (x_c, y_c) into Eq. (2) gives the y-intercept. Slice 1 is bounded by the two lines with slopes 0 and tan(θ_1), where θ_1 is the angle of the first intercepting line (θ_1 = α, θ_2 = 2α, and so on). The two lines are obtained by substituting the corresponding values of m and b in Eq. (2). The x-coordinate of each black pixel is substituted into the equations of the two lines to find the corresponding y-coordinates; if the pixel's y-coordinate falls between the two lines of a slice, the pixel is considered within slice 1 and the slice-1 counter is incremented by one. This procedure is repeated for the other slices. Hence a digit image has 360/α slices (features). These features are normalized by dividing the number of black pixels in each slice by the total number of black pixels of the Arabic (Indian) digit. Several values of α were used in our experiments; the results are presented in the following sections.


Fig. 2. The angle slicing of Arabic (Indian) digits 4 and 9.
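For concreteness, the following Python sketch computes the COG of Eq. (1) and the normalized angular-span features. It is an illustrative reimplementation, not the paper's code (which was written in C and MATLAB): it bins each black pixel by the angle of the vector from the COG to the pixel, which is equivalent to the slice-membership test with the two bounding lines described above, and the function names and toy image are invented for the example.

```python
import numpy as np

def centre_of_gravity(img):
    """COG of a binary image (Eq. (1)); img[i, j] is 1 for ink, 0 for background."""
    ys, xs = np.nonzero(img)            # row and column indices of black pixels
    return xs.mean(), ys.mean()         # (xc, yc) in (column, row) coordinates

def angular_span_features(img, alpha_deg=5):
    """360/alpha normalized angular-span features around the COG.
    Pixels are binned by the angle of the vector COG -> pixel, which is
    equivalent to the slice-membership test with the two bounding lines."""
    xc, yc = centre_of_gravity(img)
    ys, xs = np.nonzero(img)
    angles = np.degrees(np.arctan2(ys - yc, xs - xc)) % 360.0
    n_slices = int(360 // alpha_deg)
    counts, _ = np.histogram(angles, bins=n_slices, range=(0.0, 360.0))
    return counts / counts.sum()        # normalize by the total number of black pixels

# toy example: a 60 x 40 binary image with a diagonal stroke
img = np.zeros((60, 40), dtype=np.uint8)
for t in range(40):
    img[int(t * 59 / 39), t] = 1
print(angular_span_features(img, alpha_deg=5).shape)   # (72,)
```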

2.2. Distance span features

The COG of the Arabic (Indian) digit is also used in extracting the distance span features. The distance d between the COG (x_c, y_c) and the digit image origin (0, 0), which is the index of the top-most, left-most pixel, is calculated using the Euclidean distance formula,

d = \sqrt{(x_c - 0)^2 + (y_c - 0)^2} = \sqrt{x_c^2 + y_c^2}.

Several concentric circles centered at (x_c, y_c) are used. The radius of circle i is given by r_i = (d/C) \cdot 2^{i-1}, where C is the number of concentric circles and i = 1, ..., C. The distance span feature values are calculated by summing the black pixels between two consecutive concentric circles: the first feature value is the number of black pixels within the circle of radius r_1, the second is the number of pixels outside the circle of radius r_1 and inside the circle of radius r_2, and so on. To find these feature values, the number of pixels P_i (i = 1, ..., C) within each of the C circles is calculated. Consider the general form of the circle equation, r^2 = (x - x_c)^2 + (y - y_c)^2, where (x_c, y_c) is the COG of the image and r is the radius of the circle. The coordinates of each black pixel are substituted into the equation, and the counters belonging to the circles with radius larger than the estimated radius are incremented. The first feature is then equal to P_1, the second feature is P_2 - P_1, and so on. The leftover outer portion of the image is treated as the last feature, resulting in C + 1 features; this extra feature counts the remaining black pixels that do not fall within any of the concentric circles. These features are then normalized by dividing them by the total number of black pixels of the Arabic (Indian) numeral. Fig. 3 illustrates digits 4 and 9 with the concentric circles.

ARTICLE IN PRESS 848

S. Mahmoud / Signal Processing 88 (2008) 844–857

Fig. 3. The concentric circles used for calculating the distance span features of Arabic (Indian) digits 4 and 9.
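A corresponding sketch for the distance-span features is given below. It follows the description above, with the radius rule r_i = (d/C) · 2^(i-1) taken as one reading of the formula in the text (an assumption); with C = 7 it produces the 8 normalized features used in the paper.

```python
import numpy as np

def distance_span_features(img, n_circles=7):
    """C + 1 normalized distance-span features (C = 7 gives 8 features).
    The radius rule r_i = (d / C) * 2**(i - 1) is an assumed reading of the
    formula in the text."""
    ys, xs = np.nonzero(img)
    xc, yc = xs.mean(), ys.mean()                     # COG
    d = np.hypot(xc, yc)                              # distance COG -> image origin (0, 0)
    radii = np.array([(d / n_circles) * 2.0 ** i for i in range(n_circles)])
    dist = np.hypot(xs - xc, ys - yc)                 # distance of each black pixel from the COG
    # bin pixels into: inside r_1, between r_1 and r_2, ..., outside r_C
    bins = np.concatenate(([0.0], radii, [np.inf]))
    counts, _ = np.histogram(dist, bins=bins)
    return counts / counts.sum()                      # C + 1 normalized features

# example on a small filled square
img = np.zeros((60, 60), dtype=np.uint8)
img[20:40, 20:40] = 1
print(distance_span_features(img).shape)              # (8,)
```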

2.3. Horizontal- and vertical-span features

The whole image is divided into a number of equal horizontal and vertical bars; in our case 20 bars are used in the horizontal direction and 20 in the vertical direction. In this way the number of features is normalized over the horizontal span (the height of the digit) and the vertical span (the width of the digit), since some numerals, like zero, are small while others, like 4 and 9, are large. For each numeral we count the number of black pixels in each horizontal and vertical bar, and these features are normalized by dividing them by the total number of black pixels in the image.

In this work we used the above four types of features. Experimentally, as discussed in Section 5, we found that slicing the digit at 5°, using 7 slicing circles, and taking 20 horizontal and 20 vertical segments gave the best recognition rates. This makes a total of 120 features, which represent the digit as a whole.
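The horizontal- and vertical-span computation reduces to black-pixel counts over equal-width bands; a short illustrative sketch (again with invented names, not the paper's code) is shown below.

```python
import numpy as np

def hv_span_features(img, n_bars=20):
    """20 horizontal- and 20 vertical-span features: the image is cut into
    n_bars equal horizontal bands and n_bars equal vertical bands, and the
    black-pixel count of each bar is normalized by the total ink count."""
    total = img.sum()
    rows = np.array_split(img, n_bars, axis=0)        # horizontal bars (bands of rows)
    cols = np.array_split(img, n_bars, axis=1)        # vertical bars (bands of columns)
    h_feat = np.array([band.sum() for band in rows]) / total
    v_feat = np.array([band.sum() for band in cols]) / total
    return np.concatenate([h_feat, v_feat])           # 2 * n_bars features

img = np.zeros((60, 42), dtype=np.uint8)
img[10:50, 15:25] = 1
print(hv_span_features(img).shape)                    # (40,)
```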

It is not possible to use an HMM with a single observation vector per digit: the HMM requires several observations for each digit so that transitions from one state to another are possible in the training and testing phases. While features extracted with sliding windows produce a large number of observations, our technique uses far fewer features, and this reduced number was adequate for obtaining high recognition rates. Hence, we present each Arabic (Indian) numeral to the HMM as a set of 12 observations of 10 features each. It is noted that techniques using sliding windows produce, on average, many more features per character (more than 20-fold) than this technique. Fig. 4 shows the horizontal and vertical slicing of Arabic (Indian) digits 4 and 9.

Fig. 4. The horizontal and vertical slices of digits 4 and 9.
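Putting the pieces together, a hypothetical helper (relying on the three sketch functions introduced above) assembles the 120-dimensional feature vector and reshapes it into the 12 observations of 10 features that are fed to the HMM.

```python
import numpy as np

def digit_observations(img):
    """Concatenate the four feature groups (72 + 8 + 20 + 20 = 120 values) and
    reshape them into 12 observations of 10 features each for the HMM.
    Relies on the sketch functions defined in the previous examples."""
    feats = np.concatenate([
        angular_span_features(img, alpha_deg=5),   # 72 angle-span features
        distance_span_features(img, n_circles=7),  #  8 distance-span features
        hv_span_features(img, n_bars=20),          # 40 horizontal/vertical-span features
    ])
    assert feats.size == 120
    return feats.reshape(12, 10)                   # 12 observation vectors of length 10
```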


3. Data preparation

The data were collected from 44 writers using semi-transparent paper over a tabular grid; each writer wrote 48 samples of each digit (0–9), a total of 480 digits per writer, so the database consists of 21,120 samples. The written pages were scanned at a resolution of 300 pixels per inch. Fig. 5 shows the data collected from one writer. The scanned document images are transformed into binary (black and white) images: the black pixels represent the text and are given a value of one, while the white pixels represent the background and are given a value of zero.


For each scanned page the horizontal projection histogram is computed. The resulting histogram has black and white regions: the black regions correspond to the text lines and the white regions to the spaces between them. The locations of the black regions give the limits of the numeral lines, which are then used to extract the lines. For each extracted line the vertical projection histogram is computed.

Fig. 5. Data collected from one writer. Each digit is written 48 times.

ARTICLE IN PRESS 850

S. Mahmoud / Signal Processing 88 (2008) 844–857

This histogram also has black and white regions: the black regions correspond to the digits and the white regions to the spaces between them. The locations of the black regions are used to determine the location of each digit in the line. The digits were extracted and each digit stored in a separate file, with the number of rows and the number of columns on the first line of the file; the remaining lines correspond to the pixel rows of the digit, where a black pixel is saved as a '1' and a white pixel as a '0'. It is worth mentioning that the author will make these data freely available to interested researchers.
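The projection-based segmentation described above can be sketched as follows. The run-finding helper and the toy page are invented for the example, and real pages would need noise filtering that is omitted here.

```python
import numpy as np

def black_runs(profile):
    """(start, end) pairs of the non-zero runs in a 1-D projection profile."""
    mask = np.concatenate(([0], (profile > 0).astype(np.int8), [0]))
    edges = np.flatnonzero(np.diff(mask))
    return list(zip(edges[0::2], edges[1::2]))

def segment_page(page):
    """Cut a binary page image into digit images using the horizontal
    projection (text lines) and then the vertical projection (digits)."""
    digits = []
    for top, bottom in black_runs(page.sum(axis=1)):      # text-line bands
        line = page[top:bottom]
        for left, right in black_runs(line.sum(axis=0)):  # digit columns
            digits.append(line[:, left:right])
    return digits

# toy page: two "lines", each containing two blobs
page = np.zeros((100, 100), dtype=np.uint8)
page[10:30, 10:20] = 1; page[10:30, 40:55] = 1
page[60:85, 15:25] = 1; page[60:85, 70:80] = 1
print(len(segment_page(page)))                            # 4
```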

3.1. Normalization

The dimensions (height and width) of the numerals differ, so the data were normalized to make the technique size invariant. Arabic (Indian) characters differ in height and width, and Arabic writing has a writing-line height at which the individual characters are normally segmented; we use this height to represent the height of all characters in a line. Arabic text is cursive, so characters normally connect at the writing line, and maintaining the height-to-width aspect ratio is important so that characters connect at the right position. Although Arabic (Indian) numerals are not cursive (they are isolated), we decided to follow the same normalization technique used for Arabic characters, which enables us to integrate our numeral recognition technique with other ATR systems.

We normalized all the digits to a height of 60 pixels while maintaining the aspect ratio of each numeral; hence the widths of the normalized Arabic (Indian) numerals differ from digit to digit. Since the samples of each digit are written on one line, we normalized the samples on the digit height; the writing-line height in this case therefore depends on the digit height. When numerals are used within text, the normalization will be based on the text writing height. Fig. 6 shows samples of Arabic (Indian) digits before and after normalization.

Fig. 6. Digits '0', '4', and '9' in original (first row) and normalized forms (second row).
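A minimal sketch of the size normalization follows. The paper does not state which resampling method was used, so nearest-neighbour resampling is an assumption made here.

```python
import numpy as np

def normalise_height(img, target_h=60):
    """Scale a binary digit image to a height of 60 pixels while keeping the
    aspect ratio, using nearest-neighbour resampling (an assumption)."""
    h, w = img.shape
    target_w = max(1, round(w * target_h / h))
    rows = (np.arange(target_h) * h / target_h).astype(int)   # source row for each target row
    cols = (np.arange(target_w) * w / target_w).astype(int)   # source column for each target column
    return img[rows][:, cols]

img = np.zeros((90, 45), dtype=np.uint8)
img[10:80, 5:40] = 1
print(normalise_height(img).shape)                # (60, 30)
```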

4. Hidden Markov model (HMM)

Several research papers have been published on using HMMs for text recognition [15,20,21,27–29]. To use HMMs, several researchers computed the feature vectors as a function of an independent variable, simulating the use of HMMs in speech recognition with sliding frames/windows; the same technique is used in off-line text recognition, with the independent variable running along the line direction [20,21]. In this paper we use a different technique: the features of an Arabic (Indian) numeral are extracted from the numeral as a whole rather than from a sliding window that calculates features from partial parts of the character, while the same HMM classifier is used without modification.

We use a left-to-right HMM for Arabic (Indian) handwritten numeral recognition; Fig. 7 shows the case of a 5-state HMM. This is in line with several research works using HMMs [20,21]. The model allows relatively large variations in the horizontal position of the Arabic (Indian) numeral, and the sequence of state transitions in training and testing is related to the feature observations of each digit. We experimented with different numbers of states and selected the best-performing one. Although each digit model could have a different number of states, we decided to use the same number of states for all digits, as was done in [20,21].

Fig. 7. A 5-state hidden Markov model (HMM).
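The transition structure of such a left-to-right model can be illustrated with the small sketch below. The actual models in this work were built and trained with the HTK toolkit [30]; the initialization values shown are illustrative only, not reported parameters.

```python
import numpy as np

def left_to_right_transitions(n_states=10, p_stay=0.6):
    """Initial transition matrix for a left-to-right HMM like the one in
    Fig. 7: each state may stay where it is or move to the next state.
    p_stay is an illustrative starting value, not a parameter from the paper."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        A[s, s] = p_stay
        A[s, s + 1] = 1.0 - p_stay
    A[-1, -1] = 1.0                      # final state is absorbing
    return A

print(left_to_right_transitions(5).round(2))
```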


Each Arabic (Indian) numeral is represented by a 120-dimensional feature vector (viz. 72 angle-, 8 distance-, 20 horizontal-, and 20 vertical-span features). Each numeral requires a number of observations to train and test the HMM, so we divided the feature vector into 12 separate sub-vectors of 10 features each; hence each digit is represented by 12 observations of 10 features each.

5. Training and recognition

In this paper we used two classifiers, the HMM and the nearest neighbor classifier. Each requires training and testing phases, and the training of the two classifiers, as well as the way data are presented to each classifier in the training phase, is different; we address this issue separately for each classifier. We experimented with each classifier, analyzed the results of each separately, and then compared the results of the two. In general, in the training phase the features of the training data are computed and saved as models for the trained classes. In the recognition phase the features of an unknown character are extracted and compared with the features of the models, and the unknown character is assigned to the class whose features are the closest (or the most probable). The implementation of this work was done using the C language and MATLAB; the HTK tools [30] are used for the HMM experimentation.

5.1. Hidden Markov model classifier (HMMC)

To apply this classifier, we performed data randomization in a first phase and classification in a second phase.

5.1.1. Data randomization

Since the Arabic (Indian) digits are saved in separate files in our database, it was necessary to represent Arabic (Indian) numbers as naturally as possible. The HMM computes the probability of each numeral and the probability of a digit appearing before or after other digits, and it uses these probabilities in the training and recognition phases. In Arabic text, numerals appear with equal probability and each digit has the same probability of appearing between two other digits. Hence, it was necessary to randomize the data presented for training the HMM. The randomization covers both the length of the Arabic (Indian) number and the digits appearing in it: the digits of an Arabic (Indian) number are combined in a random way and the number length is randomized to an arbitrary length from 5 to 15 digits. We built a utility tool for this purpose. The tool uses two random generators, one to decide the length of the number (i.e. the number of digits in a particular line) and the other to decide which digit (from 0 to 9) will be the next digit in a particular number. We used this approach for all the writers in sequence: we selected the first writer, and once all the samples of this writer were used, another writer was selected, until all writers were exhausted. All these decisions were also recorded in the master label file to be used during the training and testing of the HMM. Overall we constructed 2171 numbers for use in training and testing the HMM. Of these, 1700 numbers (consisting of 16,657 digits) were used for training and the remaining 471 numbers (consisting of 4463 digits) for testing. Thus, in general, we have separate writers for training and testing.
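A sketch of the randomization procedure is given below. The data layout (a per-writer dictionary of digit samples) and the function name are assumptions made for the example; only the two random choices (number length from 5 to 15 and next digit from 0 to 9) follow the description above.

```python
import random

def randomise_numbers(digit_files, seed=0):
    """Group individual digit samples into Arabic (Indian) 'numbers' of random
    length (5 to 15 digits) with randomly chosen digit values, writer by
    writer, recording the label sequence for the HMM master label file.
    digit_files: dict mapping writer -> digit (0-9) -> list of sample ids."""
    rng = random.Random(seed)
    numbers = []                                  # list of (sample_ids, labels)
    for writer in sorted(digit_files):
        pools = {d: list(samples) for d, samples in digit_files[writer].items()}
        while any(pools.values()):
            length = rng.randint(5, 15)           # first generator: number length
            ids, labels = [], []
            for _ in range(length):
                avail = [d for d, s in pools.items() if s]
                if not avail:
                    break
                d = rng.choice(avail)             # second generator: next digit
                ids.append(pools[d].pop())
                labels.append(d)
            if ids:
                numbers.append((ids, labels))
    return numbers

# example: two writers with 2 samples of each digit
files = {w: {d: [f"w{w}_d{d}_{k}" for k in range(2)] for d in range(10)} for w in (1, 2)}
print(len(randomise_numbers(files)))
```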


5.1.2. Classification

Using this classifier we experimented with different numbers of features and different numbers of states. We used a slicing angle α of 5° (resulting in 72 features), 7 concentric circles (resulting in 8 features), and 20 horizontal and 20 vertical features, giving a total of 120 features. This feature vector was split into 12 observations of 10 features each. To find the best number of states for recognition and classification, several runs were conducted on the data with different numbers of states (viz. 3, 6, 8, 10, and 12). We trained the HMM with the observations of 1700 Arabic (Indian) numbers of varying length, totaling 16,657 digits, and tested it with the observations of 471 new Arabic (Indian) numbers of varying length, totaling 4463 digits. Fig. 8 summarizes the results (recognition rate and accuracy) for 3, 6, 8, 10, and 12 states. The recognition rate increases with the number of states up to 10 states (98.28%), then drops at 12 states (97.32%), counting the silences. The accuracy is lower than the recognition rate because it accounts for the insertion errors that the recognition rate ignores. The recognition rate ignoring silences for the 10-state case is 97.99%.

Other experiments were run using a different number of features: a span angle α of 6° (resulting in 60 features), 9 concentric circles (resulting in 10 features), and 20 horizontal- and 20 vertical-span features, for a total of 110 features. This feature vector was split into 11 observations of 10 features each, and the experiments were run using HMMs with 6, 10, and 12 states. Table 1 shows the results of these experiments compared with the results obtained using 120 features as described above.

Fig. 8. The correct recognition rate and accuracy at 3, 6, 8, 10, and 12 states with a codebook of size 128.

Table 1
The recognition rates and accuracy (%) for different numbers of features using HMMs with 6, 10, and 12 states

States   72 angle-, 8 distance-, 20 horizontal-    60 angle-, 10 distance-, 20 horizontal-
         and 20 vertical-span features (120)       and 20 vertical-span features (110)
         Recognition rate   Accuracy               Recognition rate   Accuracy
6            95.62            94.9                     96.12            96.05
10           98.28            98.26                    97.95            97.95
12           97.32            97.3                     98.23            98.21

The confusion matrix for the 120-feature case is shown in Table 2, where %c is the percentage recognition rate of each digit and %e is the error percentage. The average recognition rate is 97.99%, ignoring silences (98.28% including silences, as produced by the HMM).

Table 2
The confusion matrix for the 10-state HMM using 12 observations of 10 features each (rows: true digit; columns: recognized digit)

Digit    0    1    2    3    4    5    6    7    8    9    %c     %e
0      446    0    0    2    0    0    0    2    4    1    98     0.2
1        0  441    1    0    1    0    0    0    0    0    99.5   0
2        0    0  430    0   11    0    0    2    0    1    96.8   0.3
3        1    0    6  425    2    0    1    7    1    3    95.3   0.4
4        0    2    0    0  443    0    1    3    0    2    98.2   0.1
5        7    0    1    0    0  442    0    0    0    0    98.2   0.1
6        1    1    2    0    1    0  437    0    0    0    98.9   0.1
7        0    0    0    1    0    0    1  442    0    0    99.1   0.1
8        1    0    0    0    0    0    0    5  436    3    98     0.2
9        0    3    0    0    1    0    6    2    0  435    97.3   0.2

5.2. The nearest neighbor classifier (NNC)

We conducted several experiments using different numbers of samples for training/modeling and testing. In the training phase the feature vectors (V) of the training data are extracted (viz. 72 angle-, 8 distance-, 20 horizontal- and 20 vertical-span features). The features of each digit are averaged, and the averaged features are used as the models of the Arabic (Indian) numerals. Since this classifier does not use the probabilities of occurrence of each digit, nor the conditional probabilities of one digit coming before or after other digits, the data are simpler to present to the classifier in the training phase, as discussed below. In the testing phase the feature vector (V) of the unknown character is computed and compared to the feature vectors of the model classes. The classification decision is based on the nearest neighbor method, with the distance computed using the simple formula

E_i = \sum_{j=1}^{k} |M_{ij} - V_j|,    (3)

where E_i is the distance between the input digit and model i (i.e. the sum of the absolute differences between the features of the input digit and those of model i), k is the total number of parameters in the feature vector (i.e. 120), M_ij is the jth feature of model i, and V_j is the jth feature of the input digit feature vector. The distances E_i between the new digit and all the models' feature vectors are found, and the argument of the minimum value (i.e. argmin_i E_i) yields the recognized model i. This model is considered the class that most closely matches the feature vector of the unknown digit; hence, the class of the digit is found.
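A compact sketch of this classifier (class-mean models plus the city-block distance of Eq. (3)) is given below; the tiny three-dimensional example is synthetic and stands in for the 120-dimensional feature vectors.

```python
import numpy as np

def train_nnc(train_feats, train_labels):
    """One averaged model vector per digit class (the M_i of Eq. (3))."""
    train_feats = np.asarray(train_feats)
    train_labels = np.asarray(train_labels)
    classes = np.unique(train_labels)
    models = np.stack([train_feats[train_labels == c].mean(axis=0) for c in classes])
    return classes, models

def classify_nnc(v, classes, models):
    """Assign v to the class whose model minimises E_i = sum_j |M_ij - V_j|."""
    distances = np.abs(models - v).sum(axis=1)
    return classes[int(np.argmin(distances))]

# tiny synthetic example with 3-dimensional features
feats = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.1, 0.8, 0.9], [0.0, 1.0, 1.0]]
labels = [0, 0, 1, 1]
classes, models = train_nnc(feats, labels)
print(classify_nnc(np.array([0.05, 0.9, 0.95]), classes, models))   # 1
```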


For our training and testing with the NNC, we used the same number of features that gave the highest recognition rate with the HMM, for comparison purposes. We experimented with different amounts of training and testing data; the best recognition rates were obtained using 24 writers for training (a total of 11,520 samples) and the remaining 20 writers for testing (a total of 9600 samples). The confusion matrix is given in Table 3 along with the recognition rate for each digit; the average recognition rate is 94.35%. Fig. 9 shows the recognition rates of the tested writers 25 to 44. The recognition rates of some writers are over 98%, while others are a little over 87%. This variation is normal, as each writer may have writing styles that are distant from the averaged feature vectors of the models.


This variation will be analyzed in more detail below with the analysis of the erroneous samples. Fig. 10 shows the recognition rates of the different digits using the HMM and the NNC. It is clear from the figure that the HMM outperforms the NNC, as expected; on average the HMM recognition rate is nearly 4% higher than that of the NNC, and in some cases the difference reaches 10%, as with digit 5.

We analyzed the misclassified samples of the database. The misclassification errors may be attributed to the following main categories:

(1) Errors due to bad or corrupted data. Fig. 11 shows samples of bad or corrupted data. There are 55 such samples in the testing data, accounting for 0.57% of the average error rate (10% of the errors). This applies to both classifiers (viz. HMMC and NNC).

Table 3
The confusion matrix of the nearest neighbor classifier and the recognition rate of each digit (rows: true digit; columns: recognized digit)

Digit    0    1    2    3    4    5    6    7    8    9   % Recognition rate   % Error
0      944    6    2    1    0    2    0    1    4    0        98.33             1.67
1        1  938    0    0    6    0    0    0    0   15        97.71             2.29
2       12    2  878    1   39    6    0    2    9   11        91.46             8.54
3        7    0    1  886    2    0    0   32    8   24        92.29             7.71
4        3   27   19    2  901    0    0    0    8    0        93.85             6.15
5       94    1   19    0    0  846    0    0    0    0        88.13            11.88
6        1    3   11    0    0    0  939    2    0    4        97.81             2.19
7       17    0    0    2    1    0    1  938    0    1        97.71             2.29
8        5    0   10    0    2    0    0   78  859    6        89.48            10.52
9        6    2    1    9    1    0    0    4    8  929        96.77             3.23

Fig. 9. The recognition rates of the tested writers (25–44).
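As a cross-check on Table 3, the per-digit and average recognition rates can be recomputed directly from the confusion matrix; the sketch below reproduces the first two rows (each digit has 960 test samples: 20 writers x 48 samples).

```python
import numpy as np

def recognition_rates(confusion):
    """Per-digit and average recognition rates (%) from a confusion matrix
    whose rows are the true digits and whose columns are the recognized digits."""
    confusion = np.asarray(confusion, dtype=float)
    per_digit = 100.0 * np.diag(confusion) / confusion.sum(axis=1)
    average = 100.0 * np.trace(confusion) / confusion.sum()
    return per_digit, average

# first two rows of Table 3 (digits 0 and 1, 960 test samples per digit)
rows = [[944, 6, 2, 1, 0, 2, 0, 1, 4, 0],
        [1, 938, 0, 0, 6, 0, 0, 0, 0, 15]]
per_digit, average = recognition_rates(rows)
print(per_digit.round(2))          # [98.33 97.71]
```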



Fig. 10. Comparison of the recognition rates of the HMM and the nearest neighbor classifiers.

Fig. 11. Samples of badly written or corrupted data.

(2) Errors due to deformed samples or samples whose segments are out of proportion with the other segments in length and orientation. Fig. 12 shows samples with this type of error. There are 74 samples in this category, accounting for 0.77% of the error rate (or 13.6% of the errors) of the NNC. This percentage is much lower for the HMMC, as the HMM tolerates this type of error.

(3) Errors due to samples written in a different style from the training style. For example, digit three may be written with three upward segments or with two upward segments ( ). This type of error can be addressed by allowing a digit to have more than one model: each style of a digit, if it is appreciably different from the basic model, is represented by an additional model. There were 16 errors of this type for one writer alone. Fig. 13 shows samples of this writer's style. This type of error was present in both classifiers.



Fig. 14. Samples of digit 5 recognized as digit 0.

Fig. 12. Samples of erroneous data due to deformed or unproportional segments.

Fig. 13. Samples of digit 3 written in a different style from that of the training samples.

(4) Errors related to digit pairs. Some digits are close in shape to other digits, and if not written carefully they look similar to them. For example, if digit 5 is written at a small size with a small inside hole, it is normally confused with zero; this case accounts for 94 errors of the NNC (9.8% of the digit 5 test samples). Fig. 14 shows samples of this category. Another example is digit zero, which is normally a dot but is sometimes written as a small line or as scattered pixels; when such a digit is normalized it looks very similar to a one and is confused with digit 1. However, when digits are embedded in Arabic text this problem is expected to disappear, because the normalization will then be based on the line height: digit zero will be very small compared with a one and will not be confused with it. Fig. 15 shows examples of this category. Both classifiers suffer from this type of error, although the HMMC to a much lesser extent. Other digit pairs are sometimes confused as well (viz. digit 7 with digit 8, and '2' with '4').

Fig. 15. Zeros recognized as ones.

(5) Genuine errors that are misclassified for no visible reason, attributable to the limited classification capability of the used features and classifiers. A 100% recognition rate for writer-independent off-line handwritten digits cannot be expected from any classifier; indeed, a human may make about 1% misclassification errors on the data in this database when no context is available.

6. Conclusions

This paper presented a system for writer-independent off-line handwritten Arabic (Indian) numeral recognition based on simple and effective features. We used an HMM and the NNC. We analyzed the performance of the HMM using different numbers of features and states and selected the number of features and states giving the highest recognition rate; the same features were used with the NNC. The technique is scale and translation invariant. The experimental results indicate the effectiveness of the proposed technique for the automatic recognition of off-line Arabic (Indian) handwritten numerals.

A database of 21,120 digits was used for training and testing the classifiers. For the HMM, 1700 Arabic (Indian) numbers of varying length (totaling 16,657 digits) were used for training and 471 numbers of varying length (totaling 4463 digits) were used for testing the HMM.



Angle-, distance-, horizontal-, and vertical-span features were used. Several experiments were conducted to achieve the best recognition rate by varying the number of states in the model and the number of features. An average recognition rate of 97.99% was achieved using 120 features presented as 12 observations of 10 features per digit, with a 10-state HMM. Randomization of the presented observations was necessary for training the HMM; it was applied to the length of the Arabic (Indian) numbers and to the digits used in each number. The same database and features were used for training and testing the NNC: samples of the first 24 writers (totaling 11,520 digits) were used for training and those of the last 20 writers (writers 25–44, totaling 9600 samples) for testing. The average recognition rate achieved was 94.35%.

The author is currently exploring the use of more statistical and syntactical features, and the same technique will be applied to Arabic text recognition. The use of support vector machines (SVMs) and neural networks for Arabic text/numeral recognition is being investigated, and the use of multiple classifiers will be explored.

Acknowledgments

First, I would like to thank the referees for their constructive criticism and stimulating remarks; the modifications made to address those remarks improved the revised manuscript considerably. In addition, I would like to thank King Fahd University of Petroleum and Minerals for supporting this research work and providing the computing facilities.

References

[1] B. Al-Badr, S.A. Mahmoud, Survey and bibliography of Arabic optical text recognition, Signal Process. 41 (1) (1995) 49–77.
[2] J. Mantas, An overview of character recognition methodologies, Pattern Recogn. 19 (6) (1986) 425–430.
[3] V.K. Govindan, A.P. Shivaprasad, Character recognition - a review, Pattern Recogn. 23 (7) (1990) 671–683.
[4] C.L. Liu, K. Nakashima, H. Sako, H. Fujisawa, Handwritten digit recognition: benchmarking of state-of-the-art techniques, Pattern Recogn. 36 (2003) 2271–2285.
[5] M. Shi, Y. Fujisawa, T. Wakabayashi, F. Kimura, Handwritten numeral recognition using gradient and curvature of gray scale image, Pattern Recogn. 35 (2002) 2051–2059.

[6] L.N. Teow, K.F. Loe, Robust vision-based features and classification schemes for off-line handwritten digit recognition, Pattern Recogn. 35 (2002) 2355–2364.
[7] K. Cheung, D. Yeung, R.T. Chin, A Bayesian framework for deformable pattern recognition with application to handwritten character recognition, IEEE Trans. Pattern Anal. Mach. Intell. 29 (12) (1998) 1382–1388.
[8] I.J. Tsang, I.R. Tsang, D.V. Dyck, Handwritten character recognition based on moment features derived from image partition, Int. Conf. Image Process. 2 (1998) 939–942.
[9] F. Al-Omari, Hand-written Indian numeral recognition systems using template matching approaches, Proc. ACS/IEEE Int. Conf. Comput. Syst. Appl. (2001) 83–88.
[10] F.A. Al-Omari, O. Al-Jarrah, Handwritten Indian numerals recognition system using probabilistic neural networks, Adv. Eng. Inform. 18 (2004) 9–16.
[11] F. Bousalma, Structural and fuzzy techniques in the recognition of online Arabic characters, Int. J. Pattern Recogn. Artif. Intell. 13 (7) (1999) 1027–1040.
[12] A. Salah, E. Alpaydin, L. Akarun, A selective attention-based method for visual pattern recognition with application to handwritten digit recognition and face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 24 (3) (2002) 420–425.
[13] A. Hamid, R. Haraty, A neuro-heuristic approach for segmenting handwritten Arabic text, Proc. ACS/IEEE Int. Conf. Comput. Syst. Appl. 110 (3) (2001).
[14] S. Saloum, Arabic hand-written text recognition, Proc. ACS/IEEE Int. Conf. Comput. Syst. Appl. (2001) 106–109.
[15] S. Almaadeed, C. Higgens, D. Elliman, Recognition of off-line handwritten Arabic words using hidden Markov model approach, ICPR 2002, Quebec City, August 2002, pp. 481–484.
[16] S. Almaadeed, C. Higgins, D.G. Elliman, Off-line recognition of handwritten Arabic words using multiple hidden Markov models, Knowledge-Based Syst. 17 (2004) 75–79.
[17] S. Touj, N.B. Amara, H. Amiri, Arabic handwritten words recognition based on a planar hidden Markov model, Int. Arab J. Inf. Technol. 2 (4) (2005) 318–325.
[18] M. Khorsheed, Off-line Arabic character recognition - a review, Pattern Anal. Appl. 5 (2002) 31–45.
[19] L.M. Lorigo, V. Govindaraju, Offline Arabic handwriting recognition: a survey, IEEE Trans. Pattern Anal. Mach. Intell. 28 (5) (May 2006) 712–724.
[20] I. Bazzi, C. LaPre, J. Makhoul, R. Schwartz, Omnifont and unlimited vocabulary OCR for English and Arabic, in: Proceedings of the International Conference on Document Analysis and Recognition, vol. 2, Ulm, Germany, 1997, pp. 842–846.
[21] I. Bazzi, R. Schwartz, J. Makhoul, An omnifont open-vocabulary OCR system for English and Arabic, IEEE Trans. PAMI 21 (6) (1999) 495–504.
[22] M.H.S. Shahrezea, K. Faez, A. Khotanzad, Recognition of handwritten Persian/Arabic numerals by shadow coding and an edited probabilistic neural network, Proc. Int. Conf. Image Process. 3 (1995) 436–439.
[23] H.M.M. Hosseini, A. Bouzerdoum, A combined method for Persian and Arabic handwritten digit recognition, in: Proceedings of the Australian and New Zealand Conference on Intelligent Information Systems, 1996, pp. 80–83.
[24] F.N. Said, R.A. Yacoub, C.Y. Suen, Recognition of English and Arabic numerals using a dynamic number of hidden neurons, in: Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999, pp. 237–240.

[25] J. Sadri, C.Y. Suen, T.D. Bui, Application of support vector machines for recognition of handwritten Arabic/Persian digits, in: Proceedings of the Second Iranian Conference on Machine Vision and Image Processing, vol. 1, 2003, pp. 300–307.
[26] H. Soltanzadeh, M. Rahmati, Recognition of Persian handwritten digits using image profiles of multiple orientations, Pattern Recogn. Lett. 25 (2004) 1569–1576.
[27] M. Mohamed, P. Gader, Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques, IEEE Trans. Pattern Anal. Mach. Intell. 18 (5) (May 1996) 548–554.
[28] J. Hu, S.G. Lim, M.K. Brown, Writer independent online handwriting recognition using an HMM approach, Pattern Recogn. 33 (2000) 133–147.
[29] A.H. Hassin, X. Tang, J. Liu, W. Zhao, Printed Arabic character recognition using HMM, J. Comput. Sci. Technol. 19 (4) (July 2004) 538–543.
[30] HTK Speech Recognition Toolkit, htk.eng.cam.ac.uk.