A Hybrid RNN Model for Cursive Offline Handwriting Recognition

2012 Brazilian Symposium on Neural Networks

Byron Leite Dantas Bezerra

Cleber Zanchettin

Vinícius Braga de Andrade

University of Pernambuco Polytechnic School of Pernambuco 50.750-470, Recife - PE, Brazil Email: [email protected]

Federal University of Pernambuco Center of Informatics 50.732-970, Recife - PE, Brazil Email: [email protected]

University of Pernambuco Polytechnic School of Pernambuco 50.750-470, Recife - PE, Brazil Email: [email protected]

Abstract: This paper presents an approach to handwritten character recognition using recurrent neural networks. The Multi-dimensional Recurrent Neural Network (MDRNN) is evaluated against classical techniques. To improve the model's performance, we propose combining the original MDRNN with specialized Support Vector Machines (SVMs) that resolve confusions between similar letters and so avoid misclassifications. The performance of the method is verified on the C-Cube database and compared with different classifiers. The hierarchical combination presented promising results.

I. INTRODUCTION

Despite more than 30 years of handwriting recognition research [20], [16], [26], [5], developing a reliable, general-purpose system for unconstrained text line recognition remains an open problem. This is a complex task due to the variety of handwriting styles, noise in the acquisition process, and similarity among some classes [28]. In this domain each writer has a different calligraphy, and it may change depending on the writing material (type of pen, pencil or paper). The emotional state of the person, the paper space available and the time available to write may also influence the handwritten text.

In the classical literature the classifiers used to perform handwriting recognition are statistical, connectionist and probability-based classifiers [5]. Recently, Recurrent Neural Networks (RNNs) presented promising results in this field [1]. The architecture proposed by Graves et al. [11] obtained the best results in the ICDAR 2009 Handwriting Recognition Competition [1]. This model is composed of a hierarchy of Multi-dimensional Recurrent Neural Network (MDRNN) [13] layers that use the Long Short-Term Memory (LSTM) method [12] and a connectionist temporal classification (CTC) output layer for character and word recognition.

The model is an offline recognition system that works on raw image pixels. As well as being alphabet independent, such a system has the advantage of being globally trainable, with the image features optimized along with the classifier. The model does not need an explicit feature extraction procedure for the digitized handwriting image. Although this simplifies the recognition process, it may cause misclassification of similar letters. This confusion can happen because the method does not have specific information about discriminatory regions of specific letters (e.g., U and V, Q and O, I and l, among others), which specialized feature extraction techniques provide to avoid misclassifications.

In this paper, we evaluate the performance of the original MDRNN in handwritten character recognition and propose the use of specialized Support Vector Machines (SVMs) [25] to improve the performance of the MDRNN in cases of easily confused letters. The performance of the method is verified on the C-Cube database and compared with different classifiers using the same database. Section II presents the problem and related work. Section III presents our proposed method, based on the MDRNN-LSTM approach and its combination with SVM. Section IV presents the experiments. The conclusion is given in the final section.

1522-4899/12 $26.00 © 2012 IEEE DOI 10.1109/SBRN.2012.41

II. HANDWRITTEN CHARACTER RECOGNITION

The handwriting recognition task is traditionally divided into online and offline recognition. In online recognition a time series of coordinates, representing the movement of the pen tip, is captured, while in the offline case only an image of the text is available. Because of the greater ease of extracting relevant features, online recognition generally yields better results [20]. Another important distinction is between recognizing isolated characters or words and recognizing whole lines of text. Lastly, handwriting recognition can be split into cases where the writing style is constrained in some way and the more challenging scenario where it is unconstrained.

Character recognition consists of recognizing a set of characters from an image, separating them into 10 classes, in the case of digits, or 26 classes, in the case of the letters of the Western alphabet. Several problems hinder the implementation of character recognition. In some cases the scanned image is of low quality, so preprocessing is necessary to eliminate image noise. Another problem is the existence of distorted characters, especially in handwritten documents, due to the characteristics of the writer's style. A further difficulty is the similarity between some characters, such as I and J, Q and O, U and V, among others. Figure 1 presents a sample of characters with similar characteristics.

To build an efficient handwriting recognition system, it is important to choose the extracted features correctly. Rodrigues et al. [17] evaluated a set of feature extraction techniques for handwritten letter recognition. That work presented a technique based on the projection of the image outline onto the sides of a regular polygon built around each character. The feature vector is formed by the perpendicular distances taken from each side of the polygon to the contour of the image. In the evaluation, the approach is compared using two polygons (square and hexagon) and two different numbers of projection lines taken from each side of the polygon, with two versions of bitmap coding (standard and tuned). The discriminatory power of each case is examined using a Multi-Layer Perceptron (MLP) neural network.

Fig. 1. Characters with structural similarity.

Vamvakas et al. [24] proposed a methodology based on a new feature extraction technique that recursively subdivides the character image so that the resulting sub-images at each iteration have a balanced (approximately equal) number of foreground pixels, as far as possible. In the experiments two databases of handwritten characters (CEDAR and LIC) and two databases of handwritten digits (MNIST and CEDAR) were used. The classification step was performed using an SVM with a Radial Basis Function (RBF) kernel.

Cruz et al. [8] presented a new approach for recognizing cursive characters using multiple feature extraction algorithms and an ensemble classifier. Several feature extraction techniques, using different approaches, were evaluated. Two techniques, Modified Edge and Multi Zoning Maps, were proposed. Based on the results, a combination of feature sets was proposed in order to achieve high recognition performance. This combination was motivated by the observation that the feature sets are independent and complementary. The ensemble was built by combining the outputs generated by the classifier for each feature set separately. The C-Cube database was used for the experiments, and the classifier was a three-layer MLP network trained with Resilient Backpropagation.

Bellili et al. [2] proposed a hybrid MLP-SVM method for unconstrained handwritten digit recognition. This hybrid architecture is based on the idea that the correct digit class almost systematically belongs to the two maximum MLP outputs and that some pairs of digit classes constitute the majority of MLP misclassifications. Specialized local SVMs are introduced to detect the correct class among these two classification hypotheses.

Camastra [6] presented a cursive character recognizer that performs character classification using SVM and neural gas. The neural gas is used to verify whether the lower and upper case versions of a certain letter can be joined in a single class or not. Once this is done for every letter, SVMs perform the character recognition. They compare notably better, in terms of recognition rates, with popular neural classifiers such as Learning Vector Quantization (LVQ) and MLP. The SVM recognition rate is among the highest presented in the literature for cursive character recognition.

Neves et al. [15] presented a hierarchical combination of SVMs for handwritten digit recognition, with promising classification results but a high classification time due to the need for one SVM model per class pair. Zanchettin et al. [28] presented a hybrid KNN-SVM method for cursive character recognition. Specialized SVMs are introduced to significantly improve the performance of KNN in handwriting recognition. This hybrid approach is based on the observation that, when using KNN for handwritten character recognition, the correct class is almost always one of the two nearest neighbors, and the SVM is used to reduce misclassifications between these two neighbors.

In [18] and [24] similarities between some letters were observed (e.g., 'B and D', 'H and N', 'O and Q'). In [28] similarities between other letters were observed (e.g., 'O and D', 'B and R', 'D and B', 'N and M'). In [8] a high error rate was detected for characters that have two completely different ways of writing (e.g., 'a and A', 'f and F'). In this paper we perform experiments with the different ways of writing, namely upper, lower and joint case.

III. MDRNN MODEL BOOSTED BY SVM APPLIED TO CHARACTER RECOGNITION PROBLEM

The hierarchical structure proposed by Graves et al. [11] is composed of multidimensional recurrent neural networks (MDRNNs) using multidimensional LSTM. The output layer uses connectionist temporal classification (CTC) [14]. The concept of MDRNNs [13] is to replace the single recurrent connection found in standard recurrent networks with as many connections as there are spatio-temporal dimensions in the data. These connections allow the network to create a flexible internal representation of the surrounding context, which is robust to localized distortions. During the forward pass, at each point in the data sequence, the hidden layer of the network receives both an external input and its own activations from one step back along all dimensions. Figure 2 illustrates the two-dimensional case.

Fig. 2. 2D RNN forward pass (a) and backward pass (b) [12].

The forward pass of an MDRNN can then be carried out by feeding forward the input and the n previous hidden layer activations at each point in the ordered input sequence, and storing the resulting hidden layer activations. Care must be taken at the sequence boundaries not to feed forward activations from points outside the sequence.
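The forward pass just described can be sketched in a few lines. This is a minimal illustration, not the implementation used in the paper: it assumes plain tanh hidden units instead of LSTM cells, a single scalar pixel value as the external input at each point, and a single scan direction starting at the origin. The function name `mdrnn_forward_2d` and all parameter names are hypothetical.

```python
import math

def mdrnn_forward_2d(image, w_in, w_up, w_left, bias):
    """Forward pass of one 2D recurrent layer with tanh units (assumption: no LSTM).

    image:  H x W nested list of pixel values (the external input)
    w_in:   input weight for each of the N hidden units
    w_up:   N x N recurrent weights along the vertical dimension
    w_left: N x N recurrent weights along the horizontal dimension
    bias:   bias for each of the N hidden units
    """
    n = len(bias)
    height, width = len(image), len(image[0])
    hidden = [[[0.0] * n for _ in range(width)] for _ in range(height)]
    # Visit points so that (i-1, j) and (i, j-1) are always computed before (i, j).
    for i in range(height):
        for j in range(width):
            for k in range(n):
                a = w_in[k] * image[i][j] + bias[k]
                # Boundary care: only feed activations from points inside the image.
                if i > 0:
                    a += sum(w_up[k][m] * hidden[i - 1][j][m] for m in range(n))
                if j > 0:
                    a += sum(w_left[k][m] * hidden[i][j - 1][m] for m in range(n))
                hidden[i][j][k] = math.tanh(a)
    return hidden
```

The two `if` guards correspond to the boundary condition mentioned above: at the first row or column there is no previous point along that dimension, so no recurrent term is added.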


This architecture allows cyclic connections among the network nodes. These connections enable the network, in principle, to map the entire history of previous inputs to each output, whereas a network without cyclic connections can only map the current input to the output class. The key point is that the recurrent connections allow a 'memory' of previous inputs to persist in the network's internal state, which can be used to influence the network output [12]. Figure 3 illustrates the ordering for a two-dimensional sequence.


To work with multiple dimensions, the authors suggested a multi-dimensional network that extends the concept presented above. The basic idea of MDRNNs is to replace the single recurrent connection found in standard RNNs with as many recurrent connections as there are dimensions in the data. Therefore, in the character recognition problem, where the input is a two-dimensional image, it is natural to use two recurrent connections.

The MDRNN-LSTM activations are presented to the CTC output layer, which was designed for sequence labeling with RNNs. Unlike other neural network output layers, it does not require pre-segmented training data or postprocessing to transform its outputs into transcriptions. Instead, it trains the network to directly estimate the conditional probabilities of the possible classes given the input sequences.
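To make the role of the CTC output layer more concrete, the sketch below shows best-path decoding, one common way of turning per-step CTC outputs into a label sequence. The text does not state which decoding procedure was used, so this is an assumption for illustration only; the function name `ctc_best_path`, the `labels` list and the position of the blank symbol are all hypothetical.

```python
def ctc_best_path(probabilities, labels, blank=0):
    """Best-path CTC decoding (illustrative assumption, not the paper's procedure).

    probabilities: one list of class probabilities per step; index `blank`
                   holds the probability of the CTC blank symbol
    labels:        symbol for each class index (the entry at `blank` is unused)
    """
    # 1. Pick the most probable class at every step.
    best = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    # 2. Collapse repeated classes, then 3. drop blanks.
    decoded = []
    previous = blank
    for index in best:
        if index != blank and index != previous:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)
```

For isolated characters the decoded sequence would normally contain a single label; the blank between two identical labels is what allows a doubled letter to survive the collapsing step.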

Fig. 3. Sequence ordering of 2D data. The MDRNN forward pass starts at the origin and follows the direction of the arrows. The point (i,j) is never reached before both (i-1,j) and (i,j-1) [12].

Figure 4(a) presents a recurrent neural network (RNN) with one hidden layer. Unfortunately, for standard RNN architectures, the range of context that can be accessed is limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network's recurrent connections. This behavior is generally called the Vanishing Gradient Problem, illustrated in Figure 5. In this figure the shading of the nodes indicates how sensitive the network remains over time to the first input. The sensitivity decays exponentially over time as new inputs overwrite the activations of the hidden units and the network 'forgets' the first input.
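The decay can be reproduced numerically with a toy model. The sketch below assumes a single tanh unit with one scalar recurrent weight, which is far simpler than the networks discussed here; the function name `sensitivity_to_first_input` is hypothetical. It tracks the magnitude of the derivative of the hidden state at each step with respect to the first input, using the chain rule through the recurrent connection.

```python
import math

def sensitivity_to_first_input(w_rec, inputs):
    """Return |d h_t / d x_0| for each step t of a scalar tanh RNN (toy assumption).

    The recurrence is h_t = tanh(w_rec * h_{t-1} + x_t) with h_{-1} = 0.
    """
    h = 0.0
    grad = 1.0
    sensitivities = []
    for t, x in enumerate(inputs):
        h = math.tanh(w_rec * h + x)
        if t == 0:
            grad = 1.0 - h * h            # d h_0 / d x_0
        else:
            grad *= w_rec * (1.0 - h * h)  # chain rule through the recurrent link
        sensitivities.append(abs(grad))
    return sensitivities
```

With a recurrent weight of magnitude below one, every step multiplies the sensitivity by a factor smaller than one, so it shrinks exponentially; a sufficiently large weight can instead make the product grow, which is the "blows up" case mentioned above.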

Fig. 4. Illustration of the MDRNN model [12]: (a) RNN with one hidden layer; (b) LSTM unit with one memory cell.

Fig. 5. Vanishing Gradient Problem [12].

Fig. 6. How information can be persisted in an LSTM; 'O' denotes an open gate and '-' a closed gate.

To solve this problem Graves et al. [11] proposed the use of the Long Short-Term Memory (LSTM) method. The LSTM consists of recurrent subnetworks known as memory blocks. Each block contains one or more self-connected memory cells and three multiplicative units: the input, output and forget gates. The multiplicative gates allow LSTM memory cells to store and access information over long periods of time, thereby avoiding the Vanishing Gradient Problem. An LSTM unit is illustrated in Figure 4(b). Figure 6 shows how the LSTM preserves information over time as further network inputs arrive.
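The gating mechanism can be illustrated with a single memory cell. The sketch below is a simplified, one-dimensional LSTM step with scalar input and state and without peephole connections; these simplifications, the function name `lstm_step` and the parameter layout are assumptions made for illustration, not details taken from the model described here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a single LSTM memory cell with scalar input and state.

    params maps 'i' (input gate), 'f' (forget gate), 'o' (output gate) and
    'g' (cell input) to a tuple (w_x, w_h, bias). Hypothetical layout.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("i"))    # how much new information enters the cell
    f = sigmoid(pre_activation("f"))    # how much of the old cell state is kept
    o = sigmoid(pre_activation("o"))    # how much of the cell state is exposed
    g = math.tanh(pre_activation("g"))  # candidate value to store
    c = f * c_prev + i * g              # multiplicative gating of the memory cell
    h = o * math.tanh(c)
    return h, c
```

When the input gate is closed (`i` near 0) and the forget gate is open (`f` near 1), the cell state `c` is carried forward almost unchanged regardless of the current input, which is the persistence behavior depicted in Figure 6.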

Despite the robustness and complexity of the MDRNN-LSTM model, we observed some misclassifications when the correct character or word class has significant similarities with another class. In some of these cases it is still possible to distinguish the classes without using context information. The reason this happens is, at the same time, one of the most interesting advantages of the MDRNN-LSTM model: it does not have to run the feature extraction step, which is generally needed in all statistical pattern recognition methods grounded on Bayes Decision Theory (BDT) [9]. BDT is based on quantifying the tradeoffs among various classification decisions using probability and the costs that accompany such decisions. It assumes that the decision problem is posed in probabilistic terms and that all of the relevant probability values are known.

According to the Bayes decision rule, for statistically independent variables, as in the character recognition problem, the class of a given observed feature vector must be chosen based solely on the likelihood of this feature vector given the class, times the prior probability of this class. Therefore, suppose we are able to define discriminant functions which take as input well-defined features extracted from the image of some character and, as output, compute the probability that the sample belongs to the class associated with the discriminant function. The chosen class is then the one whose discriminant function produced the highest output.

If we adopt this approach for each class where the MDRNN-LSTM model has more misclassifications, we maximize the chance of deciding the correct class with the composed model, since it is specialized on the representative features extracted from the data. Additionally, taking into account that the majority of the confusions observed in the MDRNN-LSTM model involve certain pairs of classes (e.g., 'U' and 'V', 'I' and 'l', among others), we just need a dichotomizer [9] for each pair of classes misclassified by the MDRNN-LSTM method. In order to determine the pair of classes, and hence the dichotomizer to run, we select the two classes which receive the highest values (the most probable) in the MDRNN output layer.
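The decision rule described in words above can be written compactly. The symbols are illustrative notation, not taken from the original text: $\mathbf{x}$ is the observed feature vector, $\omega_i$ are the candidate classes, $p(\mathbf{x}\mid\omega_i)$ is the class-conditional likelihood, $P(\omega_i)$ is the prior, and $g_i$ is the discriminant function of class $\omega_i$. The second line shows the special case of a dichotomizer for a confused pair $(\omega_a, \omega_b)$.

```latex
\hat{\omega} = \arg\max_{i}\; g_i(\mathbf{x}),
\qquad g_i(\mathbf{x}) = p(\mathbf{x}\mid\omega_i)\,P(\omega_i)

g(\mathbf{x}) = g_a(\mathbf{x}) - g_b(\mathbf{x}),
\qquad \text{decide } \omega_a \text{ if } g(\mathbf{x}) > 0,\ \text{otherwise } \omega_b
```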

The Support Vector Machine (SVM) [25] is a binary classification technique and therefore a dichotomizer. We propose the use of the SVM as a class confirmation step for the MDRNN-LSTM method. SVM training consists of finding the support vectors for each class and creating a function that represents an optimal margin separation between the support vectors of different classes. Consequently, it is possible to obtain an optimal hyperplane for class separation, as shown in Figure 7(a). The SVM thus finds the optimal linear separation function between two classes, and it can still deal with data that are not linearly separable (see Figure 7(b)): it uses a kernel function to increase the feature dimensionality, thereby making the data linearly separable, as shown in Figure 7(c).

Fig. 7. Illustration of SVM operation [19]: (a) the SVM optimal margin separation; (b) features in a bi-dimensional space; (c) kernel function.

The proposed system architecture is shown in Figure 8. In the proposed system, the feature extraction step is performed only when the MDRNN-LSTM returns as output a class with low confidence and this class is generally confused with another one. In this case, we take the second most probable class returned by the MDRNN-LSTM and use the appropriate SVM to resolve the confusion between the two options. In the experiments we used the 34 features proposed by Camastra [6].

Fig. 8. The proposed model architecture.

Different SVMs were derived from the pairs of classes (e.g., (U, V), (m, n), (N, n), etc.) that constitute the majority of the confusions observed with the MDRNN-LSTM classifier. Different kernel functions (linear, polynomial and RBF) were tested, and the best performance was obtained by SVMs trained with the RBF kernel function. The choice of confused class pairs was based on the number of errors, taking as a minimum 10% of the size of the training set. Figure 9 presents the confusion matrix of the joint case. In this matrix the underlined numbers represent the KNN pairs. The pairs for the upper and lower cases follow a very similar structure.

Fig. 9. Joint case confusion matrix.

IV. EXPERIMENTS

A. C-Cube Database

The C-Cube is a public database available for download on the Cursive Character Challenge website (http://ccc.idiap.ch). The database consists of 57,293 files, including uppercase and lowercase letters, manually extracted from the CEDAR and United States Postal Service (USPS) databases. All images are binary and of variable size. The data are unbalanced, and there is a large difference in the number of patterns among the letters.

Several feature extraction techniques have been proposed in the literature for character recognition, and the choice of technique is an important factor in achieving high accuracy rates [23]. Cruz et al. [8] performed experiments with different feature extraction techniques on this database. Camastra [6] used a clustering analysis to verify whether the upper and lower case versions of the same letter are similar in shape. The letters (c, x, o, w, y, z, m, k, j, u, n, f, v) presented the highest similarity between the two versions and were joined into a single class in the experiments without loss of generality.

The classification results for the split and joint cases are shown in Table I. The results in the table represent the average performance of the model over 30 runs with different data distributions. As can be seen in this table, among the feature sets the edge maps algorithm presented the best overall result. Most feature sets presented better accuracy for the upper case letters, with the exception of the method proposed by Camastra, which performed better for lower case. This feature set also presented the best accuracy (84.37%) for the lower case. The last two lines of Table I also present the results of the MDRNN and MDRNN+SVM on the split and joint databases. The MDRNN presented promising results, especially in comparison with methods where the feature extraction step is needed. Additionally, our proposed hybrid model outperforms the MDRNN model. In fact, our proposed model achieves the best rates on the upper and joint databases and is statistically equivalent to the Camastra method on the lower case letters database.

TABLE I
RECOGNITION RATE BY FEATURE SET FOR THE UPPER AND LOWER CASE SEPARATED [17]

Method          Nodes   Upper Case (%)   Lower Case (%)   Joint Case (%)
Edge            490     86.52            81.13            82.49
Binary Grad.    490     86.35            79.89            81.46
MAT Grad.       300     85.77            79.22            80.83
Median Grad.    360     85.10            79.48            79.96
Camastra 34D    400     79.63            84.37            79.97
Zoning          450     84.46            78.07            78.60
Structural      320     81.94            77.70            77.07
Concavities     530     73.35            81.89            74.90
Projections     500     71.73            79.90            73.85
MDRNN           -       89.51            82.98            84.15
MDRNN+SVM       -       90.45            84.10            84.44

The best results obtained in recent years for the C-Cube database are displayed in Table II. In Thornton et al. [21] the HVQ with temporal pooling algorithm is a partial implementation of Hierarchical Temporal Memory (HTM). This biologically inspired model places emphasis on the temporal aspect of pattern recognition, and consequently parses all images as 'movies'. The hierarchy itself is a full 4-level tree of degree 4 that processes a 32 × 32 pixel input character image. During training, each node receives input from the layer below, with leaf nodes receiving a 4 × 4 raw pixel image that is moved one pixel at a time across the node's receptive field, in a process known as sweeping. As the sweep progresses, the frequency with which one pattern follows another is counted. This information is used to create temporal groups that collect together patterns that have most frequently succeeded one another during training. The same process of temporal pooling is repeated at each level up to the root node, where images are classified according to their character values. During recognition, an image is again swept across the leaf node sensors, as each non-root node estimates the membership probability of its input for each of its temporal groups. This information is propagated up to the root, which outputs the most probable character classification.

TABLE II
RECOGNITION RATES FOR THE C-CUBE DATABASE JOINT CASE

Algorithm                 #Class   Recog. Rate (%)
HVQ-32 [21]               52       84.72
HVQ-16 [21]               52       85.58
MDF-RBF [22]              52       80.92
34D-RBF [22]              52       84.27
MDF-SVM [22]              52       83.60
34D-SVM+Neu. Gas [6]      52       86.20
34-MLP [6]                52       71.42
MLP+SVM [27]              52       82.53
KNN+SVM [28]              52       83.76
MDRNN                     52       84.15
MDRNN+SVM                 52       84.44

In [22] the modified direction feature (MDF) extraction technique combines the use of direction features (DFs) [3] and transition features (TFs) [10] to produce recognition rates that are generally better than those of either DFs or TFs used individually. Using this information, direction transitions (DTs), equal to the corresponding direction feature divided by 10, are extracted for each row (left to right and right to left) and each column (top to bottom and bottom to top). In addition, any contiguous set of equal-valued direction features is replaced by a single value. Location transitions (LTs) are similarly calculated for each row and each column in both directions, with the relative start position of each direction feature calculated as a proportion of the total width (in the case of a row) or height (in the case of a column). Given the initial set of LT and DT values corresponding to the actual number of rows and columns in the original character bitmap, the data are normalised and locally averaged to fit into a space of 5 rows and 5 columns, producing a final vector of 120 features [4].

Camastra [6] presented a cursive character recognizer. The character classification is achieved by using support vector machines (SVMs) and a neural gas. The neural gas is used to verify whether the lower and upper case versions of a certain letter can be joined in a single class or not. Once this is done for every letter, the character recognition is performed by SVMs.

A method for increasing the recognition rates of handwritten characters by combining MLP and SVM was presented in [27]. The experiments demonstrated that combining MLP networks with SVM experts, specialized in the pairs of classes that constitute the greatest confusion for the MLP, improved performance in terms of recognition rate. In Zanchettin et al. [28] a combination of KNN and SVM is presented. The main idea is to use the SVM as a decision-maker classifier to increase the KNN recognition rate. The adaptation in this case is to take the two most frequent classes among the k nearest neighbors and to use the SVM to decide between these two classes. It is a suitable technique where a misclassification results in high costs. This technique, however, depends on the KNN method, whose main disadvantage is its processing time.

According to the results presented in Table II, we conclude that the MDRNN model and our proposed hybrid model are among the top methods for cursive character recognition. One advantage of these methods over the others is that the MDRNN itself designs and learns everything it needs from the pixels of the image to distinguish the main differences between the classes. Therefore, the training step is much easier than in other methods and the classification step is faster, even in the proposed MDRNN-SVM model, since the feature extraction and the SVM classification step occur only when the MDRNN output is in doubt.
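The "only in case of doubt" control flow can be sketched as follows. This is a hedged illustration of the decision logic, not the code used in the experiments: the function name `hybrid_classify`, the dictionary of per-class scores, the lazily evaluated `extract_features` callable and the `threshold` value are all hypothetical, and the text does not state the confidence level at which the SVM is consulted.

```python
def hybrid_classify(mdrnn_scores, extract_features, pair_svms, threshold=0.9):
    """Resolve a doubtful MDRNN decision with a pairwise SVM (illustrative sketch).

    mdrnn_scores:     dict mapping class label -> MDRNN output score
    extract_features: zero-argument callable returning the feature vector;
                      it is only called when an SVM is actually needed
    pair_svms:        dict mapping frozenset({a, b}) -> callable that takes a
                      feature vector and returns either a or b
    threshold:        hypothetical confidence level below which the SVM is used
    """
    ranked = sorted(mdrnn_scores, key=mdrnn_scores.get, reverse=True)
    best, second = ranked[0], ranked[1]
    pair = frozenset((best, second))
    # Trust the MDRNN when it is confident or when no specialist exists for the pair.
    if mdrnn_scores[best] >= threshold or pair not in pair_svms:
        return best
    # Otherwise extract features and let the specialized dichotomizer decide.
    return pair_svms[pair](extract_features())
```

Passing the feature extractor as a callable keeps the expensive step off the common path, which reflects the speed argument made above: features are computed only for samples that reach a pairwise SVM.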

V. FINAL REMARKS

This paper evaluated the performance of the original MDRNN recurrent neural network in handwritten character recognition on a well-known benchmark, the C-Cube database. Additionally, we proposed the use of specialized Support Vector Machines to improve the performance of the MDRNN in a hierarchical way. The performance of the method was verified and compared with different classifiers using the C-Cube database. The method presented promising results in the classification task, and the proposed combination improved the method's performance and robustness, especially in the disjoint upper and lower case letter databases. As future work we suggest evaluating the performance of the MDRNN and of the proposed method against others on a benchmark of isolated word images, varying the number of training samples, the number of classes in the dataset, the amount of noise in the words, the resolution of the images, and other variables.

ACKNOWLEDGMENT

This work was supported by FACEPE (Brazilian Research Agency).

REFERENCES

[1] El Abed, H., Margner, V., Kherallah, M., Alimi, A.M.: ICDAR 2009 Handwriting Recognition Competition. In: Int. Conf. on Document Analysis and Recognition, pp. 1388–1392, (2009).
[2] Bellili, A., Gilloux, M., Gallinari, P.: An Hybrid MLP-SVM Handwritten Digit Recognizer. In: Int. Conf. on Document Analysis and Recognition, pp. 28–32, (2001).
[3] Blumenstein, M., Verma, B., Basli, H.: A Novel Feature Extraction Technique for the Recognition of Segmented Handwritten Characters. In: Proc. Int. Conf. on Document Analysis and Recognition, pp. 137–141, (2003).
[4] Blumenstein, M., Liu, X.Y., Verma, B.: An Investigation of the Modified Direction Feature Vector for Cursive Character Recognition. Pattern Recognition, 40(2):376–388, (2007).
[5] Bunke, H.: Recognition of cursive roman handwriting - past present and future. In: Proc. 7th Int. Conf. on Document Analysis and Recognition, vol. 1, pp. 448–459, (2003).
[6] Camastra, F.: A SVM-Based Cursive Character Recognizer. Pattern Recognition, 40(12):3721–3727, (2007).
[7] Camastra, F., Spinetti, M., Vinciarelli, A.: Offline Cursive Character Challenge: A New Benchmark for Machine Learning and Pattern Recognition Algorithms. In: Proc. Int. Conf. on Pattern Recognition, pp. 913–916, (2006).
[8] Cruz, R.M.O., Cavalcanti, G.D.C., Tsang, I.R.: An Ensemble Classifier for Offline Cursive Character Recognition using Multiple Feature Extraction Techniques. In: IEEE Int. Joint Conf. on Neural Networks, pp. 744–751, (2010).
[9] Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley and Sons, (2001).
[10] Gader, P.D., Mohamed, M., Chiang, J.H.: Handwritten Word Recognition with Character and Inter-character Neural Networks. In: IEEE Trans. Syst. Man Cybernet.-part B: Cybernetics (27):158–164, (1997).
[11] Graves, A., Fernández, S., Schmidhuber, J.: Multidimensional Recurrent Neural Networks. In: Proc. of Int. Conf. on Artificial Neural Networks, pp. 549–558, (2007).
[12] Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks. Dissertation, Technische Universität München, München, (2008).
[13] Graves, A., Schmidhuber, J.: Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. In: Adv. in Neural Information Proc. Syst., pp. 545–552, (2009).
[14] Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation, 9(8):1735–1780, (1997).
[15] Neves, R.F.P., Lopes, A.N.G., Mello, C.A.B., Zanchettin, C.: A SVM Based Off-line Handwritten Digit Recognizer. In: IEEE Int. Conf. on Systems, Man, and Cybernetics (SMC), pp. 510–515, (2011).
[16] Plamondon, R., Srihari, S.N.: On-line and Off-line Handwriting Recognition: A Comprehensive Survey. In: IEEE Trans. Pattern Anal. Mach. Intell., 22(1):63–84, (2000).
[17] Rodrigues, R.J., Kupac, G.V., Thomé, A.C.G.: Character Feature Extraction using Polygonal Projection Sweep (Contour Detection). In: Proc. Int. Work Conf. on Artificial Neural Networks, pp. 687–695, (2001).
[18] Rodrigues, R.J., Silva, E., Thomé, A.C.G.: Feature Extraction using Contour Projection. In: Proc. World Multiconference on Systemics, Cybernetics and Informatics, (2001).
[19] Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall Pearson Education Inc., (2003).
[20] Tappert, C., Suen, C., Wakahara, T.: The State of the Art in Online Handwriting Recognition. In: IEEE Trans. on Patt. Analysis and Machine Intelligence, 12(8):787–808, (1990).
[21] Thornton, J., Faichney, J., Blumenstein, M., Hine, T.: Character Recognition using Hierarchical Vector Quantization and Temporal Pooling. In: Proc. Australasian Joint Conf. on Artificial Intelligence, pp. 562–572, (2008).
[22] Thornton, T., Blumenstein, M., Nguyen, V., Hine, T.: Offline Cursive Character Recognition: A State-of-the-art Comparison. In: Conf. Int. Graphonomics Society, (2009).
[23] Trier, O.D., Jain, A.K., Taxt, T.: Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, 29(4):641–662, (1996).
[24] Vamvakas, G., Gatos, B., Perantonis, S.J.: Handwritten Character Recognition Through Two-stage Foreground Sub-sampling. Pattern Recognition (43):2807–2816, (2010).
[25] Vapnik, V.N.: Statistical Learning Theory. John Wiley and Sons, New York, USA, (1998).
[26] Vinciarelli, A.: A Survey on Off-line Cursive Script Recognition. In: Pattern Recognition, 35(7):1433–1446, (2002).
[27] Washington, W.A., Zanchettin, C.: A MLP-SVM Hybrid Model for Cursive Handwriting Recognition. In: Proc. of Int. Joint Conf. on Neural Networks, pp. 843–850, (2011).
[28] Zanchettin, C., Bezerra, B.L.D., Azevedo, W.W.: A KNN-SVM Hybrid Model for Cursive Handwriting Recognition. In: IEEE Int. Joint Conf. on Neural Networks, Brisbane, (in press) (2012).