Self-supervised writer adaptation using perceptive concepts: Application to on-line text recognition

Loïc Oudot

Lionel Prevost

Alvaro Moises

Maurice Milgram

Université Pierre & Marie Curie, Laboratoire des Instruments & Systèmes d'Ile de France, Groupe Perception, Automatique & Réseaux Connexionnistes, 4 Place Jussieu, Case 164, 75252 Paris Cedex, [email protected]

Abstract

We recently designed a hand-printed text recognizer. The system is based on three sets of experts used respectively to segment, classify, and validate the text (with a French lexicon of 200K words). In this communication we present writer adaptation methods. The first is supervised by the user. The others are self-supervised strategies that compare classification hypotheses with lexical hypotheses and modify the classifier parameters accordingly. The last method increases both the system accuracy and the classification speed. Experiments are presented on a large database of 90 texts (5,400 words) written by 54 different writers, and good recognition rates (82%) have been obtained.

1. Introduction

Recently, hand-held devices such as PDAs, mobile phones, and e-books have become very popular. Unlike classical personal computers, they are very small, keyboard-less, and mouse-less. The electronic pen is therefore very attractive, both as a pointing device and as a handwriting device. The first use belongs to man-machine interfacing and the second to handwriting recognition; here, we focus on the second one. For such an application, recognition rates must be very high, otherwise potential users will be discouraged. The major problem is the vast variation in personal writing styles. It can be solved by updating the parameters of a writer-independent recognizer to build a writer-dependent one, i.e., by adapting the system to its user's own writing style.

We now present some related works; all these adaptation strategies are supervised by the writer, so they always need the writer's attention. In [1], hidden Markov models are initially trained to be writer-independent. They are then specialized using the writer's personal database, and the models' parameters are re-estimated on these new data. In [2], new models are created, if necessary, to better fit each writing style. In [3], a TDNN is trained on a writer-independent database, and the network outputs are classified using the k-NN rule; when the writer indicates an error, the corresponding output is stored with the k-NN references. In [7], two networks (an MLP and an RBF) perform writer-independent classification; the RBF network is retrained by adapting existing kernels or creating new ones. In [10], a k-NN classifier is initially trained by clustering; it is then updated by activating new prototypes or inactivating useless ones.

In this communication, we present several adaptation methods. The first one needs the writer's supervision. The others are self-supervised, and the adaptation process is hidden from the writer. These methods compare classification hypotheses with lexical results to find hypothetical errors; these data are then used to re-estimate the classifier parameters. In the following sections, our baseline recognition system and the investigated adaptation strategies are described. Results are given for a supervised adaptation and for several self-supervised methods. Finally, conclusions and prospects are discussed.

2. Baseline recognition system

The writing equipment consists of a sensitive screen attached to a PC. The text to be recognized is represented by sequences of (x, y) coordinates of the pen movements. Our recognition system [5] consists of three sets of experts (figure 1):
- the first set performs the pre-processing: grapheme segmentation and geometrical context analysis;
- the second set activates a list of candidate words by combining the sub-lexical and geometrical hypotheses generated before;
- at the third level, candidate words are validated using a French lexicon (187,000 words), as in the "activation-verification" model [6] suggested by perceptual psychology.


Figure 1. Example of text & baseline system.

2.1 Pre-processing

Geometrical stroke properties are used to segment the text into graphemes. The lower and upper baselines are estimated to generate hypotheses on the grapheme silhouettes (medium letters, ascenders, punctuation, accentuation) and on the word segmentation.
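To make the baseline estimation concrete, here is a minimal sketch (our own illustration, not the paper's algorithm): the two baselines are approximated from the vertical extents of the graphemes, and each grapheme's silhouette is hypothesized from its position relative to them. All function names and the tolerance are hypothetical.

import numpy as np

def estimate_baselines(graphemes):
    # graphemes: list of (n_i, 2) arrays of (x, y) points, y growing
    # downwards. Crude estimate: the upper (resp. lower) baseline is
    # the median of the grapheme tops (resp. bottoms).
    tops = [g[:, 1].min() for g in graphemes]
    bottoms = [g[:, 1].max() for g in graphemes]
    return np.median(tops), np.median(bottoms)

def silhouette(grapheme, upper, lower, tol=0.25):
    # Hypothesize the silhouette of a grapheme from its vertical
    # extent relative to the two baselines.
    h = lower - upper                           # reference body height
    top, bottom = grapheme[:, 1].min(), grapheme[:, 1].max()
    if bottom - top < tol * h:
        return 'punctuation/accentuation'       # small isolated mark
    if top < upper - tol * h:
        return 'ascender'
    return 'medium letter'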

2.2 Classification

The recognition system used in the experiments is based on prototype matching. The prototype set was built using MDCA clustering [8] applied to the UNIPEN Train-R01/V07 database [4]. Characters are simply preprocessed: the sequence of (x, y) coordinates is resampled to 20 points per stroke (in order to compensate for different writing speeds), centered, and normalized preserving the aspect ratio. The nearest-neighbor rule is used as the classification criterion. The distance between the input character and each prototype is computed using elastic matching. Then, for each class, the smallest distance is retained, giving a distance vector D = (D1, D2, ..., DN) with N = 62 classes ('A'-'Z', 'a'-'z', '0'-'9').
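The classification step can be sketched as follows. This is our own minimal illustration, not the system's actual code: it assumes single-stroke characters, uses plain dynamic time warping as the elastic matching, and all function names are hypothetical.

import numpy as np

def resample(stroke, n=20):
    # Resample a (m, 2) array of (x, y) pen points to n points by arc
    # length, to compensate for different writing speeds.
    steps = np.diff(stroke, axis=0)
    d = np.concatenate(([0.0], np.cumsum(np.hypot(steps[:, 0], steps[:, 1]))))
    t = np.linspace(0.0, d[-1], n)
    return np.column_stack([np.interp(t, d, stroke[:, 0]),
                            np.interp(t, d, stroke[:, 1])])

def normalize(points):
    # Center the character and scale it, preserving the aspect ratio.
    p = points - points.mean(axis=0)
    scale = np.abs(p).max()
    return p / scale if scale > 0 else p

def elastic_distance(a, b):
    # Elastic matching via dynamic time warping of two point sequences.
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

def classify(char, prototypes):
    # prototypes: dict mapping class label -> list of prototype sequences.
    # Returns the distance vector D and the nearest-neighbor label.
    x = normalize(resample(char))
    D = {label: min(elastic_distance(x, p) for p in protos)
         for label, protos in prototypes.items()}
    return D, min(D, key=D.get)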

2.3 Lexical analysis

The lexical analyzer is an access model with two ways, lexical and topological. It exploits the word superiority effect (it is easier to recognize a letter within a word than alone, out of context [9]). The lexical way combines the topological information (blanks) with sub-lexical information (classification hypotheses) to generate a list of candidate words. The topological way combines letter silhouettes and blanks to generate a second list. The lists of candidates are then compared with the lexicon entries to determine the most relevant word, as illustrated by the sketch below.
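As a toy illustration of that final comparison step (our own sketch, not the actual access model of [5]), assume the candidate lists have already been generated as character strings; a plain edit distance can then select the most relevant lexicon entry:

def edit_distance(a, b):
    # Classic Levenshtein distance between two words.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def most_relevant(candidates, lexicon):
    # Return the lexicon entry closest to any candidate word.
    return min(lexicon,
               key=lambda w: min(edit_distance(c, w) for c in candidates))

In the full system, the silhouette and blank information would first prune the 187,000-entry lexicon, so an exhaustive scan like the one above would not be needed.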

2.4 Database and experimental results

For our experiments, we use a large on-line text database (figure 1), described in the following. 54 writers have each written several texts (1 to 5), up to 350 words per writer. All these texts have been labeled by a human expert. Results are presented on the whole database: up to 5,400 words and 26,000 letters. The system requires the tuning of 4 free parameters, so the database was divided into two sets, for tuning and for testing respectively; however, the tuned values proved to be identical on both sets. The mean error rate of the baseline system is 28% at the word level. It has a high recognition speed, 2 words per second (Pentium IV, 1.8 GHz, MatLab), and requires some 2.5 MB of memory (including the lexicon).

3. Writer adaptation strategies

The baseline recognition system is writer-independent. Its prototype database (the so-called WI database) should cover all the writing styles; experimental results show that it covers at least the most common ones. There remain, however, two situations that reduce the recognition rate:
- a grapheme is missing from the writer-independent database: it must be stored in the user database;
- a grapheme is confusing: for a given writer, some models of the writer-independent database must be inactivated in order to avoid new confusions.

Prototype-based systems can be adapted very easily and quickly to new writing styles, simply by storing new character samples in the writer-dependent (WD) database and inactivating existing prototypes. The specialization of the system on a given user (figure 2), by registering his personal features, makes it writer-dependent and increases its accuracy. Comparing the classification hypotheses with either the labeled data (supervised adaptation) or the lexical hypotheses (self-supervised adaptation) detects classification errors, as sketched below.
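Since adaptation only stores new samples and inactivates existing prototypes, the WD specialization can be sketched with a very small data structure. This is a hypothetical illustration; the class and method names are ours.

from dataclasses import dataclass, field

@dataclass
class Prototype:
    label: str             # character class ('A'-'Z', 'a'-'z', '0'-'9')
    points: list           # resampled, normalized (x, y) sequence
    writer_dependent: bool = False
    active: bool = True
    Q: float = 1000.0      # adequacy, used by the inactivation methods

@dataclass
class PrototypeDB:
    prototypes: list = field(default_factory=list)

    def add_prototype(self, label, sample):
        # Activation: store a new writer-dependent prototype.
        self.prototypes.append(Prototype(label, sample, writer_dependent=True))

    def inactivate(self, proto):
        # Inactivation: keep the prototype but exclude it from matching.
        proto.active = False

    def candidates(self):
        return [p for p in self.prototypes if p.active]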


Figure 2. Writer adaptation: the classification hypotheses ('te neconnaissarce') are compared with the labels ('de reconnaissance') or with the lexical hypotheses ('de meconnaissance'), using the WI and WD databases.

3.1 Supervised adaptation

In order to evaluate the influence of adaptation on the system accuracy, we first compared the classification hypotheses with labeled data. Text patterns are classified, and each classification hypothesis is compared with its label. If they do not correspond, the pattern is added to the user database (figure 2). Two strategies have been applied: the "text" strategy, where patterns are stored at the end of the text, and the "line" strategy, where patterns are stored at the end of each line. As we can see (table 1), the "line" strategy is more effective because the missing prototypes are stored sooner. Nevertheless, activating new prototypes is not sufficient to reach a "perfect" classification, even with a great amount of labeled data: it is also necessary to inactivate, or even delete, some WI prototypes.

Table 1. Supervised adaptation: Word Error Rate (WER) and Writer Dependent Database Size (WDDS) after 150 words.

                            WER      WDDS
Baseline system             25%      100%
"text" strategy: min        0%       +3%
                 mean       1.3%     +5%
                 max        10%      +9%
"line" strategy: min        0%       +2%
                 mean       1.1%     +3%
                 max        6.2%     +8%

3.2 Self-supervised adaptation

In self-supervised adaptation methods, the writer does not need to pay any attention to the adaptation process. Sub-lexical hypotheses (classification results) are compared with lexical results to find the prototypes to add.

3.2.1. Systematic activation. In the systematic activation (SA) strategy, we consider that the error rate of the lexical analyzer is 0%. So, each time an error (a difference between the classification and lexical hypotheses) is detected, the corresponding strokes are stored as new prototypes in the writer's personal database (figure 2), as sketched below. When the lexical expert is wrong, however, prototypes are stored in the wrong class and lead to misclassifications.
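A minimal sketch of the SA loop, reusing the hypothetical PrototypeDB above; classify and lexical_expert stand for the classification and lexical experts, and word objects are assumed to expose their characters:

def systematic_activation(text_words, classify, lexical_expert, db):
    # SA strategy: trust the lexical analyzer blindly. Whenever a
    # classification hypothesis differs from the lexical hypothesis,
    # store the strokes as a new prototype of the lexical letter.
    for word in text_words:
        cls_hyp = [classify(ch) for ch in word.characters]
        lex_hyp = lexical_expert(cls_hyp, word)   # one letter per character
        for ch, c, l in zip(word.characters, cls_hyp, lex_hyp):
            if c != l:
                # May be wrong if the lexical expert itself failed.
                db.add_prototype(label=l, sample=ch)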

3.2.2 Conditional activation. The SA strategy is not accurate, so it seems necessary to study the behavior of the lexical expert. We can make two observations:
- the accuracy of the lexical expert increases with the number of letters in the word;
- the accuracy of the lexical expert decreases when the number of classification errors increases.

So, for a given word, we can use a criterion of lexical reliability to estimate the probability of failure of the lexical analyzer. This criterion is based on two parameters: the number of letters in the word (NbTot) and the number of classification errors (NbErr, defined as the total number of differences between the classification hypothesis and the lexical one). The conditional activation (CA) strategy is the following: if relation (1) is verified, the NbErr corresponding letters are stored in the writer database:

NbErr <= E(NbTot / K),   K ∈ N+   (1)

where E(.) denotes the integer part.
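Relation (1) reduces to a one-line test; a sketch, with E(.) implemented as integer division:

def store_errors(nb_err, nb_tot, K=4):
    # Relation (1): NbErr <= E(NbTot / K). The NbErr mismatched letters
    # are stored only when the word is long enough for the lexical
    # analyzer to be considered reliable.
    return nb_err <= nb_tot // K

Note that with K = 4, a word of fewer than 4 letters gives E(NbTot/K) = 0, so no mismatched letter is ever stored for short words; this is the limit discussed in the conclusion.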

The parameter K is optimized through an exhaustive search. The obtained value, K = 4, shows that the longer the word, the higher the reliability of the lexical analyzer. However, the system after adaptation performs better than the baseline system; complete results are given in table 2.

Table 2. Self-supervised adaptation: Word Error Rate (WER) after 50, 100 and 150 words, and Writer Dependent Database Size (WDDS).

                          WER (50)   WER (100)   WER (150)   WDDS
Baseline system           25%        25%         25%         100%
"SA" strategy: min        0%         2%          2%          +2%
               mean       25%        23%         23%         +6%
               max        53%        73%         51%         +14%
"CA" strategy: min        0%         0%          0%          +1%
               mean       22%        20%         17%         +2%
               max        71%        51%         42%         +3%

3.2.3. Inactivation methods. These methods have two goals. As seen previously, using the lexical hypothesis as a reference may add confusing or erroneous prototypes; inactivation methods are used to recover from those prototypes that contribute more often to incorrect than to correct classifications. They are also used to prune the prototype set and speed up the classification [10]. Each prototype (of the WI database as well as of the WD database) has an initial adequacy (Q0 = 1000). This adequacy is modified during the recognition of the text according to the usefulness of the prototype in the classification process, by comparing the classification hypotheses with the lexical hypotheses. Considering prototype i of class j, three parameters drive this adaptation strategy:
- C: rewards (+) prototype i when it is used for a correct classification;
- I: penalizes (-) prototype i when it is used for an incorrect classification;
- N: penalizes (-) all the useless prototypes of class j.


The adequacy is modified at each occurrence of the prototype (i.e., whenever the classifier uses it):

Q_j^i(t) = Q_j^i(t-1) + [ C(t) - I(t) - N(t) ] / F_j

where F_j is the frequency of letter j in the French language. The three parameters are mutually exclusive, i.e., at each occurrence a single parameter is activated. Of course, when Q_j^i reaches 0, the prototype is erased.
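A sketch of this update rule, assuming the hypothetical Prototype record introduced in section 3; the letter frequencies are illustrative, since the paper does not list the F_j values it uses:

# Illustrative French letter frequencies (the paper divides the update
# by F_j but does not list the values used).
FREQUENCY = {'e': 0.15, 'a': 0.08, 's': 0.08, 'n': 0.07}

def update_adequacy(proto, event, C=20, I=100, N=5):
    # The three parameters are mutually exclusive: exactly one event
    # ('correct', 'incorrect' or 'useless') applies at each occurrence.
    delta = {'correct': +C,     # used for a correct classification
             'incorrect': -I,   # used for an incorrect classification
             'useless': -N}[event]
    proto.Q += delta / FREQUENCY.get(proto.label, 0.05)  # 0.05: fallback
    if proto.Q <= 0:
        proto.active = False    # the prototype is erased

Dividing by F_j makes the adequacy of prototypes of frequent letters evolve more slowly, since they are matched far more often.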

Combining the activation and inactivation methods through a dynamic management of the prototypes improves the performance, provided the parameters (C, I, N) are finely tuned. After an exhaustive search, the best results are obtained with (C = 20, I = 100, N = 5). As we can see (table 3), the use of the inactivation methods does not change the mean error rate (17%), but the WD database size decreases by 40% after 150 words.

Table 3. Inactivation methods: Word Error Rate (WER) and Writer Dependent Database Size (WDDS).

                   WER    WDDS
Baseline system    28%    100%
WD system          17%    -40%

Now, we focus on the evolution of the prototype adequacy (figure 3). For some writers, the WI prototypes are sufficient: for the 'a' class, 2 prototypes are useful, so the adequacy of the 45 others decreases; for the 's' class, 4 prototypes are useful (the user's writing may be unstable) and the 36 others are inactivated. For other writers ('s' and 'e' classes), WD prototypes (shown in bold in the figure) are needed: at the beginning a WI prototype is used and, after some letter occurrences, a WD prototype is stored (the writer gets used to the tablet). After some 150 words, the database size has decreased by 40%.

Figure 3. Evolution of the prototype adequacy (adequacy vs. letter occurrences).

4. Conclusion & prospects

We present in this communication several self-supervised adaptation strategies. The baseline system is dedicated to on-line text recognition. The prototype-based classifier is initially trained to be writer-independent. Thanks to its structure, it can easily learn new writing styles by activating new prototypes and inactivating old ones. The adaptation process increases both the recognition rate (from 75% to 83%) and the classification speed (nearly twice as fast). In spite of these encouraging results, the method has a limit: it does not take short words (fewer than 4 letters) into account, although they are very frequent (45% of the database). Moreover, this adaptation process only modifies the classification expert; it would be worthwhile to also re-estimate the parameters of the segmentation expert.

5. References

[1] Brakensiek A., Kosmala A. & Rigoll G., Comparing adaptation techniques for on-line handwriting recognition, ICDAR'01, 2001.
[2] Connell S.D. & Jain A.K., Writer adaptation of online handwriting models, ICDAR'99, pp. 434-438, 1999.
[3] Guyon I., Henderson D., Albrecht P., Le Cun Y. & Denker J., Writer independent and writer adaptive neural network for on-line character recognition, From Pixels to Features III, 1992.
[4] Guyon I., Schomaker L., Plamondon R., Liberman M. & Janet S., UNIPEN project of on-line data exchange and recognizer benchmarks, ICPR'94, pp. 29-33, 1994.
[5] Oudot L., Prevost L. & Milgram M., Dynamic recognition in the omni-writer frame: application to hand-printed text recognition, ICDAR'01, 2001.
[6] Paap K., Newsome S.L., McDonald J.E. & Schvaneveldt R., An activation-verification model for letter and word recognition: the word superiority effect, Psychological Review, 89, pp. 573-594, 1982.
[7] Platt J.C. & Matic N.P., A constructive RBF network for writer adaptation, NIPS 9, pp. 765-771, 1997.
[8] Prevost L. & Milgram M., Modelizing character allographs in omni-scriptor frame: a new non-supervised algorithm, Pattern Recognition Letters, 21(4), pp. 295-302, 2000.
[9] Reicher G.M., Perceptual recognition as a function of meaningfulness of stimulus material, Journal of Experimental Psychology, 81, pp. 274-280, 1969.
[10] Vuori V., Laaksonen J. & Kangas J., Influence of erroneous learning samples on adaptation in on-line character recognition, Pattern Recognition, 35(4), pp. 915-926, 2002.

