Pattern Recognition Letters 23 (2002) 1427–1437 www.elsevier.com/locate/patrec
Learning structural shape descriptions from examples
L.P. Cordella *, P. Foggia, C. Sansone, M. Vento
Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Via Claudio 21, I-80125 Napoli, Italy
Abstract

A method for learning shapes structurally described by means of attributed relational graphs (ARG's) is discussed and tested. The method is based on an algorithm that, starting from a set of labeled shapes, finds the set of maximally general prototypes. These prototypes, given in terms of a suitably defined data structure which generalizes the ARG's, satisfy the properties of completeness and consistency with reference to the training set, and turn out to be particularly valuable for their interpretability. After summarizing the algorithm, the paper addresses the problem of shape representation by ARG's, and then presents the experimental results of a learning task, with reference to a database of artificial images generated by a set of attributed plex grammars. The main focus here is not on the learning algorithm, but on its applicability to the problem of learning shapes from examples. A discussion of the results, aimed at highlighting pros and cons, is finally reported. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Attributed relational graphs; Structural descriptions; Shape prototyping; Inductive learning
* Corresponding author. E-mail addresses: [email protected] (L.P. Cordella), [email protected] (P. Foggia), [email protected] (C. Sansone), [email protected] (M. Vento).

1. Introduction

Structural methods (Pavlidis, 1977; Frasconi et al., 1998) imply complex procedures both for recognition and learning. The learning problem, i.e. the task of building a set of prototypes adequately describing the members of each class, is complicated by the fact that the prototypes, implicitly or explicitly, should include a model of the possible differences among members of the same class. In fact, in real applications, the information is affected by distortions, and consequently the descriptions of single members of a class may turn out to be quite different from each other. The difficulty of defining effective learning algorithms is so high that the problem is still considered open, and only a few methods, usable under rather restrictive hypotheses, are available at present.

A first approach to the problem assumes that structured information can be encoded in terms of a vector, thus making possible the adoption of statistical/neural paradigms. In this way, it is possible to use the large variety of well-established and effective algorithms available both for learning and for classifying patterns. The main disadvantages deriving from the use of these techniques are the possible loss of complex structural information, due to the adopted data structure, and the
0167-8655/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-8655(02)00103-4
impossibility of accessing the knowledge acquired by the system. In fact, after learning, the knowledge is implicitly encoded (e.g. in the weights associated with the connections of a neural net) and its use, outside the classification stage, is strongly limited. Examples of this approach are illustrated in (Frasconi et al., 1998; Hinton, 1990; Touretzky, 1990).

Attributed relational graphs (ARG's), i.e. graphs enriched with a set of attributes associated with nodes and edges, are the most effective data structure for representing structural information. In the case of shapes, the attributes of nodes and edges respectively represent properties of the primitive shape components and of the relations among them. An approach, pioneered by Winston (1970), groups methods facing the learning problem directly in the representation space of the structured data (Lavrac and Dzerosky, 1994; Pearce et al., 1994; Dietterich and Michalski, 1983; Alquezar and Sanfeliu, 1997). So, if data are represented by graphs, the learning procedure generates graphs for representing the prototypes of the classes. Following this approach, a learning algorithm for symbols represented by graphs is defined in (Messmer and Bunke, 1996); the method is based on a clustering procedure using a graph-edit distance. Some methods ascribable to the same approach are based on the assumption that the prototypical descriptions are built by interacting with an expert of the domain (Rocha and Pavlidis, 1994; Nishida, 1996). The inadequacy of humans at formalizing the criteria that lead to a set of prototypes truly representative of a given class significantly increases the risk of errors, especially in domains containing either many data or many different classes.
More automatic methods are those facing the problem as a symbolic machine learning problem (Michalski, 1980), formulated as follows: ''given a suitably chosen set of input data, whose classes are known, and possibly some background domain knowledge, find a set of optimal prototypical descriptions for each class''. A formal statement of the problem and a more detailed discussion of related issues will be given in the next section. Dietterich and Michalski (1983) provide an extensive review of this field, populated by methods which mainly differ in the adopted formalism (Michalski, 1980; Quinlan, 1993), sometimes more general than that implied by graphs.

This approach is really effective, since the obtained prototype descriptions, besides being explicit, hold the property of being maximally general. This makes them very compact, i.e. containing only the minimum information needed to cover all the samples of a class while preserving the distinction between objects of different classes. Thanks to these properties, the user can easily acquire knowledge about the domain by looking at the prototypes generated by the system, which appear simple, understandable and effective; consequently, the user can validate or improve the prototypes, or understand what has gone wrong in case of classification errors.

In this paper we discuss a method for learning shapes structurally described by ARG's. The method is based on an algorithm which, starting from a set of labeled shapes, finds the set of maximally general prototypes. The focus of the present paper is not only on the learning algorithm (for details see also Foggia et al., 2001), but mainly on its effectiveness for learning shapes from examples. The learning algorithm faces the problem directly in the space of graphs and obtains general and consistent prototypes with a low computational cost with respect to other symbolic learning systems based on first-order descriptions. The obtained shape prototypes, given in terms of a suitably defined data structure which generalizes the ARG's, satisfy the properties of completeness and consistency with reference to the training set, and turn out to be particularly valuable for their interpretability.

After a discussion of the way structural descriptions and prototypes are represented in terms of graphs (Section 2), the learning algorithm is summarized in Section 3.
Section 4 reports the experimental results of a shape learning task, with reference to a database of artificial images generated by a set of attributed plex grammars, together with a discussion of the obtained results aimed at highlighting pros and cons.
2. Representational issues: structures and prototypes

Our approach to prototyping is inspired by basic machine learning methodologies and particularizes the inference operations to the case of descriptions given in terms of ARG's. We will first introduce a new kind of graph, devoted to representing the prototypes of a set of ARG's. Such a graph will be called a generalized attributed relational graph (GARG), as it contains generalized nodes, edges, and attributes. Then, we will formulate a learning algorithm that builds the prototypes by means of a set of operations directly defined on the graphs. The algorithm preserves the generality of the prototypes generated by classical machine learning algorithms. As with most machine learning systems (Dietterich and Michalski, 1983; Michalski, 1980), the prototypes obtained by our system are consistent, i.e. each sample is covered by only one prototype.

We assume that the shapes of the objects of interest are described in terms of ARG's. An ARG can be defined as a six-tuple (N, E, A_N, A_E, α_N, α_E), where N and E ⊆ N × N are respectively the sets of nodes and edges, A_N and A_E are the sets of node and edge attributes, while α_N and α_E are the functions which associate to each node or edge the corresponding attributes. We will suppose that the attributes of both nodes and edges are expressed in the form t(p_1, …, p_kt), where t is a type chosen from a finite alphabet T of possible types and (p_1, …, p_kt) is a tuple of parameters, also drawn from finite sets P_1^t, …, P_kt^t. Both the number of parameters (k_t, the arity associated with type t) and the sets they belong to depend on the type of the attribute; for some types, k_t may be equal to zero, meaning that the corresponding attribute has no parameters. It is worth noting that the introduction of the type makes it possible to differentiate the description of different kinds of nodes (or edges); in this way, each parameter associated with a node (or an edge) assumes a meaning depending on the type of the node itself. For example, we could use the nodes to represent different parts of an object, by associating a node type with each kind of part (see Fig. 1).
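As an illustration, the six-tuple definition might be rendered in code roughly as follows (a minimal sketch; the class and field names are our own, not from the paper):

```python
from dataclasses import dataclass, field

# A typed attribute t(p_1, ..., p_kt): a type name from the finite alphabet T
# plus a tuple of parameter values (empty when the arity k_t is zero).
@dataclass(frozen=True)
class Attribute:
    type: str           # t, drawn from the alphabet T
    params: tuple = ()  # (p_1, ..., p_kt), each from a finite set P_i^t

# An ARG (N, E, A_N, A_E, alpha_N, alpha_E): nodes, directed edges in N x N,
# and the functions assigning an attribute to every node and edge.
@dataclass
class ARG:
    nodes: set = field(default_factory=set)        # N
    edges: set = field(default_factory=set)        # E, a subset of N x N
    node_attr: dict = field(default_factory=dict)  # alpha_N: N -> attributes
    edge_attr: dict = field(default_factory=dict)  # alpha_E: E -> attributes

# Example in the style of Fig. 1: a medium-width triangle on top of a
# rectangle with small width and medium height.
g = ARG()
g.nodes = {1, 2}
g.node_attr = {1: Attribute("triangle", ("m",)),
               2: Attribute("rectangle", ("s", "m"))}
g.edges = {(1, 2)}
g.edge_attr = {(1, 2): Attribute("on_top")}
```

The type alphabet (triangle, rectangle, on_top) and the quantized size values (s, m) follow the example of Fig. 1.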
Fig. 1. (a) Objects made of three types of component parts: circles, triangles and rectangles. (b) The description scheme introducing three types of nodes, each associated to a part. Each type contains a set of parameters suitable to describe each part. Similarly edges of the graph, describing topological relations between parts, may be of two different types (on_top and left). (c) The ARG’s corresponding to the objects given in (a). (d) A GARG representing the two ARG’s shown in (c). The GARG represents ‘‘any object made of a part on the top of a rectangle of any width and height’’.
In order to allow a prototype to match a set of ARG's, we extend the definition of attribute. First of all, the set of attribute types of nodes and edges is augmented with the special type φ, which has no parameters and is allowed to match any attribute type. For the other attribute types, if the sample has a parameter whose value ranges over the set P_i^t, the corresponding parameter of the prototype belongs to the set P_i^{*t} = ℘(P_i^t), where ℘(P_i^t) is the power set of P_i^t, i.e. the set of all the subsets of P_i^t. Referring to the shapes shown in Fig. 1, a node of the prototype could have the attribute rectangle({s, m}, {m}), meaning a rectangle whose width is small or medium and whose height is medium.

We say that a GARG G* = (N*, E*, A*_N, A*_E, α*_N, α*_E) covers a sample G, and use the notation G* ⊨ G (the symbol ⊨ denotes the covering relation), iff there is a mapping μ: N* → N such that:

μ is a monomorphism;  (1)

the attributes of nodes and edges of G* are compatible with the corresponding ones of G.  (2)

The compatibility relation, denoted by the symbol ≈, is defined as follows:

∀t: φ ≈ t(p_1, …, p_kt), and
∀t: t(p*_1, …, p*_kt) ≈ t(p_1, …, p_kt) ⟺ p_1 ∈ p*_1 ∧ … ∧ p_kt ∈ p*_kt  (3)
Condition (1) requires that each primitive and each relation in the prototype must also be present in the sample. This allows the prototype to specify only the features that are strictly required for discriminating among the various classes, neglecting the irrelevant ones. Condition (2) constrains the monomorphism required by condition (1) to be consistent with the attributes of the prototype and of the sample, by means of the compatibility relation defined in (3). This relation simply states that the type of the prototype attribute must be either φ or equal to the type of the corresponding attribute of the sample; in the latter case all the parameters of the prototype attribute, which are actually sets of values, must contain the value of the corresponding parameter of the sample.
The covering relation between a prototype and a sample graph could be tested using any graph matching algorithm, suitably modified to take into account the attributes of the two graphs. In particular, we have used a modified version of the algorithm described in Cordella et al. (2001). Another important relation that will be introduced is specialization (denoted by the symbol ⊲): a prototype G*_1 is said to be a specialization of G*_2 iff

∀G: G*_1 ⊨ G ⟹ G*_2 ⊨ G  (4)

In other words, a prototype G*_1 is a specialization of G*_2 if every sample covered by G*_1 is also covered by G*_2. Hence, a more specialized prototype imposes stricter requirements on the samples to be covered. Fig. 1d shows a prototype covering the shapes of Fig. 1a.
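The compatibility test (3) and the covering relation (1)-(2) can be sketched as follows. This is a brute-force sketch under our own encoding (graphs as dicts of typed attributes; the paper instead uses a modified version of the matching algorithm of Cordella et al., 2001, which is far more efficient):

```python
from itertools import permutations

PHI = "phi"  # stands in for the special type that matches any attribute

# Graphs are dicts: node id -> (type, params) and (n1, n2) -> (type, params).
# In a sample, params is a tuple of values; in a prototype, a tuple of sets.

def compatible(proto_attr, sample_attr):
    """Relation (3): phi matches anything; otherwise the types must coincide
    and every sample parameter must belong to the prototype's value set."""
    p_type, p_params = proto_attr
    s_type, s_params = sample_attr
    if p_type == PHI:
        return True
    return p_type == s_type and all(v in vs for vs, v in zip(p_params, s_params))

def covers(proto_nodes, proto_edges, g_nodes, g_edges):
    """G* |= G: brute-force search for a monomorphism mu from prototype
    nodes to sample nodes respecting edges and attribute compatibility."""
    p_ids, g_ids = sorted(proto_nodes), sorted(g_nodes)
    for image in permutations(g_ids, len(p_ids)):
        mu = dict(zip(p_ids, image))
        if not all(compatible(proto_nodes[n], g_nodes[mu[n]]) for n in p_ids):
            continue
        if all((mu[a], mu[b]) in g_edges and
               compatible(attr, g_edges[(mu[a], mu[b])])
               for (a, b), attr in proto_edges.items()):
            return True
    return False

# A GARG in the spirit of Fig. 1d: any part on top of a rectangle of any size.
proto_n = {1: (PHI, ()), 2: ("rectangle", ({"s", "m", "l"}, {"s", "m", "l"}))}
proto_e = {(1, 2): ("on_top", ())}
# A sample: a medium triangle on top of a small, medium-height rectangle.
g_n = {1: ("triangle", ("m",)), 2: ("rectangle", ("s", "m"))}
g_e = {(1, 2): ("on_top", ())}
print(covers(proto_n, proto_e, g_n, g_e))  # -> True
```

Because condition (1) only demands a monomorphism, extra nodes or edges in the sample are simply ignored, exactly as the text describes.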
3. The learning algorithm

The goal of the learning algorithm can be stated as follows: there is a (possibly infinite) set 𝒮 of all the shapes that may occur, partitioned into C different classes 𝒮_1, …, 𝒮_C, with 𝒮_i ∩ 𝒮_j = ∅ for i ≠ j. The algorithm is given a finite subset S ⊆ 𝒮 (training set) of labeled patterns (S = S_1 ∪ … ∪ S_C, with S_i = S ∩ 𝒮_i), from which it tries to find a sequence of prototype graphs G*_1, G*_2, …, G*_p, each labeled with a class identifier, such that the following properties hold:

∀G ∈ 𝒮 ∃i: G*_i ⊨ G  (completeness)  (5)

∀G ∈ 𝒮: G*_i ⊨ G ⟹ class(G) = class(G*_i)  (consistency)  (6)

where class(G) and class(G*_i) refer to the class of the sample G and to the class represented by the prototype G*_i. Of course, this is an ideal goal because of the finiteness of S. In practice, the algorithm can only verify that completeness and consistency hold for the samples in S. On the other hand, Eq. (5) dictates that, in order to get as close as possible to the ideal case, the generated prototypes should be able to model also samples not found in S, that is, they must be more general than the enumeration of the samples in the training set. However, they should not be too general, otherwise Eq. (6) would not be
Fig. 2. A sketch of the learning procedure.
satisfied. The achievement of the optimal tradeoff between completeness and consistency makes prototyping a really hard problem.

A sketch of the learning algorithm is presented in Fig. 2. The algorithm starts with an empty list L of prototypes, and tries to cover the training set by successively adding consistent prototypes. When a new prototype is found, the samples covered by it are eliminated, and the process continues on the remaining samples of the training set. An effect of this elimination is that each generated prototype contains only the features needed to discriminate among samples not yet filtered out by the previous prototypes. Hence, at classification time a sample has to be matched against the prototypes in the same order in which they have been generated, stopping as soon as a matching prototype is found. This strategy has the advantage that the algorithm can deal automatically with classes whose samples are subgraphs of the samples belonging to other classes: the prototypes of the subgraph classes come into play only after the prototypes of the larger graphs have already been ruled out. The algorithm fails if no consistent prototype covering the remaining samples can be found.

It is worth noting that the test of consistency in the algorithm actually checks whether the prototype is almost consistent, i.e.:

Consistent(G*) ⟺ max_i |S_i(G*)| / |S(G*)| ≥ θ  (7)

where S(G*) denotes the set of all the samples of the training set covered by the prototype G*, S_i(G*) the samples of class i covered by G*, and θ is a suitably chosen threshold close to 1. Relation (7) implies that almost all the samples covered by G* belong to the same class. Also note that the association of a prototype with a class is performed after building the prototype. According to (7), the algorithm considers a prototype consistent if more than a fraction θ of the covered training samples belong to a same class, avoiding a further specialization of this prototype that could be detrimental to its generality.

The most important part of the algorithm is the FindPrototype procedure, illustrated in Fig. 3. The construction of a prototype starts from a trivial prototype with one node whose attribute is φ (i.e. a prototype that covers any non-empty graph). The prototype is then refined by successive specializations until either it becomes consistent or it covers no samples at all. An important step of the FindPrototype procedure is the construction of a set
Fig. 3. The function FindPrototype.
Q of specializations of the tentative prototype G*. The adopted definition of the heuristic function H, guiding the search for the current optimal prototype, will be examined later. At each step, the algorithm tries to refine the current prototype definition, in order to make it more consistent, by replacing the tentative prototype with one of its specializations. To accomplish this task we have defined a set of specialization operators which, given a prototype graph G*, produce a new prototype G*' such that G*' ⊲ G*. The considered specialization operators are:

1. Node addition: G* is augmented with a new node n' whose attribute is φ.
2. Edge addition: a new edge (n'_1, n'_2) is added to the edges of G*, where n'_1 and n'_2 are nodes of G* and G* does not already contain an edge between them. The edge attribute is φ.
3. Attribute specialization: the attribute of a node or an edge is specialized according to the following rule:
   • If the attribute is φ, then a type t is chosen and the attribute is replaced with t(P_1^t, …, P_kt^t). This means that only the type is fixed, while the type parameters can match any value of the corresponding type.
   • Else, the attribute has the form t(p*_1, …, p*_kt), where each p*_i is a (not necessarily proper) subset of P_i^t. One of the p*_i such that |p*_i| > 1 is replaced by p*_i − {p̄_i}, with p̄_i ∈ p*_i. In other words, one of the possible values of a parameter is excluded from the prototype.

The heuristic function H is introduced for evaluating how promising the provisional prototype is. It is based on an estimate of the consistency and completeness of the prototype (see Eqs. (5) and (6)). To evaluate the consistency degree of a provisional prototype G*, an entropy-based measure is used:

H_cons(S, G*) = −Σ_i (|S_i|/|S|) log_2(|S_i|/|S|) + Σ_i (|S_i(G*)|/|S(G*)|) log_2(|S_i(G*)|/|S(G*)|)  (8)

i.e. the difference between the entropy of the class distribution over the whole training set and over the covered samples.
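Returning for a moment to the specialization step, the three operators listed above might be sketched as follows. This is a sketch under our own encoding of prototypes (dicts of typed attributes); the type alphabet is hypothetical, in the style of Fig. 1, and for brevity only node attributes are specialized, edge attributes being handled identically:

```python
PHI = "phi"  # the unconstrained attribute type

# Hypothetical node-type alphabet: each type maps to its full parameter
# value sets P_1^t, ..., P_kt^t.
TYPES = {"circle": ({"s", "m", "l"},),
         "triangle": ({"s", "m", "l"},),
         "rectangle": ({"s", "m", "l"}, {"s", "m", "l"})}

def specializations(nodes, edges):
    """Enumerate the prototypes G*' with G*' specializing G* = (nodes, edges).
    nodes: id -> (type, params); edges: (id, id) -> (type, params);
    params is a tuple of sets of still-admissible values."""
    out = []
    # 1. Node addition: a new node with the unconstrained attribute phi.
    out.append(({**nodes, max(nodes, default=0) + 1: (PHI, ())}, dict(edges)))
    # 2. Edge addition: a phi edge between nodes not already connected.
    for a in nodes:
        for b in nodes:
            if a != b and (a, b) not in edges:
                out.append((dict(nodes), {**edges, (a, b): (PHI, ())}))
    # 3. Attribute specialization (shown for node attributes only).
    for n, (t, params) in nodes.items():
        if t == PHI:
            # fix a type; the parameters start as the full value sets
            for t2, full in TYPES.items():
                out.append(({**nodes, n: (t2, tuple(set(s) for s in full))},
                            dict(edges)))
        else:
            # exclude one value from a parameter set with more than one value
            for i, p in enumerate(params):
                for v in (p if len(p) > 1 else ()):
                    new = tuple(s - {v} if j == i else s
                                for j, s in enumerate(params))
                    out.append(({**nodes, n: (t, new)}, dict(edges)))
    return out

# From the trivial prototype (one phi node), the operators yield one node
# addition plus one type fixing per type in the alphabet:
trivial = ({1: (PHI, ())}, {})
print(len(specializations(*trivial)))  # -> 4
```

Each operator only narrows the set of covered samples, so every element of Q is indeed a specialization in the sense of relation (4).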
The heuristic H is defined so that the larger the value of H_cons(S, G*), the more consistent G* is; hence the use of H_cons drives the algorithm towards consistent prototypes. The completeness of a provisional prototype is taken into account by a second term of the heuristic function, which simply counts the number of samples covered by the prototype:

H_compl(S, G*) = |S(G*)|  (9)
This term is introduced in order to favor general prototypes over prototypes which, albeit consistent, cover only a small number of samples. The heuristic function used in our algorithm is the one described in (Quinlan, 1993):

H(S, G*) = H_compl(S, G*) · H_cons(S, G*)  (10)
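Reading Eq. (8) as the drop in class entropy between the whole training set and the covered samples (an information-gain-style measure), the heuristic of Eqs. (8)-(10) and the almost-consistency test of relation (7) can be sketched as follows (class distributions are given as lists of counts; all names are our own):

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def h_cons(train_counts, covered_counts):
    """Eq. (8): how much purer the covered samples are than the training set."""
    return entropy(train_counts) - entropy(covered_counts)

def h_compl(covered_counts):
    """Eq. (9): the number of samples covered by the prototype."""
    return sum(covered_counts)

def h(train_counts, covered_counts):
    """Eq. (10): completeness times consistency."""
    return h_compl(covered_counts) * h_cons(train_counts, covered_counts)

def consistent(covered_counts, theta=0.95):
    """Relation (7): at least a fraction theta of the covered samples
    belong to a single class (theta is a threshold close to 1)."""
    return max(covered_counts) / sum(covered_counts) >= theta

# Three balanced classes of 200 samples: covering 50 samples of one class
# scores higher than covering 30 + 30 samples spread over two classes.
print(h([200, 200, 200], [50, 0, 0]))   # ≈ 79.25
print(h([200, 200, 200], [30, 30, 0]))  # ≈ 35.10
print(consistent([199, 1, 0]))          # -> True
```

Multiplying the two terms, as in (10), rewards prototypes that are simultaneously pure and widely covering, which is exactly the tradeoff between Eqs. (5) and (6) discussed above.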
4. Experiments and discussion

The method has been tested on a database of artificial images generated by a set of attributed plex grammars (Feder, 1971). The technique used for generating the database has also been employed in (de Mauro et al., 2001), in the context of a similarity learning problem tackled by means of a recursive neural network. The images are obtained by combining a set of predefined blocks (the terminal symbols of the grammar) which can be connected together at given attachment points. The productions of the grammar define how the blocks are joined together. The result of the composition is a new object (a non-terminal symbol), for which the corresponding production also defines the attachment points; this object, in turn, can be connected to other terminal or non-terminal objects by other productions. Further, attributed plex grammars associate quantitative information, represented by a vector of attribute values (e.g. length, color, texture parameters, shape parameters, orientation), with each terminal symbol. A more detailed description of the image synthesis methodology can be found in (Bunke et al., 2001), while the image generation tool, together with its documentation, is available for download
from the URL http://www.artificialneural.net. In the experiments, we used grammars to generate three different classes of images: houses, ships and policemen. By randomly choosing the productions of the grammar, we generated a database composed of 1800 different images (600 per class). The whole database was split into a training set made up of 600 images (200 per class) and a test set made up of the remaining 1200 images (400 per class). Fig. 4 shows some images of the database. Fig. 5 illustrates the adopted description scheme in terms of ARG's. For each image, a simple segmentation algorithm is used to find the regions
with the same color. Each region is represented in the resulting graph by a node. Node attributes encode the diameter of the region, normalized with respect to the image size, and its ellipticity, measured as the ratio between the minor and the major axes of inertia (the value is 1 for circular regions and tends to 0 for straight lines). Both attributes are quantized into five levels. The edges of the graph represent the adjacency of two regions. Two edge attributes encode the relative x and y positions of the centers of the two regions; these attributes are also quantized into five levels. The resulting graphs have a number of nodes ranging from 3 to 13 (8 nodes on average), and the maximum out-degree is six.
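The five-level quantization of these attributes might look like equal-width binning. This is a sketch under our own assumption about the bin boundaries, which the text does not specify:

```python
def quantize(value, lo, hi, levels=5):
    """Map a value in [lo, hi] to one of `levels` equal-width bins (0-based),
    in the spirit of the five-level quantization used for the diameter,
    ellipticity and relative-position attributes."""
    if value >= hi:
        return levels - 1
    return int((value - lo) / (hi - lo) * levels)

# Normalized diameter and ellipticity both lie in [0, 1]:
print(quantize(0.0, 0.0, 1.0))   # -> 0
print(quantize(0.37, 0.0, 1.0))  # -> 1
print(quantize(1.0, 0.0, 1.0))   # -> 4
```

Quantizing into a small finite alphabet is what makes the parameter sets P_i^t finite, as required by the attribute model of Section 2.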
Fig. 4. Some samples of the training set (a) and of the test set (b).
Fig. 5. The alphabet of the types defined for the considered images.
Fig. 6. (a) The image of a policeman and (b) its representation as an ARG.
Fig. 6 shows the representation obtained for a sample of the class policeman. Tests have been made with different compositions of the training set; in particular, five subsets have been randomly extracted, with a number of samples per class ranging from 10 to 200. In every test the generated prototypes were consistent, giving rise to a 100% recognition rate on the training set. With reference to the above-described tests, the learning time for the biggest training set (200 samples per class) was about 15 s on a 500 MHz Pentium III PC equipped with 128 MB of RAM. Even though the training set is not very large (600 samples), this time should be considered quite short compared to the times required by full first-order learning algorithms.
Table 1 shows the number of generated prototypes and the recognition rate on the test set for the different training sets. It can be noted that, even with a training set made of 20 samples, the performance of the system is quite satisfactory; by

Table 1
Recognition rate on the test set and number of generated prototypes as a function of the training set size

No. of training samples per class   Recognition rate on test set (%)   No. of generated prototypes
10                                  85.83                              6
20                                  93.50                              4
50                                  99.50                              5
100                                 99.50                              5
200                                 99.75                              6
Fig. 7. The GARG’s obtained as prototypes of the class of houses (H1 and H2), ships (S1 and S2) and policemen (P1 and P2). For each prototype, the number of covered samples in the training set is also shown. The sequence number indicates the order in which the prototypes have been generated, which is also the order used during the classification.
increasing the number of samples to 50 per class, the recognition rate already reaches about 99%. Further enlargements of the training set (i.e. passing from 100 to 200 samples per class) produce negligible improvements in the recognition rate. Moreover, in all the tests, the number of the obtained prototypes changes only slightly.
Fig. 7 shows the prototypes generated using 200 samples per class. With respect to the case of 50 or 100 training set samples there are two differences: (a) an additional prototype (H2) is generated and (b) the simplest prototype, previously assigned to the house class, is now attributed (as S2) to the ship class. As the ordering of the prototypes affects
the classification results (see Section 3), these changes make the system able to correctly classify three more ship samples of the test set. The figure also presents an informal description of the obtained prototypes. It is worth noting that the prototypes are very easily interpretable in terms of regions in the corresponding images, and of adjacency relations between these regions. It is a relatively simple and direct task to understand which parts of the image are considered distinctive of each class. This is one of the main advantages deriving from the adoption of a first-order symbolic learning method. For comparison, consider the result that would have been obtained with a neural or statistical learning system: a possibly complex classification function depending on a possibly high number of numerical parameters, none of which could have been easily related to the contents of the image. We will discuss later how these advantages can significantly help to understand the causes of recognition errors. Let us first illustrate the classification results on the test set.
Table 2
Classification matrix (%) on the test set, obtained with the training set made up of 200 samples per class

True class   House   Ship    Policeman
House        100     0       0
Ship         0       99.25   0.75
Policeman    0       0       100
Table 2 reports the classification matrix on the test set obtained with a training set made up of 200 samples per class. As is evident, three samples of ships are confused with a policeman. Fig. 8 shows the images and the ARG's of these misclassified samples. It can be noted that these ARG's do not match the prototype S1 of the ship class, as they refer to ships without portholes and having only two masts; this structural property leads to graphs having no node with three outgoing edges. Moreover, since all three considered ARG's have at least five nodes, they match the prototype P2 of the policeman class. In other words, these errors are due to representativeness problems of the training set,
Fig. 8. Errors made on the test set: (a) the three ships confused with a policeman, (b) their ARG’s and (c) the prototype they are attributed to.
which does not contain any sample of ships without portholes and with only two masts with sails. In fact, the only ship sample of the training set without portholes and with two masts also has no sails, giving rise to an ARG made up of only three nodes. This ARG does not match the P2 prototype, and thus the consistency of P2 is preserved. Therefore, the system has no need to generate an additional prototype, and uses the simplest prototype (S2; see the last row of Fig. 7) to represent such a ship sample.

We have been able to reach this conclusion, finding the weak points in the representativeness of the chosen training set, with relatively little investigation effort. This is possible thanks to the high understandability of the prototypes and to the ease of understanding how the misclassified samples are matched against the prototypes. The latter is a direct consequence of the adoption of a structural recognition algorithm (based, in this case, on graph matching), which in turn has been made effective by the availability of a structural learning method.

5. Conclusions

In this paper we have discussed the applicability of a recently proposed structural learning method to the problem of prototyping shape descriptions, given in terms of ARG's. The experimental analysis highlighted the advantages of a symbolic approach to the problem of shape learning: the understandability of the obtained prototypes allows the user to interpret the result of the learning phase and to easily find a way to overcome errors due to representativeness faults of the training set. The use of a training set made of artificially generated images allowed us to point out how general and easy to understand the prototypes are.

References

Pavlidis, T., 1977. Structural Pattern Recognition. Springer, New York.
Frasconi, P., Gori, M., Sperduti, A., 1998. A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks 9 (5), 768–785.
Hinton, G.E., 1990. Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence 46, 47–75.
Touretzky, D.S., 1990. Dynamic symbol structures in a connectionist network. Artificial Intelligence 42 (3), 5–46.
Winston, P.H., 1970. Learning Structural Descriptions from Examples. Tech. Report MAC-TR-76, Department of Electrical Engineering and Computer Science, MIT.
Lavrac, N., Dzerosky, S., 1994. Inductive Logic Programming: Techniques and Applications. Ellis Horwood.
Pearce, A., Caelly, T., Bischof, W.F., 1994. Rulegraphs for graph matching in pattern recognition. Pattern Recognition 27 (9), 1231–1247.
Dietterich, T.G., Michalski, R.S., 1983. A comparative review of selected methods for learning from examples. In: Michalski, R.S. et al. (Eds.), Machine Learning: An Artificial Intelligence Approach, Vol. 1. Morgan Kaufmann, pp. 41–82.
Alquezar, R., Sanfeliu, A., 1997. Recognition and learning of a class of context-sensitive languages described by augmented regular expressions. Pattern Recognition 30 (1), 163–182.
Messmer, B., Bunke, H., 1996. Automatic learning and recognition of graphical symbols in engineering drawings. In: Kasturi, R., Tombre, K. (Eds.), Graphics Recognition, Lecture Notes in Computer Science, Vol. 1072. Springer, Berlin, pp. 123–134.
Rocha, J., Pavlidis, T., 1994. A shape analysis model with applications to a character recognition system. IEEE Transactions on PAMI 16 (4), 393–404.
Nishida, H., 1996. Shape recognition by integrating structural descriptions and geometrical/statistical transforms. Computer Vision and Image Understanding 64, 248–262.
Michalski, R.S., 1980. Pattern recognition as rule-guided inductive inference. IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (4), 349–361.
Quinlan, J.R., 1993. Learning logical definitions from relations. Machine Learning 5 (3), 239–266.
Foggia, P., Genna, R., Vento, M., 2001. Symbolic vs. connectionist learning: an experimental comparison in a structured domain. IEEE Transactions on Knowledge and Data Engineering 13 (2), 176–195.
Cordella, L.P., Foggia, P., Sansone, C., Vento, M., 2001. An improved algorithm for matching large graphs. In: Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, Ischia, Italy, pp. 149–159.
Feder, J., 1971. Plex languages. Information Sciences 3, 225–241.
de Mauro, C., Diligenti, M., Gori, M., Maggini, M., 2001. Similarity learning for graph-based image representations. In: Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, Ischia, Italy, pp. 250–259.
Bunke, H., Irniger, C., Gori, M., Hagenbuchner, M., Tsoi, A.C., 2001. Generation of image databases using attributed plex grammars. In: Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, Ischia, Italy, pp. 200–209.