
Guest Editors' Introduction: Special Section on Connectionist Models for Learning in Structured Domains

Paolo Frasconi, Member, IEEE, Marco Gori, Fellow, IEEE, and Alessandro Sperduti

1 INTRODUCTION

Connectionist models have significantly contributed to the development of artificial intelligence, particularly in the area of machine learning. However, before a problem can be solved in a connectionist framework, it is necessary to find a representation of the data that is compatible with existing architectures (or, vice versa, to devise a novel architecture that is suitable for the data representation at hand). The large majority of methodological studies, theoretical results, and applications are limited to vector-based representations and (to a lesser extent) to sequential representations. However, recursive or nested representations, as opposed to "flat" attribute-value representations, are needed in many situations. Interesting application domains include chemistry and biochemistry (where complex molecules can be represented as labeled graphs), natural language processing, pattern recognition (where objects are represented by hierarchical data structures), and learning about the World Wide Web. The interest in developing architectures capable of dealing with these rich representations began more than a decade ago, partially stimulated by Fodor and Pylyshyn's criticism of the lack of compositionality and systematicity of connectionist systems [1]. Different approaches have been proposed, some of which are collected in a special issue of the journal Artificial Intelligence [2]. In particular, the RAAM model proposed by Pollack [3] is based on backpropagation to discover compact recursive distributed representations of trees with a fixed branching factor. Recursive distributed representations are an instance of the concept of a reduced description introduced by Hinton [4] to solve the problem of mapping part-whole hierarchies into connectionist networks. A formal characterization of representations of structures in connectionist systems using the tensor product was developed by Smolensky [5]. Later on, Plate introduced holographic reduced representations [6], removing some of the limitations of the tensor product.
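To make the binding mechanism concrete, here is a minimal numpy sketch of tensor product variable binding in the style of [5]. The role and filler vectors, the dimensionality, and all names are illustrative assumptions, not material from the cited papers.

```python
import numpy as np

# Minimal sketch of Smolensky-style tensor product binding [5].
# All names, dimensions, and data here are illustrative.
rng = np.random.default_rng(0)

def random_unit(dim):
    """Return a random vector normalized to unit length."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

dim = 8
roles = {"left": random_unit(dim), "right": random_unit(dim)}
fillers = {"A": random_unit(dim), "B": random_unit(dim)}

# Bind each filler to its role with an outer product and superpose
# the bindings into a single representation of the structure.
structure = (np.outer(roles["left"], fillers["A"])
             + np.outer(roles["right"], fillers["B"]))

# Unbind: contracting the structure with a role vector recovers the
# filler bound to that role, approximately, since random roles are
# only nearly orthogonal.
recovered = roles["left"] @ structure
print(fillers["A"] @ recovered)  # close to 1: "A" was bound to "left"
print(fillers["B"] @ recovered)  # close to 0
```

Note that the bound representation grows quadratically with the component dimension; Plate's holographic reduced representations [6] avoid this by replacing the outer product with circular convolution, which keeps the representation the same size as its components.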

P. Frasconi is with the Department of Systems and Computer Science, University of Florence, Via di Santa Marta 3, I-50139 Firenze, Italy. E-mail: [email protected].
M. Gori is with the Department of Information Engineering, University of Siena, Via Roma 56, I-53100 Siena, Italy. E-mail: [email protected].
A. Sperduti is with the Department of Computer Science, University of Pisa, Corso Italia 40, I-56125 Pisa, Italy. E-mail: [email protected].

For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 112675.

Several learning tasks can be formulated for structured or relational data. The simplest setting is concept learning, where objects in the instance space are naturally represented as labeled graphs. In this case, the model should compute a binary function whose domain is a set of labeled graphs, where labels can be either symbols or real-valued vectors. More generally, we can conceive of a supervised learning problem in which both the input and the output portions of the data are structured. Ordinary classification and regression problems (with vector-based representations) can be seen as very special cases of structured classification and regression in which each graph is reduced to a single labeled node with no edges. The supervised learning problem (classification and regression) on directed acyclic graphs has been approached using recursive neural networks (also known as folding networks) [7], [8], [9], [10], which share with RAAMs the idea of adaptive encoding of data structures. These architectures can be seen as a generalization of recurrent neural networks for sequences; training relies on the maximum likelihood principle, with gradients of the likelihood computed by the backpropagation through structure algorithm (a toy sketch of such an encoder is given at the end of this introduction).

In another interesting setting, the learning domain comprises multiple entities together with a set of relations defined among these entities. Databases and knowledge bases naturally give rise to domains of this type. Relational learning (as opposed to attribute-value learning), however, has received attention mainly outside the connectionist community. Inductive logic programming [11] is perhaps the best known approach. Other forms of relational learning have been studied in frameworks more closely related to connectionism. For example, Friedman et al. [12] have proposed an algorithm for learning Bayesian networks from relational databases.
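The sketch below is a minimal forward pass of a recursive neural network over binary trees, in the spirit of [7], [8], [9], [10]; it is a hypothetical illustration, not the authors' code, and the names, dimensions, and toy tree are assumptions.

```python
import numpy as np

# Minimal forward-pass sketch of a recursive neural network over
# binary trees, in the spirit of [7], [8], [9], [10]. Names,
# dimensions, and the toy tree are illustrative assumptions.
rng = np.random.default_rng(0)

LABEL_DIM, STATE_DIM = 4, 8

# Stationary parameters: the same weights are reused at every node
# of every input tree.
W_label = 0.1 * rng.standard_normal((STATE_DIM, LABEL_DIM))
W_left = 0.1 * rng.standard_normal((STATE_DIM, STATE_DIM))
W_right = 0.1 * rng.standard_normal((STATE_DIM, STATE_DIM))
b = np.zeros(STATE_DIM)
w_out = 0.1 * rng.standard_normal(STATE_DIM)

def encode(node):
    """Frontier-to-root encoding: a node's state is computed from its
    label and the states of its children; absent children contribute
    the zero state (the usual base case)."""
    label, left, right = node
    h_left = encode(left) if left is not None else np.zeros(STATE_DIM)
    h_right = encode(right) if right is not None else np.zeros(STATE_DIM)
    return np.tanh(W_label @ label + W_left @ h_left + W_right @ h_right + b)

def predict(tree):
    """Binary classification read out from the root state."""
    return 1.0 / (1.0 + np.exp(-w_out @ encode(tree)))

# A three-node tree with real-valued labels: (label, left, right).
def leaf():
    return (rng.standard_normal(LABEL_DIM), None, None)

tree = (rng.standard_normal(LABEL_DIM), leaf(), leaf())
print(predict(tree))  # probability of the positive class
```

Training, omitted here, unfolds the network over each input tree and backpropagates the gradient of the likelihood through the shared weights, which is the backpropagation through structure algorithm mentioned above; a sequence is recovered as the special case of a tree with branching factor one.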

2 SCANNING THE ISSUE

Carrasco and Forcada focus on recursive neural networks. The paper is primarily concerned with representational issues. In the case of tree-structured inputs, the computation of a recursive neural network is analogous to that of a frontier-to-root tree automaton (just as the computation of a recurrent neural network resembles that of a string automaton). This intuition is formalized in the paper, and several strategies for encoding tree automata (or transducers) into recursive neural networks are presented, based on both first-order and high-order connections. The main result shows that networks with finite weights can exactly simulate tree automata, but at the cost of high saturation in the state units, which may lead to the problem of vanishing gradients.

Chan proposes an architecture suitable for applications to natural language processing. The system can learn context-dependent representations that emerge from linguistic primitives (in particular, lexical subsymbolic representations are generated using a RAAM and then hierarchically clustered). Context-dependent representations are then generated from sentences by means of a form of belief propagation over a syntactic network.

Foggia et al. focus on pattern recognition problems. The basic instance type in this work is the generalized attributed relational graph (GARG), which essentially consists of a graph with generalized nodes, edges, and labels. A novel symbolic algorithm for learning GARGs is introduced and compared to connectionist learning in structured domains with recursive neural networks. It is argued that neither of the two approaches is universally better, each being more suited to capturing different kinds of regularities in the data. The paper provides evidence for this claim by means of an empirical evaluation on an optical character recognition task. It is shown that symbolic and connectionist learning are somewhat orthogonal (in the sense that each performs better in different situations) and that integrating the two systems is beneficial.

The paper by Hammer is a theoretical analysis of learnability and generalization for folding (or recursive) networks. For arbitrary inputs, the VC-dimension of recurrent neural networks for sequences and of recursive neural networks for trees turns out to be infinite and, thus, distribution-independent bounds on the generalization error cannot exist. Extending previous results on distribution-dependent learnability, the paper reports bounds on the VC-dimension and VC-pseudodimension that depend on the number of weights in the network and the maximum height of the input trees.

Hodge and Austin are interested in learning how to organize vector-based data in a structured way. They propose a method for hierarchical clustering that generalizes the Growing Cell Structures (GCS) algorithm. The new unsupervised learning algorithm, called TreeGCS, overcomes a stability weakness of GCS. It grows a self-organizing structure by mapping high-dimensional input vectors onto a two-dimensional hierarchy that reflects the topological order of the input space. The authors show that TreeGCS can emulate the similarity structure produced by a dendrogram.

Lane and Henderson propose Simple Synchrony Networks, an architecture that can learn structural relationships; in this paper, the architecture is applied specifically to parsing natural language. The model learns to generate structural relationships between syntactic constituents and is employed to build a parse tree for a given input sentence. The method is based on an extension of recurrent neural networks combined with an adaptive version of temporal synchrony variable binding. Although the experimental results do not significantly improve the state of the art of corpus-based parsing, the paper shows that neural networks can be effectively used as an alternative to stochastic context-free grammars.

Paccanaro and Hinton introduce Linear Relational Embedding, a novel architecture for learning binary relations between concepts. Their method encodes concepts as vectors and relations as matrices, so that applying a relation to a concept amounts to a matrix-vector multiplication (a toy sketch of this retrieval step is given at the end of this section). Given a set of examples (which are sets of pairs of concepts belonging to different relations), the goal of learning is to find a suitable set of vectors and matrices for representing concepts and relations. The model is given a probabilistic interpretation and training is performed by optimizing a Kullback-Leibler divergence. The simulations show that this method can solve the family tree problem, first introduced by Hinton [13], which was shown to be a hard problem for standard backpropagation networks.

Petridis and Kaburlasos describe an algorithm for graph clustering and its application to text categorization. The core algorithm relies on the Fuzzy Lattice Neurocomputing scheme for computing a graph inclusion measure. The paper shows an application of the general methodology to clustering subgraphs stemming from a "master" graph that encodes a thesaurus of English (nodes are associated with words and edges encode the synonym relation). This is ultimately employed to compute clusters of semantically related words (hyperwords), a technique that is shown to be effective for reducing the number of features in the classification of large text documents.

Another contribution on representational issues is the one by Rachkovskij. In this case, structures are represented by means of sparse binary codevectors. The paper offers a comparative description of different representational schemes, including associative-projective neural networks, holographic reduced representations, and binary spatter codes. Context-dependent thinning is employed to bind codevectors, allowing the construction of distributed autoassociative memories with a superposition learning rule. The methodology is applied to examples of modeling analogical retrieval from memory.
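As promised above, here is a toy sketch of the retrieval step in Linear Relational Embedding: a relation matrix is applied to a concept vector, and candidate concepts are scored with a softmax over negative squared distances. The concepts, the relation, and all numbers are invented for illustration; the learned vectors and matrices would come from the KL-divergence training described in the paper.

```python
import numpy as np

# Toy sketch of the retrieval step in Linear Relational Embedding.
# Concepts, the relation, and all numbers are invented for
# illustration; real parameters would be learned by minimizing a
# Kullback-Leibler divergence, as described in the paper.
rng = np.random.default_rng(0)

DIM = 3
concepts = {name: rng.standard_normal(DIM)
            for name in ["Alice", "Bob", "Carol"]}
relations = {"mother-of": rng.standard_normal((DIM, DIM))}

def answer(relation, concept):
    """Apply the relation matrix to the concept vector, then score
    every candidate concept with a softmax over negative squared
    distances to the resulting point."""
    target = relations[relation] @ concepts[concept]
    names = list(concepts)
    sq_dist = np.array([np.sum((concepts[n] - target) ** 2)
                        for n in names])
    scores = np.exp(-sq_dist)
    return dict(zip(names, scores / scores.sum()))

# With learned parameters, the most probable completion of
# ("mother-of", "Alice") would be the correct family member.
print(answer("mother-of", "Alice"))
```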

REFERENCES

[1] J.A. Fodor and Z.W. Pylyshyn, "Connectionism and Cognitive Architecture: A Critical Analysis," Cognition, vol. 28, pp. 3-71, 1988.
[2] G.E. Hinton, "Special Issue: Connectionist Symbol Processing," Artificial Intelligence, vol. 46, nos. 1-2, 1990.
[3] J.B. Pollack, "Recursive Distributed Representations," Artificial Intelligence, vol. 46, nos. 1-2, pp. 77-106, 1990.
[4] G.E. Hinton, "Mapping Part-Whole Hierarchies into Connectionist Networks," Artificial Intelligence, vol. 46, pp. 47-75, 1990.
[5] P. Smolensky, "Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems," Artificial Intelligence, vol. 46, pp. 159-216, 1990.
[6] T.A. Plate, "Holographic Reduced Representations," IEEE Trans. Neural Networks, vol. 6, no. 3, pp. 623-641, 1995.
[7] C. Goller and A. Küchler, "Learning Task-Dependent Distributed Structure-Representations by Backpropagation through Structure," Proc. IEEE Int'l Conf. Neural Networks, pp. 347-352, 1996.
[8] A. Sperduti, D. Majidi, and A. Starita, "Extended Cascade-Correlation for Syntactic and Structural Pattern Recognition," Advances in Structural and Syntactical Pattern Recognition, P. Perner, P. Wang, and A. Rosenfeld, eds., pp. 90-99, 1996.
[9] A. Sperduti and A. Starita, "Supervised Neural Networks for the Classification of Structures," IEEE Trans. Neural Networks, vol. 8, no. 3, 1997.
[10] P. Frasconi, M. Gori, and A. Sperduti, "A General Framework for Adaptive Processing of Data Structures," IEEE Trans. Neural Networks, vol. 9, no. 5, pp. 768-786, 1998.
[11] S. Muggleton and L. De Raedt, "Inductive Logic Programming: Theory and Methods," J. Logic Programming, vols. 19-20, pp. 629-679, 1994.
[12] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer, "Learning Probabilistic Relational Models," Proc. Int'l Joint Conf. Artificial Intelligence, 1999.
[13] G.E. Hinton, "Learning Distributed Representations of Concepts," Proc. Eighth Ann. Conf. Cognitive Science Soc., pp. 1-12, 1986.

Paolo Frasconi received the MSc degree in electronic engineering in 1990 and the PhD degree in computer science in 1994, both from the University of Florence, Italy. Since 2000, he has been an associate professor of computer science with the Department of Systems and Computer Science (DSI) at the University of Florence. In 1999, he was an associate professor at the University of Cagliari, Italy. In 1998, he was a visiting lecturer with the School of Information Technology and Computer Science at the University of Wollongong, Australia. From 1995 to 1998, he was an assistant professor at the University of Florence. In 1992, he was a visiting scholar in the Department of Brain and Cognitive Science at the Massachusetts Institute of Technology. His current research interests include learning in neural networks, Markovian models, and belief networks, with particular emphasis on problems involving learning about sequential and structured information. Application fields of his interest include bioinformatics, natural language processing, and image document processing. Dr. Frasconi serves as an associate editor for the IEEE Transactions on Neural Networks and the IEEE Transactions on Knowledge and Data Engineering. He is a member of the IEEE, the ACM, the IAPR, and the AI*IA.


Marco Gori received the Laurea degree in electronic engineering from the Università di Firenze, Italy, in 1984 and the PhD degree in 1990 from the Università di Bologna, Italy. He was also a visiting student at the School of Computer Science, McGill University, Montreal, Canada. In 1992, he became an associate professor of computer science at the Università di Firenze and, in November 1995, he joined the University of Siena, where he is currently a full professor. His main research interests are in pattern recognition, neural networks, and artificial intelligence. Dr. Gori organized the NIPS '96 postconference workshop on "Artificial Neural Networks and Continuous Optimization: Local Minima and Computational Complexity" and coorganized the Caianiello Summer School on "Adaptive Processing of Sequences," held in Salerno in September 1997. Dr. Gori serves as a program committee member of several workshops and conferences, mainly in the area of neural networks. He is an associate editor of the IEEE Transactions on Neural Networks, Pattern Recognition, and the International Journal of Pattern Recognition and Artificial Intelligence. He is the Italian chairman of the IEEE Neural Network Council (R.I.G.), a fellow of the IEEE, and a member of the IAPR, SIREN, and AI*IA societies.

Alessandro Sperduti received the Laurea and doctoral degrees in computer science from the University of Pisa, Italy, in 1988 and 1993, respectively. In 1993, he spent a period at the International Computer Science Institute, Berkeley, California, supported by a postdoctoral fellowship. In 1994, he returned to the Computer Science Department at the University of Pisa, where he is currently an associate professor. His research interests include data sensory fusion, image processing, neural networks, and hybrid systems. In the field of hybrid systems, his work has focused on the integration of symbolic and connectionist systems. Dr. Sperduti has contributed to the organization of several workshops on this subject and has also served on the program committees of conferences on neural networks.