
Convolution Kernels on Discrete Structures
UCSC-CRL-99-10

David Haussler
Department of Computer Science
University of California at Santa Cruz
Santa Cruz, CA 95064
email: [email protected]
URL: http://www.cse.ucsc.edu/~haussler
July 8, 1999

Abstract

We introduce a new method of constructing kernels on sets whose elements are discrete structures like strings, trees and graphs. The method can be applied iteratively to build a kernel on an infinite set from kernels involving generators of the set. The family of kernels generated generalizes the family of radial basis kernels. It can also be used to define kernels in the form of joint Gibbs probability distributions. Kernels can be built from hidden Markov random fields, generalized regular expressions, pair-HMMs, or ANOVA decompositions. Uses of the method lead to open problems involving the theory of infinitely divisible positive definite functions. Fundamentals of this theory and the theory of reproducing kernel Hilbert spaces are reviewed and applied in establishing the validity of the method.
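The following Python sketch is not the construction developed in this report; it is a hypothetical illustration of what a kernel on strings can compute. It assumes each string is decomposed into its length-k substrings ("parts") and that parts are compared with an exact-match base kernel. Summing that base kernel over all pairs of parts gives the same value as an inner product of explicit substring-count features, which is why the resulting function is a valid (positive definite) kernel. The function names `kmer_counts`, `pairwise_kernel`, and `feature_kernel` are illustrative only.

```python
from collections import Counter


def kmer_counts(s: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of s; empty if len(s) < k."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def pairwise_kernel(x: str, y: str, k: int) -> int:
    """Sum an exact-match base kernel over all pairs of k-mer parts of x and y.

    Hypothetical decomposition: the parts of a string are its k-mers,
    one per starting position.
    """
    parts_x = [x[i:i + k] for i in range(len(x) - k + 1)]
    parts_y = [y[j:j + k] for j in range(len(y) - k + 1)]
    return sum(1 for a in parts_x for b in parts_y if a == b)


def feature_kernel(x: str, y: str, k: int) -> int:
    """Same value, computed as an inner product of k-mer count features."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[s] * cy[s] for s in cx.keys() & cy.keys())


if __name__ == "__main__":
    # "abab" has parts ab, ba, ab; "bab" has parts ba, ab.
    print(pairwise_kernel("abab", "bab", 2))  # 3
    print(feature_kernel("abab", "bab", 2))   # 3
```

Both functions return 3 for this pair: two matches between the "ab" parts and one between the "ba" parts. The pairwise form mirrors the idea of building a kernel on whole structures from a kernel on their parts, while the feature form makes positive definiteness evident.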

1 Introduction

Many problems in statistics and pattern recognition demand that discrete structures like strings, trees, and graphs be classified or clustered based on similarity. To do this, it is desirable to have a method to extract real-valued features $\phi_1(x), \phi_2(x), \ldots$ from any structure $x$ in a class $X$ of discrete structures. If finitely many features are extracted, the feature extraction process can be represented by a mapping from $X$ into $d$-dimensional Euclidean space