On the Role of Search for Learning from Examples

Stuart A. Kurtz, Department of Computer Science, University of Chicago, 1100 E. 58th St., Chicago, IL 60637-1581 USA, [email protected]

Carl H. Smith, Department of Computer Science, University of Maryland, College Park, MD 20742 USA, [email protected]

Rolf Wiehagen, Department of Computer Science, University of Kaiserslautern, P.O. Box 3049, D-67653 Kaiserslautern, Germany, [email protected]

Abstract

Gold [Gol67] discovered a fundamental enumeration technique, the so-called identification-by-enumeration, a simple but powerful class of algorithms for learning from examples (inductive inference). We introduce a variety of more sophisticated (and more powerful) enumeration techniques and characterize their power. We conclude with the thesis that enumeration techniques are even universal in that each solvable learning problem in inductive inference can be solved by an adequate enumeration technique. This thesis is technically motivated and discussed.

Keywords: Learning from examples, learning by search, identification by enumeration, enumeration techniques.


1 Introduction

The role of search, for learning from examples, is examined in a theoretical setting. Gold's seminal paper [Gol67] on inductive inference introduced a simple but powerful learning technique which became known as identification-by-enumeration. Identification-by-enumeration begins with an infinite effective list e_0, e_1, e_2, … of programs for total, computable functions. At stage s, the technique emits e_i for the least i such that e_i correctly computes the mystery function f for all inputs x ≤ s. In essence, this enumeration technique is simply "innocent until proven guilty": each program is conjectured until it is proven "guilty" by failing to correctly predict a value of f, whereupon the next program not yet proven "guilty" becomes the conjecture.

The basic assumption for identification-by-enumeration to be applicable, namely the existence of an "e-list" of programs for total computable (or, equivalently, recursive) functions, is, of course, a limitation. However, despite this limitation identification-by-enumeration is quite powerful. This power comes from the following simple observation:

Lemma 1 [Gol67] Given an effective list {e_i : i ∈ N} of programs for recursive functions, identification-by-enumeration will succeed in identifying a function f with respect to this list iff some e_i is a correct program for f.

Notice that a number of well-known classes of recursive functions are representable by "e-lists" of programs for recursive functions (or, equivalently, these classes are recursively enumerable), among them the class of all polynomials, the class of all recursive functions which are computable in polynomial time, the class of all recursive functions which are computable in logarithmic space, and the class of all primitive recursive functions. Identification-by-enumeration is embodied in many actual situations of learning from examples [MCM83]. See also [Mit82] for a discussion of generalization techniques viewed as search and a taxonomy of existing search techniques.

Consider any learning algorithm A such that

1. A learns from examples,
2. A employs search as one of its components.

We claim that all such algorithms are, or can be viewed as, and effectively transformed into, examples of some enumeration technique. An enumeration
technique calls for the searching of a linearly ordered set of alternatives. The algorithm A may search through a tree, or a more complicated structure. The search space of A may be generated dynamically, as opposed to being specified in advance. In any event, algorithm A could be dissected and transformed into an effective procedure that generates all and only the items that could ever possibly be examined by A. Thus, an effective linear order could, in principle, be placed on A's search space.

Identification-by-enumeration also specifies that programs are the object of the search, while condition 2 above makes no mention of what is being searched for. We assume that the learning is not rote and that the result of the learning effort will be used to make predictions about examples that were not yet seen as input. The underlying phenomenon can then be viewed as a mapping from examples to predictions, i.e., a function. Using suitable, well-known encoding techniques, it is possible to use the natural numbers (N) to represent (name) all the possible examples and also the set of all possible predictions. This follows from the observation that it is possible to encode every string of ASCII symbols in the natural numbers. These strings include arbitrarily long texts and are certainly sufficient to express both examples and predictions. The programs that are sought by an enumeration technique are viewed as computing the function that represents the phenomenon that is the subject of the learning activity. See [CS83] for a further discussion of the relationship between learning general phenomena and the mathematical models of learning.

The algorithm A alluded to above may search for something other than a program. Perhaps A is searching for a subprogram, or some vital parameters, or a predicate. From the above discussion, the output of A is a program. Whatever A is searching for, it must be incorporated into its output, as otherwise A could be rewritten as a functionally equivalent algorithm that uses no search, in violation of condition 2 above. Hence, any learning algorithm satisfying conditions 1 and 2 will take the result of its search component and transform it into a program.

The Blums [BB75] noticed that there was an enhancement to identification-by-enumeration. They noted that the programs e_i of the above lemma need not compute total functions; all that is required is that the three-place predicate "program e on input x converges to y" is effectively decidable. This particular enhancement is subsumed by our more powerful enumeration techniques given below.
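To fix intuition, the following is a minimal sketch of the basic identification-by-enumeration technique, under the simplifying assumption that the (in reality infinite) e-list is modeled by a finite Python list of total functions; all names here are illustrative, not from the paper.

def identification_by_enumeration(e_list, f, stages):
    # Yield the conjecture emitted at each stage s = 0, 1, ..., stages - 1.
    for s in range(stages):
        # "Innocent until proven guilty": conjecture the least index whose
        # program agrees with the mystery function f on all inputs x <= s.
        for i, program in enumerate(e_list):
            if all(program(x) == f(x) for x in range(s + 1)):
                yield i
                break

# Example: search space of the monomials c*x for c = 0, ..., 9; mystery 3x.
e_list = [lambda x, c=c: c * x for c in range(10)]
print(list(identification_by_enumeration(e_list, lambda x: 3 * x, 5)))
# -> [0, 3, 3, 3, 3]: the conjectures stabilize on a correct program.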


In Section 3, we show that it is possible to construct effective learning algorithms that search through complicated spaces, so complicated that their implementation requires, in a sense, non-effective information. We thereby present a way of generating search spaces which are more general than those spaces generated by effective lists of recursive functions. This clearly widens the range of application of enumeration techniques in learning. However, even the new enumeration technique is not powerful enough, by itself, to account for all the learning that is possible by computer programs. The reason is that even these more general search spaces are not general enough in that they can contain only recursively enumerable sets of recursive functions.

Therefore, in Section 4, we study search spaces which may enumerate richer function classes. A consequence of this richness will be that, in these spaces, the halting problem is, in general, undecidable. This, in turn, makes learning in these spaces much harder. Nonetheless, we exhibit a family of such search spaces which, on one hand, is rich enough to contain any class of functions that is learnable in the limit (learning type EX) and, on the other hand, makes learning in these spaces possible by a more powerful and rather unusual enumeration technique. We explore this technique in detail and point out both differences and similarities when compared to identification-by-enumeration.

Since there are learning types richer than EX, we then, in Section 5, derive further and more powerful enumeration techniques adequate for some of the more popular of these richer learning types, namely, team learning and behaviorally correct learning. For team learning, simply "parallelizing" the enumeration technique adequate for EX from Section 4 yields a technique adequate for teams. For behaviorally correct learning, we have to combine a further specific enumeration technique with two techniques that transform a possibly growing set of reasonable candidate hypotheses (resulting from applying the corresponding enumeration technique) into a single hypothesis.

In Section 6, we conclude with a thesis informally stating that each solvable problem of learning from examples can be solved by an adequate enumeration technique. We motivate and discuss this thesis. Especially, we show that the thesis is valid not only for approaches where learning is modeled as a limit process, but also for finite, or one-shot, learning where the learning device may output only one hypothesis and then stops. We also argue about the efficiency of enumeration techniques. Perhaps the main contribution of this paper is the codification, validation and possible universality of some uniform perspective on the construction of learning algorithms where search plays a central role.

2 Preliminaries

For a general background, we direct the reader to [MY78, Smi94] for the theory of computation, and to [AS83] for inductive inference. To keep the paper self-contained, we have attempted to include definitions for all the notation we use.

Natural numbers will serve as names for programs. Consider the following example. Each LISP program is a string of ASCII symbols, including blanks, carriage returns and EOFs. Consider a lexicographical (dictionary) ordering of all ASCII strings. If the i-th such ASCII string is a syntactically well-formed LISP program, then that program is the program named i. On the other hand, if the i-th ASCII string is not a proper LISP program, then i is (one of several) names for some fixed LISP program that never halts, say (defun a () (a)). Program i computes the (possibly partial) function denoted by φ_i. We assume that φ_0, φ_1, … forms an acceptable programming system [MY78, Rog58], and that Φ_0, Φ_1, … is an associated abstract measure of complexity [Blu67]. Note that Φ_i(x) can be thought of as the running time of the φ-program i on argument x. Actually, all naturally arising programming systems give rise to an acceptable programming system, and all natural ways of counting complexity, e.g. space and time, constitute abstract complexity measures. The example of LISP programs above will satisfy all the axioms of acceptability, and the number of steps taken serves as a measure of complexity.

We use φ_{i,k}(x) = y to denote φ_i(x) = y and Φ_i(x) < k. The fact that there exists a k such that Φ_i(x) < k, or, equivalently, that φ_i(x) is defined, will be denoted by φ_i(x)↓. For a (possibly partial) function f and n ∈ N such that f(x) is defined for all x ≤ n, let f^n denote the initial segment (f(0), …, f(n)) of f.

Inductive inference machines (IIMs) are algorithmic devices that accept as input the entire graph, f(0), f(1), f(2), …, of a recursive function f, in natural order, and emit programs intended to compute the input function. An IIM M, on input from the graph of f, converges to i iff either M outputs a last program which is i, or past some point, M outputs only program i. An IIM M identifies (or explains) f (written: f ∈ EX(M)) iff on f, M converges
to some i such that φ_i = f. The collection {S : S ⊆ EX(M) for M an IIM} of sets of recursive functions is denoted by EX.

There have been many proposed restrictions and enhancements of IIMs, as well as various notions of successful inference [AS83]. An IIM is called Popperian iff on all possible inputs, it outputs only programs that compute recursive functions. The class of families of functions identifiable by Popperian IIMs is denoted by PEX. PEX is known to be a strict subset of EX, and Popperian inference machines are equivalent in power to a class of extrapolation mechanisms [CS83]. Notice that any IIM using identification-by-enumeration is a fortiori Popperian. Popperian inference machines have been studied extensively [CJNM94].

The remaining preliminaries are recursion theoretic in nature. A recursive operator [Rog67] is a mapping Θ(·) from (partial) functions to (partial) functions such that there is a recursive function f with Θ(φ_i) = φ_{f(i)} for all programs i. A nonempty class S of recursive functions is recursively enumerable iff there is a recursive function e such that S = {φ_{e(i)} : i ∈ N}. A limit-recursive function is any total function e such that there is a recursive function h such that for all x ∈ N, e(x) = lim_{n→∞} h(x,n). A nonempty class S of recursive functions is limit-enumerable iff there is a limit-recursive function e such that S = {φ_{e(i)} : i ∈ N}.
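Several constructions below rest on the bounded-computation predicate φ_{i,k}(x) = y just defined, i.e., on simulating a program for a bounded number of steps. As a purely illustrative model (not from the paper), the following sketch represents programs as Python generators that yield once per simulated step and finally return their value:

def run_bounded(program, x, k):
    # Return program(x) if it converges in fewer than k steps, else None
    # (None plays the role of "Phi_i(x) >= k: undecided within the budget").
    gen = program(x)
    for _ in range(k):
        try:
            next(gen)              # perform one step of the computation
        except StopIteration as halt:
            return halt.value      # phi_i(x) converged within the budget
    return None

def slow_double(x):
    for _ in range(x):             # takes x steps before halting
        yield
    return 2 * x

print(run_bounded(slow_double, 3, 10))  # -> 6
print(run_bounded(slow_double, 3, 2))   # -> None (budget too small)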

3 A More Robust Enumeration Technique

In this section we show that all limit-enumerable sets of recursive functions are learnable by some enumeration technique. Hence, enumeration techniques can be applied not only to effectively generated lists e(0), e(1), … of programs of recursive functions (as it is supposed for the basic identification-by-enumeration technique), but also to limit-effectively generated lists. This property clearly widens the range of applicability of enumeration techniques. Furthermore, the witnessing IIM can be made Popperian. A final definition will make the statement of our theorem even more general. Actually, we will show that the properties just announced are even "robust" in that they also hold for any "effective transformation" of the underlying limit-enumerable function class.

Definition 2 A collection S of recursive functions is PEX-robust iff whenever Θ is a recursive operator mapping S to a set Θ(S) of recursive functions, then Θ(S) ∈ PEX.

Now we are ready for our first result.

Theorem 3 If S is limit-enumerable, then S is PEX-robust.

Proof: Suppose S is limit-enumerable. Let e be a limit-recursive function such that S = {φ_{e(i)} : i ∈ N}. Choose h to be a recursive function such that e(x) = lim_{n→∞} h(x,n) for all x. Suppose Θ is a recursive operator mapping S to the set Θ(S) of recursive functions. Let f be a recursive function witnessing the effectiveness of Θ, i.e., Θ(φ_i) = φ_{f(i)} for all i. We will construct an IIM M such that Θ(S) = PEX(M). Note that this is more than would be necessary, since proving Θ(S) ⊆ PEX(M) would also do.

We first define functions ψ_{y,n}(x), where y, n, x ∈ N, as follows. Let z > n be minimal such that either

1. h(y,n) ≠ h(y,z); or
2. φ_{f(h(y,n)),z}(x) converges.

If condition 1 occurs, then define ψ_{y,n}(x) = 0. If condition 2 occurs, then define ψ_{y,n}(x) = φ_{f(h(y,n))}(x). If no such z exists, then ψ_{y,n}(x) is undefined. It is quite clear that the ψ_{y,n}'s form a uniformly recursive family of partial recursive functions. In fact, each ψ_{y,n} is even total, as a z satisfying condition 1 or 2 must always exist. There are two cases. If h(y,n) = e(y), then φ_{f(h(y,n))} = φ_{f(e(y))}, which is total by the choice of f, and so condition 2 must eventually apply. If h(y,n) ≠ e(y), then condition 1 must eventually occur by the choice of h.

We define a predicate d(y,σ), where y ∈ N and σ is any tuple of natural numbers which may be thought of as an initial segment of the function to be learned. Let |σ| denote the length of σ, and let σ(k) denote the k-th component of σ. Then d(y,σ) is true iff there exists a z > |σ| such that

3. h(y,w) = h(y,|σ|) for all w such that |σ| ≤ w ≤ z; and
4. φ_{f(h(y,|σ|)),z}(k) = σ(k) for every k < |σ|.


The predicate d is recursive, by the same proof as for the ψ_{y,n}'s. We are now in a position to describe M. On input σ, let y_0 be the least y < |σ| satisfying d(y,σ). If no such y exists, then M(σ) is defined to be a φ-program for the constant zero function. Let n_0 be the least n such that h(y_0,k) = h(y_0,|σ|) for all k such that n ≤ k ≤ |σ|. Let M(σ) be a φ-program for ψ_{y_0,n_0} in which y_0 is explicitly encoded. By definition, M can only produce programs for recursive functions. Hence, M is Popperian. Therefore, it remains only to show that Θ(S) = EX(M).

First we show that Θ(S) ⊆ EX(M). Fix η ∈ Θ(S). Let i be minimal such that η = φ_{f(e(i))}. Let n > i be so large that

• if m ≥ n, then h(i′,m) = e(i′) for all i′ ≤ i; and
• if i′ < i, then there is an x < n such that φ_{f(e(i′))}(x) ≠ η(x).

Let n_0 be minimal such that h(i,k) = e(i) for all k ≥ n_0. Let σ be any initial segment of η of length at least n. On input σ, M must produce a program for ψ_{i,n_0}. To see this, notice that d(i,σ) will be true. Furthermore, we have chosen σ to be so long that d(i′,σ) must be false for all i′ < i, and we have required that |σ| > i, so M will consider i. The choice of n_0 is similarly forced. Moreover, ψ_{i,n_0} = η. To see this, recall that h(i,k) = e(i) for all k ≥ n_0. Therefore η = φ_{f(e(i))} = φ_{f(h(i,n_0))}. By our definition, ψ_{i,n_0}(x) = φ_{f(h(i,n_0))}(x) unless condition 1 occurs, but by the choice of n_0, condition 1 cannot occur. Hence ψ_{i,n_0} = η.

It remains only to show that EX(M) ⊆ Θ(S). Assume η is in EX(M). Then M must, on sufficiently large initial segments σ of η, produce a fixed program, say for ψ_{y,n}. If η ≠ ψ_{y,n}, this will eventually be recognized (4 above) and M will have to change its mind, contradicting the choice of ψ_{y,n}. If η = ψ_{y,n}, then we must have ψ_{y,n} = φ_{f(e(y))}, or else M would eventually produce a program for ψ_{y′,k} for some y′ ≠ y (1 above). By the coding condition, the program M produces for ψ_{y′,k} must be different from the program it produced for ψ_{y,n}, contradicting the choice of ψ_{y,n}. Thus, η = ψ_{y,n} = φ_{f(e(y))} and so η ∈ Θ(S). □

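To make the ψ_{y,n} construction concrete, here is a minimal sketch under stated assumptions: `h(y, n)` is the recursive approximation of the limit-recursive e, `f` is the operator witness, `programs` maps indices to simulated programs, and `run_bounded` is the step-bounded simulation sketched in the preliminaries; all names are illustrative, not from the paper.

def make_psi(y, n, h, f, programs, run_bounded):
    # psi_{y,n}(x): search z = n+1, n+2, ... for the first of two events.
    def psi(x):
        z = n + 1
        while True:
            if h(y, n) != h(y, z):      # condition 1: the approximation moved
                return 0
            value = run_bounded(programs[f(h(y, n))], x, z)
            if value is not None:       # condition 2: converged within z steps
                return value
            z += 1  # one of the two events must occur, so psi is total
    return psi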

Suppose S is a limit-enumerable class of recursive functions and Θ is the identity recursive operator, whose effectiveness is witnessed by the identity function f = λx[x]. By the above theorem, Θ(S) = S is PEX-identifiable. The set of all possible outputs generated by M of the above proof on all possible input sequences forms a recursively enumerable set of natural numbers. Since M was shown to be Popperian, this set of all possible outputs of M yields a recursively enumerable set of recursive functions that is also a superset of S, the limit-enumerable class on which the theorem is based. Hence, every limit-enumerable set of recursive functions is contained in some recursively enumerable set of recursive functions. This was also noticed, in the context of inference, in [CS83].

Identification-by-enumeration is the basis of many learning algorithms used in artificial intelligence [MCM83]. Generally, a "solution search space" is defined and searched. To apply identification-by-enumeration in a specific context, one needs only supply a sub-algorithm which generates the elements of this solution search space: the rest of the learning algorithm is fixed. The advantage of our technique is that it makes available a much more powerful language for expressing the generation sub-algorithm: the language of the limit-recursive functions.

This idea of using limit-recursive functions rather than recursive functions for generating a subspace of φ appropriate for searching was also used in [Ful90]. Using such search spaces {φ_{e(i)} : i ∈ N} where e is a limit-recursive function, Fulk shows that some classes from EX − PEX can be learned within these spaces by some enumeration technique. In other words, he showed that there are enumeration techniques which are more powerful than Gold's identification-by-enumeration (the latter, as we know from above, is only capable of learning classes from PEX). Notice that for quite a long period such a result was thought to be not possible. Actually, in developing his fundamental technique of identification-by-enumeration, Gold conjectured [Gol67] that his method was the only one, in the sense that all learnable function classes are learnable using identification-by-enumeration; formally, EX = PEX. Some years later, Barzdins proved that EX ⊃ PEX [Bar71], by exhibiting the following "self-describing" class: {f : f a recursive function such that φ_{f(0)} = f}. Though formally being in EX − PEX, in order to learn that class a machine has "nothing" to learn; actually, by simply returning the first example f(0) seen as its answer the machine will be done. Again some years later, Barzdins conjectured that, informally, all classes from EX − PEX are of the "same kind" as his class, i.e., they all have to be "self-describing" in order to be learnable. As a consequence of Barzdins' conjecture there would be no need for enumeration techniques more powerful than identification-by-enumeration. Indeed, each class from EX is either in PEX, and, consequently, learnable by identification-by-enumeration, or it is in EX − PEX and hence, by Barzdins' conjecture, it would be trivially learnable by its self-describing nature. In other words, proving Barzdins' conjecture would have proved learning in EX − PEX, in a sense, boring.

However, Fulk disproves Barzdins' conjecture in [Ful90]! It was then clear that there are classes in EX − PEX that cannot be learned in the trivial way of returning self-descriptions. Instead, the need for new techniques for learning the classes from EX − PEX did arise. Simultaneously, by disproving Barzdins' conjecture, Fulk exhibited such a technique, even an enumeration technique, for learning some classes from EX − PEX. In a sense, the existence of such a more powerful enumeration technique yields concrete motivation for this paper. The question of how powerful enumeration techniques can be arises naturally. It is the main intention of the present paper to provide an answer to this question. Therefore, in the next section, we generalize the insight gained by Fulk's result in several directions. We prove a result that yields an enumeration technique which is powerful enough for learning not only some classes from EX − PEX, but all classes from EX. Hence, a new idea had to be created for this more powerful enumeration technique. On the other hand, we show that the appropriate search spaces necessary and sufficient for this enumeration technique to work can be built by using even recursive "e-lists" rather than limit-recursive ones.

4 An Enumeration Technique Universal for EX

Our theorem of the previous section shows that what can be learned by the technique exhibited there is contained in PEX, a proper subset of EX. Consequently, there is more to machine learning than taking input data and trying to find a consistent algorithm generating the data by searching through some space of possibilities. The question is "What else can one do?"


Looking at the proof separating PEX and EX gives a potential, but not promising answer. The example set of recursive functions in EX − PEX is {f : φ_{f(0)} = f}. Call this set S. To infer S, an IIM searches the input instead of some data structure containing potential answers. The general technique suggested by this is to look at the data as more than just examples, but also as, perhaps, an encoding of the solution sought after. For example, an algorithm to build a human being is locked up inside a single strand of DNA. To, in principle, find this algorithm, one must filter out the irrelevant data and appropriately interpret what is left.

Another possible method for extending the scope of enumeration techniques is to include some programs for partial, not total, functions in the data structure to be searched by the learning algorithm. There is a risk involved with such a strategy. The learning algorithm may choose as its hypothesis a program for a partial, not total, function and then try to test the hypothesis by running the program on some of the sample input. The conjectured program may never converge. The learning strategy will never discover that it is waiting for an infinite computation to converge, i.e., the learning algorithm will enter an infinite loop. Note that this risk is quite real. Actually, one can show the following result, Lemma 1 in [WZ94]. Let S be any set of recursive functions beyond the scope of identification-by-enumeration, i.e., S is not contained in any recursively enumerable set. Then any efficient subspace of φ containing S has an undecidable halting problem; i.e., more formally, for any recursive function e such that S ⊆ {φ_{e(i)} : i ∈ N}, the "halting function" h(i,x), defined to be 1 if φ_{e(i)}(x) is defined and 0 otherwise, is not computable.

Nonetheless, it is possible, under certain circumstances, to avoid the problem concerning convergence and be successful at learning a larger collection of function sets than those in PEX, namely all the sets from EX. How to do this is the subject of the next result. This result has been published already in [FKW95]. We include it here together with its proof for several reasons. First of all, in order to make the paper self-contained. Second, the proof will be outlined in more detail, especially exhibiting the enumeration technique involved. And third, we then will discuss this enumeration technique, the intention of which is quite unusual. Especially, we point out some duality between this technique and Gold's identification-by-enumeration.

In order to state and to prove the result announced we need the notion of f(x) ≠ g(x), where f, g are (possibly partial) computable functions and x ∈ N. Notice that there are two possibilities for f(x) ≠ g(x) to hold, namely first, both values may be defined but they are different or, second, one of these values is defined and the other is not.

Theorem 4 A set S of recursive functions is in EX iff there are recursive functions e (for enumerate) and d (for discriminate) such that:

1. S ⊆ {φ_{e(i)} : i ∈ N},
2. for all i, j ∈ N, if i ≠ j then there is an x ≤ d(i,j) such that φ_{e(i)}(x) ≠ φ_{e(j)}(x).

Proof: Suppose S ∈ EX and let M be an IIM such that S ⊆ EX(M). En route to constructing the necessary recursive functions e and d, we define a set C that describes points at which M changes its conjecture to a reasonable hypothesis while trying to learn various functions. Specifically, C is the set of all ordered pairs (k,n) of natural numbers such that φ_k(x)↓ for all x ≤ n, and M(φ_k^{n−1}) ≠ M(φ_k^n) = k. Let c_0, c_1, … be an effective repetition-free enumeration of C. The method we use to present the recursive function e is to give an effective definition of φ_{e(i)}(x) uniformly in i and x. Suppose that c_i = (k,n). Then,

φ_{e(i)}(x) =
  φ_k(x), if x ≤ n;
  φ_k(x), if x > n and for any y such that n < y ≤ x, φ_k(y)↓ and M(φ_k^y) = k;
  undefined, otherwise.

Let g be a recursive function that extracts the second component of the c_i's. That is, for any i, if c_i = (k,n) then g(i) = n. Finally, we define d(i,j) = max{g(i), g(j)}.

We now show that condition (1) holds. Choose f ∈ S. Hence, f ∈ EX(M). Suppose further that M on input from the graph of f converges to k. Consequently, φ_k = f. Hence, there is a least n such that for all n′ ≥ n,
M(φ_k^{n′}) = k. From the definition of C, there is an i such that c_i = (k,n). By the construction above, φ_{e(i)} = φ_k = f, establishing condition (1).

To establish condition (2), suppose that i ≠ j. Suppose that c_i = (k,n) and c_j = (k′,n′). Then M(φ_k^n) = k and M(φ_{k′}^{n′}) = k′. Let m = max{n,n′} = d(i,j). Now suppose to the contrary that φ_{e(i)}^m = φ_{e(j)}^m. If n = n′, then k = k′ and, hence, c_i = c_j would follow, leading to a contradiction, since the enumeration of C is repetition-free. Consequently, n ≠ n′. Suppose without loss of generality that n < n′. But then φ_{e(i)}(n′) ≠ φ_{e(j)}(n′), since, by the definition of C and e, φ_{e(j)}(n′) is defined, while φ_{e(i)}(n′) is not (recall that, by definition of C, M changes its mind from n′−1 to n′, i.e., M(φ_{e(j)}^{n′−1}) ≠ M(φ_{e(j)}^{n′}), causing φ_{e(i)}(n′) to be undefined by the definition of e). Clearly, φ_{e(i)}(n′) ≠ φ_{e(j)}(n′) contradicts φ_{e(i)}^m = φ_{e(j)}^m. Hence, φ_{e(i)}^m ≠ φ_{e(j)}^m, establishing condition (2).

To complete the proof, suppose that there are recursive functions e and d satisfying conditions (1) and (2) for some set S of recursive functions. We must define an IIM M that infers S, i.e., S ⊆ EX(M). This is accomplished by defining the behavior of M on initial segments of some arbitrary recursive function f. Our M will only output programs in the range of e. More precisely, M will first output e(0) sometime, then e(1) sometime, then e(2) sometime, and so on, until M arrives at the first correct program for the function f within the e-list, and this correct output will never later be changed. The crucial point in the construction of M is to provide the machine with a criterion for changing its mind, i.e., for changing its actual guess from e(i) to e(i+1). Intuitively, this criterion consists of the existence of an "alternative hypothesis" e(j), j ≠ i, such that φ_{e(j)} and f, the function to be learned, coincide on a sufficiently large initial segment. Formally, the length of this segment will be just d(i,j). However, despite that coincidence, e(j) will not be taken as M's new hypothesis. Instead, the existence of e(j) ultimately implies that M's actual hypothesis, e(i), is incorrect, i.e., φ_{e(i)} ≠ f, and hence its rejection is justified. As mentioned earlier, M's new guess will be e(i+1), i.e., the next one from the e-list. Notice that checking the criterion above, namely enumeratively searching (!) for an alternative hypothesis e(j), just constitutes the enumeration technique used by M. We proceed by defining the IIM M formally.

M, on input f^0, outputs e(0). Suppose M, on input f^{n−1}, outputs e(i). Let the input f^n be given. If there is a j such that

1. i < j ≤ n, and
2. d(i,j) ≤ n, and
3. φ_{e(j),n}^{d(i,j)} = f^{d(i,j)},

then M outputs e(i+1), i.e., M(f^n) = e(i+1); else M outputs e(i), i.e., M(f^n) = e(i).
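The following is a minimal sketch of this mind-change rule, under the assumptions of the earlier sketches: `e(i)` yields the i-th program of the e-list, `d(i, j)` the discrimination bound, and `run_bounded(p, x, n)` an n-step simulation returning None on non-convergence within the budget; the names are illustrative, not the paper's.

def ex_technique(e, d, run_bounded, f, stages):
    # Yield M(f^n) for n = 0, 1, ...; conjectures only climb the e-list.
    i = 0
    for n in range(stages):
        for j in range(i + 1, n + 1):
            if d(i, j) <= n and all(
                run_bounded(e(j), x, n) == f(x) for x in range(d(i, j) + 1)
            ):
                # An alternative hypothesis e(j) matches f up to d(i, j),
                # which proves e(i) incorrect: advance to e(i + 1).
                i += 1
                break
        yield i  # index of the current conjecture e(i)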

Clearly, M outputs only programs in the range of e. Suppose that f ∈ S. We must show that M identifies f. Choose n and suppose that M(f^{n−1}) = e(i). Suppose that M, on input f^n, finds a suitable j causing e(i+1) to be output as the next conjecture. Consequently, i < j and φ_{e(j)}^{d(i,j)} = f^{d(i,j)}. By condition (2), there is an x ≤ d(i,j) such that φ_{e(i)}(x) ≠ φ_{e(j)}(x). Hence, φ_{e(i)} ≠ f, so the rejection of the hypothesis e(i) is justified. If no suitable j is found, then M continues to conjecture e(i). If φ_{e(i)} = f, then condition (2) guarantees that for any j > i, φ_{e(i)}(x) ≠ φ_{e(j)}(x) for some x ≤ d(i,j). Hence a correct hypothesis will never be abandoned by M. Consequently, it remains to show that if M ever outputs e(i) and φ_{e(i)} ≠ f, then e(i+1) is eventually conjectured by M. Since we are assuming that e(i) is output by M, and we know that M never rejects a correct hypothesis and f ∈ S, there must be a j > i such that φ_{e(j)} = f by condition (1). Of course, φ_{e(j)}^{d(i,j)} = f^{d(i,j)}, so j will cause the rejection by M of the hypothesis e(i). □

In the sufficiency proof of Theorem 4 above, we have exhibited an enumeration technique (let us call this technique the EX-technique in the following) which turns out to be adequate for EX. This technique consists of the following: Given an actual hypothesis e(i), search for an alternative hypothesis e(j) such that the corresponding function φ_{e(j)} coincides with the function f to be learned up to d(i,j). Finding such an e(j) definitely proves that the actual hypothesis e(i) is incorrect and hence has to be rejected.

There is an interesting duality between the EX-technique and Gold's identification-by-enumeration, namely: Identification-by-enumeration makes disproving an incorrect hypothesis easy, but creating a new hypothesis may take extensive search, whereas the situation for the EX-technique is just vice versa. Actually, any hypothesis produced by identification-by-enumeration is itself "subject to rejection," i.e., in order to prove its incorrectness one only needs this very hypothesis (together with the graph of the function to be learned). Clearly, since by the general assumption of identification-by-enumeration any such hypothesis describes a recursive function, simply computing this function on all arguments will eventually yield a point of incorrectness, if any exists. Conversely, in order to prove the incorrectness of an hypothesis produced by the EX-technique the same approach does not work in general. Not only do these hypotheses not, in general, describe recursive functions, they may even describe proper subfunctions of the function to be learned; hence there is no hope for the identification-by-enumeration approach to succeed. Instead, potentially the entire space {φ_{e(i)} : i ∈ N} must be searched in order to find an appropriate alternative hypothesis yielding the desired incorrectness proof. On the other hand, once an hypothesis has been disproved, the identification-by-enumeration technique has to search potentially its whole space of hypotheses to find the next hypothesis which is consistent with the data seen so far and hence will serve as the new hypothesis. Conversely, creating the new hypothesis by the EX-technique is extremely easy: rejecting e(i) means returning e(i+1) as the new hypothesis.

Of course, the question arises if, in the sufficiency proof of Theorem 4, or, equivalently, in the EX-technique, it is possible to take just the alternative hypothesis e(j) found as the new hypothesis. The answer is no. The reason is that then conditions (1) and (2) would not prevent the learning machine from continually changing its mind to an alternative hypothesis and never settling down on the correct φ-program of the function to be learned within the e-list. The careful mode of changing from e(i) to e(i+1), however, guarantees eventual convergence to the correct program within the e-list.

Note that, on the other hand, both of the techniques, namely identification-by-enumeration and the EX-technique, also have a fundamental common characteristic, namely: Both techniques do not rely on the whole space φ but rather on an appropriately chosen subspace given by the corresponding e-list e(0), e(1), … of φ-programs. In a sense, the transition from φ to the subspace {φ_{e(i)} : i ∈ N} can be thought of as some kind of pruning. The initial space φ will be pruned to a subspace where some kind of enumeration technique will work successfully. Moreover, the subspace given by conditions (1) and (2) of Theorem 4 has another property. This space is very "economic" in that every function from it is represented by exactly one φ-program in it. Clearly, condition (2) implies φ_{e(i)} ≠ φ_{e(j)} for any i ≠ j.


One may interpret this property as a realization of Occam's razor. Notice also that the subspaces appropriate for identification-by-enumeration can be provided with the same additional property.

5 Yet More Powerful Enumeration Techniques

In the previous section we exhibited an enumeration technique which turned out to be specific and universal for EX, i.e., for the family of all classes of recursive functions which are learnable in the limit. On the other hand, there are learning types which are much richer than EX, [CS83]. The question naturally arises whether enumeration techniques universal for these richer types can be established. In the following, we answer this question affirmatively for two such types, namely team learning and behaviorally correct learning.

Let f be a recursive function and n ≥ 1. A team (M_1, …, M_n) of IIMs identifies f (written: f ∈ n-TEAM(M_1, …, M_n)) iff at least one IIM from the team identifies f, i.e., f ∈ EX(M_i) for some i with 1 ≤ i ≤ n. The collection {S : S ⊆ n-TEAM(M_1, …, M_n) for IIMs M_1, …, M_n} of sets of recursive functions is denoted by n-TEAM. Finally, let TEAM = ⋃_{n≥1} n-TEAM. Obviously, 1-TEAM = EX. On the other hand, teams of size 2 turn out to be more powerful than single IIMs, i.e., 2-TEAM ⊃ EX [BB75]. More precisely, the following result holds [Smi82]: 1-TEAM ⊂ 2-TEAM ⊂ … ⊂ n-TEAM ⊂ (n+1)-TEAM ⊂ … ⊂ TEAM.

We now will derive a characterization for TEAM which yields an enumeration technique that is universal for TEAM. Intuitively, this technique is some kind of "parallelizing" of the enumeration technique for EX exhibited in the proof of Theorem 4. Thus, simply parallelizing the EX-technique yields considerable extra power in learning capability. In order to outline this idea more formally, the following easy lemma turns out to be technically useful.

Lemma 5 Let S be any class of recursive functions. Then S ∈ TEAM iff there are n ≥ 1 and classes S_1, …, S_n ∈ EX such that S ⊆ S_1 ∪ … ∪ S_n.

Proof: Let S ∈ TEAM. Then, by definition, there are n ≥ 1 and IIMs M_1, …, M_n such that S ⊆ EX(M_1) ∪ … ∪ EX(M_n). Hence, defining S_1 = EX(M_1), …, S_n = EX(M_n) will do.

Conversely, let S ⊆ S_1 ∪ … ∪ S_n for S_1, …, S_n ∈ EX. Then there are IIMs M_1, …, M_n such that S_1 ⊆ EX(M_1), …, S_n ⊆ EX(M_n). Hence, S ⊆ n-TEAM(M_1, …, M_n). Consequently, S ∈ TEAM. □

We are now ready to characterize TEAM. In a sense, this characterization is a "parallelization" of the characterization of EX given by Theorem 4 above. Hence, it is only natural that the enumeration technique adequate for TEAM will just consist of "parallelizing" the enumeration technique for EX.

Theorem 6 A set S of recursive functions is in TEAM iff there are n ≥ 1 and recursive functions e_1, …, e_n, d_1, …, d_n such that:

1. S ⊆ ⋃_{k≤n} {φ_{e_k(i)} : i ∈ N},
2. for any k ∈ N, 1 ≤ k ≤ n, and any i, j ∈ N, if i ≠ j then there is an x ≤ d_k(i,j) such that φ_{e_k(i)}(x) ≠ φ_{e_k(j)}(x).

Proof: Suppose S ∈ TEAM. Then, by Lemma 5, there are n ∈ N and classes S_1, …, S_n ∈ EX such that (a) S ⊆ ⋃_{k≤n} S_k. Applying Theorem 4 to each of the classes S_k ∈ EX, 1 ≤ k ≤ n, we get recursive functions e_k and d_k such that both (b) S_k ⊆ {φ_{e_k(i)} : i ∈ N}, and (c) for all i, j ∈ N, if i ≠ j then there is an x ≤ d_k(i,j) such that φ_{e_k(i)}(x) ≠ φ_{e_k(j)}(x). Clearly, condition (1) follows from (a) and (b). Furthermore, (c) immediately yields condition (2).

Suppose n ≥ 1 and suitable recursive functions e_1, …, e_n, d_1, …, d_n exist. For any k, 1 ≤ k ≤ n, let S_k denote the set of all recursive functions from {φ_{e_k(i)} : i ∈ N}. Then, from condition (2) and Theorem 4, it follows that for any k such that 1 ≤ k ≤ n, S_k ∈ EX. Notice that an IIM EX-identifying S_k can work just by using the enumeration technique from Theorem 4, i.e., searching through the subspace {φ_{e_k(i)} : i ∈ N} of φ with the help of the discrimination function d_k. Now, by condition (1), we have S ⊆ S_1 ∪ … ∪ S_n. Since S_k ∈ EX for 1 ≤ k ≤ n, we obtain S ∈ TEAM by Lemma 5. □

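As an illustrative sketch (reusing the hypothetical `ex_technique` helper given after Theorem 4), parallelizing really is all there is to it: every team member runs the same algorithm, only on its own search space (e_k, d_k).

def team_technique(spaces, run_bounded, f, stages):
    # spaces: a list of (e_k, d_k) pairs, one search space per team member.
    members = [ex_technique(e_k, d_k, run_bounded, f, stages)
               for (e_k, d_k) in spaces]
    # Drive all members in lockstep; at least one member's conjecture
    # sequence converges correctly whenever f lies in the union of spaces.
    yield from zip(*members)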

As it follows from the proof of Theorem 6 and the fact that TEAM is much richer than EX, simply parallelizing the EX-technique from Theorem 4 yields the full learning power of TEAM. From another perspective, in order to achieve this full power of TEAM it always suffices to choose a team where the members all follow the same "algorithmic idea"! We feel that this is not at all evident a priori. One could expect instead that the extra power of a team over a single learning algorithm comes necessarily from the different "algorithmic flavors" that are possible with a team. But this is just not the case, as it follows from the proof of Theorem 6, where all the team members use precisely the same algorithm on different search spaces. Indeed, it was shown that the power of teams comes precisely from the much different capabilities (not necessarily inner workings) of team members [AFS96]. This result was extensional in nature, as it showed that for any set of recursive functions S such that S ∉ EX but S ∈ 2-TEAM, there was a partition of the class of all IIMs into 2 components such that any team learning S must have members from both partitions. Furthermore, if M_1 and M_2 are IIMs from different components, then if EX(M_1) ∩ EX(M_2) is infinite, then so are EX(M_1) − EX(M_2) and EX(M_2) − EX(M_1). This means that the IIMs of the team must have very different capabilities and that this is the source of the power of the team. Our result above shows that the different capabilities arise not from having a different algorithm, but by using the same algorithm on a different search space, thereby answering a question posed in [AFS96]. It is interesting to note that, perhaps coincidentally, virtually all the team learning algorithms presented in [Smi82] have each member of the team using precisely the same algorithm as every other member of the team. What differs is the search space used.

For a practical example of team learning where all team members use identical algorithms on different search spaces we offer the following. To find the position of land mines, essentially learning a {0,1}-valued function over a two-dimensional grid, several identical mine detectors are employed. Each is used over a different region of the area to be searched for mines. Furthermore, the region covered by one mine detector may overlap with the region associated with a different detector.

We proceed to develop an enumeration technique for an extension of EX along another direction. A new idea is required for this development, as the generalization of EX that we are about to present turns out to be
set-theoretically incomparable with TEAM. Informally, this new idea will consist of combining some enumeration technique with a technique for transforming the potential hypotheses found by the enumeration technique into real hypotheses. We proceed to formally define the new learning type we consider below.

Suppose f is a recursive function. An IIM M identifies f behaviorally correctly (written: f ∈ BC(M)) iff on input from the graph of f, M outputs an infinite sequence of programs all of which, except perhaps finitely many, correctly compute f, i.e., there is an n_0 ∈ N such that for any n ≥ n_0, φ_{M(f^n)} = f. Notice that we no longer require the sequence of hypotheses to converge to a single program for the function to be learned. Instead, we require that almost all of the programs produced by the IIM are correct, but we allow the case that these programs may be different. The collection {S : S ⊆ BC(M) for M an IIM} of sets S of recursive functions is denoted by BC.

One motivation for considering BC-style learning is that it might be quite reasonable to replace a correct program later by another correct, but "better" one. This is just what is going on in real programming permanently. Moreover, BC-style learning enhances the capabilities of EX-style learning. Actually, it is well-known that EX ⊂ BC, [Bar74, CS83]. More precisely, BC ⊄ TEAM and TEAM ⊄ BC, [Smi82]. Now we exhibit a characterization of BC the proof of which yields an enumeration technique adequate for this learning type.

Theorem 7 A set S of recursive functions is in BC iff there are recursive functions e and d such that:

1. S ⊆ {φ_{e(i)} : i ∈ N},
2. for any function f ∈ S and for all but finitely many i ∈ N, if φ_{e(i)}(x) = f(x) for all x ≤ d(i), then φ_{e(i)} = f.

Proof: Suppose S ∈ BC and let M be an IIM such that S ⊆ BC(M). En route to constructing the functions e and d, we define a set C that describes points at which M produces "reasonable hypotheses." Formally, C is the set of all ordered pairs (j,n) such that φ_j(x)↓ for all x ≤ n and M(φ_j^n) = j. Let c_0, c_1, … be an effective and repetition-free enumeration of C. Then for any i ∈ N such that c_i = (j,n), define e(i) = j and d(i) = n.


In order to show that conditions (1) and (2) hold, let f ∈ S and n_0 ∈ N be such that for all n ≥ n_0, φ_{M(f^n)} = f. Clearly, (M(f^{n_0}), n_0) ∈ C. Choose i such that c_i = (M(f^{n_0}), n_0). Then, by the definition of e, φ_{e(i)} = φ_{M(f^{n_0})}. Since φ_{M(f^{n_0})} = f, we have φ_{e(i)} = f. Consequently, condition (1) holds. Now, choose j ∈ N such that d(j) ≥ n_0 and φ_{e(j)}(x) = f(x) for all x ≤ d(j). Then φ_{e(j)} = f follows as above. On the other hand, there are at most finitely many numbers k ∈ N such that both d(k) < n_0 and φ_{e(k)}(x) = f(x) for all x ≤ d(k). Hence, condition (2) holds.

Suppose that e, d are recursive functions satisfying conditions (1) and (2) for some set S of recursive functions. To complete the proof, we must define an IIM M such that S ⊆ BC(M). The IIM M can be thought of as a combination of three techniques: an enumeration technique, a pruning technique and an amalgamation technique. Given f ∈ S, M informally works as follows. The enumeration technique searches for and collects all candidates e(i), i ∈ N, of possibly correct programs for f. By definition, e(i) is such a candidate iff φ_{e(i)}(x) = f(x) for all x ≤ d(i). The pruning technique eventually cancels each candidate e(i) collected by the enumeration technique which later turns out to be obviously incorrect. By definition, candidate e(i) is obviously incorrect iff there is an x ∈ N such that φ_{e(i)}(x)↓ and φ_{e(i)}(x) ≠ f(x). Finally, the amalgamation technique transforms the remaining set of candidates into a single program to be output by M. Intuitively, this program works as follows. On any argument, it simultaneously runs all the remaining candidate programs on that argument and returns the first value computed, if any.

In order to make the amalgamation technique formally precise, let A be a computable function such that for any finite set I of φ-programs, A(I) is a φ-program of the following function:

φ_{A(I)}(x) = the value of the first convergent computation obtained by simultaneously running φ_i(x) for every i ∈ I.

Using this "amalgamation function" A, the IIM M can be defined formally by the following. Recall that Φ_{e(i)}(x) can be thought of as the running time of the φ-program e(i) on argument x.

M(f^n) = A({e(i) : i ∈ N and d(i) ≤ n, and for all x ≤ d(i), both Φ_{e(i)}(x) ≤ n and φ_{e(i)}(x) = f(x), and for all x ∈ N, if d(i) < x ≤ n and Φ_{e(i)}(x) ≤ n then φ_{e(i)}(x) = f(x)}).

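A minimal sketch of the amalgamation function A, under the same illustrative assumptions as the earlier sketches (`run_bounded` being the step-bounded simulation): the returned function dovetails all candidates with growing step budgets and returns the first value to converge.

def amalgamate(candidates, run_bounded):
    # Build a single program computing phi_{A(I)} for the finite set I of
    # candidate programs; amalgam(x) diverges if no candidate converges on x.
    def amalgam(x):
        budget = 1
        while True:
            for p in candidates:
                value = run_bounded(p, x, budget)
                if value is not None:  # first convergent computation wins
                    return value
            budget += 1                # dovetail with a larger step budget
    return amalgam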

We now show that S ⊆ BC(M). Let f ∈ S. Then, by condition (2), for sufficiently large n ∈ N, if e(i) is in the input set of A, then e(i) will not be obviously incorrect. Actually, all of the at most finitely many obviously incorrect candidate programs will eventually be cancelled by the pruning technique. Notice that the pruning technique is realized by the last condition in the definition of M, i.e., "for all x ∈ N, if d(i) < x ≤ n and Φ_{e(i)}(x) ≤ n then φ_{e(i)}(x) = f(x)." Hence, by the definition of A, for sufficiently large n ∈ N and all x ∈ N, φ_{M(f^n)}(x) = f(x) or φ_{M(f^n)}(x) is undefined. On the other hand, by condition (1), for sufficiently large n ∈ N, the input set of A will contain at least one correct program e(i), i.e., φ_{e(i)} = f. Hence, again by the definition of A, for sufficiently large n ∈ N and all x ∈ N, φ_{M(f^n)}(x) is defined. Consequently, for sufficiently large n ∈ N, φ_{M(f^n)} = f. Thus S ⊆ BC(M). □

As we have shown in the last part of the proof of Theorem 7, the enumeration technique adequate for BC searches the subspace {φ_{e(i)} : i ∈ N} of φ and collects all the programs e(i) coinciding with the mystery function f up to d(i). Notice that, in general, the set of all such programs may be infinite! Actually, it is not too hard to show that for S ∈ BC − EX this set is infinite for at least infinitely many functions f ∈ S. (Otherwise, the input set of the amalgamation function A would become stable for almost all functions f ∈ S and sufficiently large n ∈ N, implying S ∈ EX, a contradiction.) Of course, this possibly growing set of candidate hypotheses has to be transformed into a single hypothesis. This is done by the pruning procedure, which eventually cancels all of the at most finitely many candidates which make an "error of commission," followed by the amalgamation technique which, based on this pruned set of candidates, then creates the actual hypothesis by simply running the remaining candidates simultaneously. Notice that, in general, this hypothesis will not be from the subspace {φ_{e(i)} : i ∈ N}; this is another difference with the corresponding portion of the proof of Theorem 4. On the other hand, this hypothesis is of course from the space φ, since φ was chosen as an acceptable programming system, hence allowing transformations like that done by the amalgamation function A.

In this section, we generalized the basic learning type EX in two directions, to TEAM and BC. It is straightforward to combine these notions and define BC-TEAM. Having done so, it is possible to use the techniques of
the proof of Theorem 6 to prove a result about BC-TEAM that generalizes Theorem 7 in the same way as Theorem 6 generalizes Theorem 4.

6 Conclusions

In the previous sections we have seen that enumeration techniques are powerful enough for learning all the concept classes of several established learning types. In a sense, we have exhibited several kinds of enumeration techniques which turned out to be even "characteristic" for the corresponding learning types, in that these techniques can be used to learn exactly the concept classes of these learning types. Besides our results presented above, a number of other characterizations of Gold-style types of learning recursive functions can be found in the literature, [BB75, FKW84, FKW95, JB81, Wie78, Wie91, WZ95]. Though some of these types differ essentially from the types investigated above in several respects, nonetheless, a close inspection of the characterizations of all of these types allows the same interpretation as we have derived from our characterization results above.

We illustrate this point by giving one more example of a concrete learning type, namely finite, or one-shot, learning. An IIM M finitely identifies a recursive function f (written: f ∈ FIN(M)) iff there is an n ∈ N such that M(f^x) = ? for all x < n, M(f^n) ∈ N and φ_{M(f^n)} = f. The collection {S : S ⊆ FIN(M) for M an IIM} of sets of recursive functions is denoted by FIN. Intuitively, in finite learning, on input from the graph of the function to be learned, the learning device has only "one shot", i.e., it may output only one "real" hypothesis, and this one has to be correct. Up to this "one shot" it may say "?", i.e., "I don't know yet", for a while. Thus, in that sense, finite learning differs essentially from all the learning types studied above, where learning was only required to be successful in the limit, i.e., we did not require that we should be able to algorithmically determine when (if ever) the learning machine has successfully finished its learning task. That this is a restriction follows from [FW79] where FIN ⊂ EX was proved. Despite this essential difference between finite learning and the other learning types considered above, finite learning can also be characterized in a way that yields an enumeration technique adequate just for that type of learning. Actually, the following result was proved in [Wie78].


Theorem 8 A set S of recursive functions is in FIN iff there are recursive functions e and d such that:

1. S ⊆ {φ_{e(i)} : i ∈ N},
2. for all i, j ∈ N, if i ≠ j then there is an x ≤ d(i) such that φ_{e(i)}(x) ≠ φ_{e(j)}(x).
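Before the proof sketch below, here is a minimal illustration of the one-shot search that the sufficiency argument realizes, under the same hypothetical helpers as the earlier sketches (`e`, `d`, `run_bounded`):

def fin_technique(e, d, run_bounded, f, stages):
    # Output '?' (None) until some e(i) matches f on all x <= d(i); that
    # first hit is the one and only "shot", and it is never revised.
    for n in range(stages):
        for i in range(n + 1):
            if d(i) <= n and all(
                run_bounded(e(i), x, n) == f(x) for x in range(d(i) + 1)
            ):
                return i  # index of the single hypothesis e(i)
    return None  # still "?" after `stages` data points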

The sufficiency proof consists of realizing the following enumerative search. On any recursive function f, the corresponding IIM M searches for an i ∈ N such that φ_{e(i)}(x) = f(x) for all x ≤ d(i), and M outputs the first such e(i) found, if any. If f ∈ S, then by condition (1), there is a j such that φ_{e(j)} = f. If this j were different from the i found by M, then by condition (2), there would be an x ≤ d(i) such that φ_{e(i)}(x) ≠ φ_{e(j)}(x) = f(x), contradicting M's choice of i. Consequently, the first and only "shot" by M was successful.

Hence, also for the type FIN of finite learning an adequate enumeration technique exists. All of these results, i.e., those we have presented in our paper and those from the literature cited above, together with their uniform interpretation, have led us to state the following thesis:

    For each type of Gold-style learning, there is an adequate enumeration technique, i.e., an enumeration technique which can be used to learn exactly the concept classes of that type.

More informally, this thesis means that all that is learnable at all can be learned by enumerative search. In that sense, search can be considered as a universal approach to solving learning problems! This was not at all obvious to us in advance. In the "worst" case it might have been necessary to develop a specific method for each class to be learned. In a "bad" case still infinitely many "specific methods" would have to be found in order to solve all the problems of learning from examples. A large but finite number of methods, or at least two of them, might have been necessary. All of these cases might have to be considered as "normal" ones. But, "unnormally" and surprisingly, one method seems to be sufficient to solve each solvable
learning problem! Moreover, proving our thesis for the concrete learning types considered in this paper (as well as looking at the corresponding proofs for other learning types in the literature cited above) makes it clear that most of these proofs are rather subtle, sometimes sophisticated and, perhaps, nontrivial. Hence, again, it seems to us not evident a priori to expect that such a proof does exist for each learning type. Furthermore, we do not expect the existence of a "master proof" yielding all of these results uniformly. On the other hand, we also cannot exclude the existence of such a master proof. Actually, formalizing the notions of "Gold-style learning type" and "enumeration technique" would transform our thesis into a statement which could be formally proved or disproved. At least, this does not seem to be totally hopeless. At this time, we prefer motivating our thesis analogously to the way Church's thesis was motivated: by formally proving it for reasonable special cases, as we did above for several concrete learning types.

There is another fact supporting our thesis. Also in Gold-style learning of formal languages rather than recursive functions a considerable number of characterizations have been found, [Ang80, BCJ96, dJK96, JS94, LZ94, Wie77, ZL95, ZLK95], all of which technically motivate our thesis.

There is another argument to discuss concerning our thesis. Even if one recognizes the validity of the thesis, i.e., if one accepts enumerative search as a universal principle for solving learning problems, one might argue that enumerative search is far from being efficient. Well, universality does of course not mean that this principle is the only one to use. Clearly, there are classes of recursive functions which can be learned "directly" much more efficiently than enumeratively, for example, the somewhat artificial but technically useful class of the so-called self-describing functions, i.e., {f : f is a recursive function such that φ_{f(0)} = f}, already mentioned above. On the other hand, even so conceptually simple and basic a technique as Gold's identification-by-enumeration for learning recursively enumerable classes of recursive functions is optimal (!) with respect to such a crucial complexity measure as sample complexity. More formally, suppose M is an IIM identifying a recursive function f, i.e., f ∈ EX(M). Let sample(M,f) denote the least number m ∈ N such that φ_{M(f^n)} = f for all n ≥ m. Thus, sample(M,f) is just the minimal number of examples of the function f in natural order which M needs for learning f. Then an IIM M is said to be sample optimal for a set S of recursive functions iff S ⊆ EX(M) and there is no IIM M′ such that S ⊆ EX(M′), sample(M′,f) ≤ sample(M,f) for all
f ∈ S, and sample(M′,f) < sample(M,f) for some f ∈ S. Intuitively, M is sample optimal if its sample complexity cannot be improved uniformly by any other IIM, i.e., decreasing this complexity on one function from the class to be learned results in increasing this complexity on one or more other functions from that class. Then the following remarkable fact does hold, [Gol67]: Let S be any recursively enumerable class of recursive functions. Let M be an IIM EX-learning S by identification-by-enumeration. Then M is sample optimal for S. Notice that in [JB81] necessary and sufficient conditions for an IIM being sample optimal were derived.

In order to give another even more popular, but completely serious, argument for the possible efficiency of enumerative search, let us consider computer chess. Though over the years a lot of refinements have been found and incorporated to prune the necessary search, the basic principle of the most powerful computer chess programs still remains brute force search. Nonetheless, 99.99…% of all human chess players have no chance to beat the best chess programs. The number of 9's will increase. Even a world champion knows the feeling of having lost a match to the computer. Thus, even the possible efficiency of enumerative search may not be so bad as it could seem at first glance.

In any case, we have shown in this paper that our thesis does hold for several of the more common learning types. The characterizations we derive suggest the following possibly universal approach to the construction of learning algorithms, consisting of two essential ingredients, which in a sense must mutually cooperate. One ingredient is the creation of an appropriate search space (the e-lists from our theorems). The other ingredient is to find an effective test involving input data and queries to the search space (using the functions d from our theorems) to search for reasonable hypotheses. Sometimes these hypotheses are immediately output as actual hypotheses by the learning algorithm (as is the case for the basic identification-by-enumeration and also for the FIN-technique). Sometimes these hypotheses have to be processed in order to produce a new actual hypothesis (as, in very different ways, is the case for both the EX-technique and the BC-technique). In our results the existence of such a test is guaranteed whenever the learning problem has a solution. Perhaps the main contribution of this paper is the codification, validation, and possible universality of that perspective on the construction of learning algorithms.


7 Acknowledgments

During the final writing of this paper we learned of the sudden death of Mark Fulk. We feel it our duty to acknowledge the influence of his work on our work. Much of this work was done while the second author was on leave at the University of Kaiserslautern. The support of the Fulbright Commission is gratefully acknowledged. The authors would like to thank Dianna Gordon and Jim Owings for reading and commenting on an earlier draft of this paper.

References

[AFS96] K. Apsītis, R. Freivalds, and C. Smith. On duality in learning and the selection of learning teams. Information and Computation, 129(1):53–62, 1996.

[Ang80] D. Angluin. Inductive inference of formal languages from positive data. Information and Control, 45:117–135, 1980.

[AS83] D. Angluin and C. H. Smith. Inductive inference: Theory and methods. Computing Surveys, 15:237–269, 1983.

[Bar71] J. Barzdins. Complexity and frequency solution of some algorithmically unsolvable problems. Habilitation thesis, Novosibirsk State University, 1971. In Russian.

[Bar74] J. Barzdins. Two theorems on the limiting synthesis of functions. In J. Barzdins, editor, Theory of Algorithms and Programs, volume 1, pages 82–88. Latvian State University, Riga, 1974. In Russian.

[BB75] L. Blum and M. Blum. Toward a mathematical theory of inductive inference. Information and Control, 28:125–155, 1975.


[BCJ96] G. Baliga, J. Case, and S. Jain. Synthesizing enumeration techniques for language learning. In Proceedings of the 9th Annual Conference on Computational Learning Theory, pages 169–180. ACM Press, 1996.

[Blu67] M. Blum. A machine-independent theory of the complexity of recursive functions. Journal of the ACM, 14:322–336, 1967.

[CJNM94] J. Case, S. Jain, and S. Ngo Manguelle. Refinements of inductive inference by Popperian and reliable machines. Kybernetika, 30:23–52, 1994.

[CS83] J. Case and C. Smith. Comparison of identification criteria for machine inductive inference. Theoretical Computer Science, 25(2):193–220, 1983.

[dJK96] D. de Jongh and M. Kanazawa. Angluin's theorem for indexed families of r.e. sets and applications. In Proceedings of the 9th Annual Conference on Computational Learning Theory, pages 193–204. ACM Press, 1996.

[FKW84] R. Freivalds, E. Kinber, and R. Wiehagen. Connections between identifying functionals, standardizing operations, and computable numberings. Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 30:145–164, 1984.

[FKW95] R. Freivalds, E. Kinber, and R. Wiehagen. How inductive inference strategies discover their errors. Information and Computation, 118:208–226, 1995.

[Ful90] M. A. Fulk. Robust separations in inductive inference. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science, volume I, pages 405–410, St. Louis, Missouri, 22–24 October 1990. IEEE.

[FW79] R. Freivalds and R. Wiehagen. Inductive inference with additional information. Elektronische Informationsverarbeitung und Kybernetik, 15(4):179–184, 1979.


[Gol67] E. M. Gold. Language identification in the limit. Information and Control, 10:447–474, 1967.

[JB81] K. P. Jantke and H.-R. Beick. Combining postulates of naturalness in inductive inference. Journal of Information Processing and Cybernetics (EIK), 17:465–484, 1981.

[JS94] S. Jain and A. Sharma. Characterizing language identification by standardizing operations. Journal of Computer and System Sciences, 49:96–107, 1994.

[LZ94] S. Lange and T. Zeugmann. Characterization of language learning on informant under various monotonicity constraints. Journal of Experimental and Theoretical Artificial Intelligence, 6:73–94, 1994.

[MCM83] R. Michalski, J. Carbonell, and T. Mitchell. Machine Learning. Tioga Publishing Co., Palo Alto, CA, 1983.

[Mit82] T. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203–226, 1982.

[MY78] M. Machtey and P. Young. An Introduction to the General Theory of Algorithms. North-Holland, New York, 1978.

[Rog58] H. Rogers Jr. Gödel numberings of partial recursive functions. Journal of Symbolic Logic, 23:331–341, 1958.

[Rog67] H. Rogers Jr. Theory of Recursive Functions and Effective Computability. McGraw-Hill, New York, 1967.

[Smi82] C. Smith. The power of pluralism for automatic program synthesis. Journal of the ACM, 29(4):1144–1165, 1982.

[Smi94] C. Smith. A Recursive Introduction to the Theory of Computation. Springer, 1994.

[Wie77] R. Wiehagen. Identification of formal languages. In Proceedings of the 6th Symposium on Mathematical Foundations of Computer Science, volume 53 of Lecture Notes in Computer Science, pages 571–579. Springer, 1977.


[Wie78] R. Wiehagen. Characterization problems in the theory of inductive inference. In Proceedings of the International Colloquium on Automata, Languages and Programming, volume 62 of Lecture Notes in Computer Science, pages 494–508. Springer, 1978.

[Wie91] R. Wiehagen. A thesis in inductive inference. In Proceedings of the International Workshop on Nonmonotonic and Inductive Logic, volume 543 of Lecture Notes in Artificial Intelligence, pages 184–207. Springer, 1991.

[WZ94] R. Wiehagen and T. Zeugmann. Ignoring data may be the only way to learn efficiently. Journal of Experimental and Theoretical Artificial Intelligence, 6:131–144, 1994.

[WZ95] R. Wiehagen and T. Zeugmann. Learning and consistency. In Algorithmic Learning for Knowledge-Based Systems, volume 961 of Lecture Notes in Artificial Intelligence, pages 1–24. Springer, 1995.

[ZL95] T. Zeugmann and S. Lange. A guided tour across the boundaries of learning recursive languages. In Algorithmic Learning for Knowledge-Based Systems, volume 961 of Lecture Notes in Artificial Intelligence, pages 190–258. Springer, 1995.

[ZLK95] T. Zeugmann, S. Lange, and S. Kapur. Characterizations of monotonic and dual monotonic language learning. Information and Computation, 120:155–173, 1995.