Learning Categorial Grammar by Unification with Negative Constraints

JACEK MARCINIEC

Adam Mickiewicz University, Poznań, Poland
email: [email protected]

From: Journal of Applied Non-Classical Logics, Volume 4, no. 2/1994, pp. 181-200.

Abstract. This paper presents a modification of the algorithm for determining categorial grammars from linguistic data, presented in [5], in two directions: enabling negative input and admitting more types than just the sentence type. The notion of restricted optimal unification is introduced: optimal unification sensitive to negative constraints.

Key words: categorial grammar, unification, discovery procedure.

Introduction

The logical foundations of categorial grammar as well as its relations with various branches of knowledge have been thoroughly elaborated in van Benthem [2]. Unification techniques in categorial grammar have also been discussed in e.g. Klein and van Benthem [7]. Buszkowski and Penn in [5] introduced a unification discovery procedure (learning algorithm) for finding an optimal categorial grammar on the basis of a language sample. The algorithm presented there is an extension of the one elaborated by Buszkowski [3] and [4] for rigid categorial grammars. Both algorithms employ unification in the process of discovery, the earlier one standard unification (cf. [8]), whereas the latter one a new technique of optimal unification. Roughly speaking, an optimal unifier is a substitution which unifies as much as possible of a given set. Therefore a procedure relying on optimal unification always leads to an (optimal) solution. Optimal, however, does not necessarily mean adequate as far as linguistic acceptance is concerned. It follows from Kanazawa [6] that regardless of the linguistic criteria we choose, a suitable categorial grammar can be determined from positive data alone, provided the sample is sufficiently large. If, however, one's knowledge of what belongs to a language is limited to only a few examples, it may be reasonable to consider negative constraints in addition to positive information. In this paper, following the final suggestions formulated in [5], a further extension of the procedure under consideration is proposed. The task can be formulated as follows: starting with two mappings, treated as fragments of two type-assignments, defined only on some set of functor-argument structures, we seek a possibly simple categorial grammar which generates the same type-assignment when restricted to the domain of the first mapping and is disjoint from the second one. These two mappings constitute, respectively, positive and negative postulates concerning the grammar sought. The author acknowledges the supervision of this research by Wojciech Buszkowski.

Preliminaries

We recall some basic notions used in [5]. The set of all functor-argument structures on the set of atoms V, FS(V), is defined as the smallest set satisfying the following conditions: V ⊆ FS(V); if A1, …, An ∈ FS(V) then (A1, …, An)i ∈ FS(V), for all n ≥ 2, 1 ≤ i ≤ n. If A = (A1, …, An)i ∈ FS(V), then Ai is the functor, whereas each Aj, for j ≠ i, is an argument in the structure A. For any S ⊆ FS(V), SUB(S) denotes the smallest set such that: S ⊆ SUB(S); if (A1, …, An)i ∈ SUB(S) then Aj ∈ SUB(S), for all j = 1, …, n.

The set of primitive types Pr = Prc ∪ Var, Prc ∩ Var = ∅, where Prc is a set of constant primitive types and Var is a set of variables. The set Tp = FS(Pr) (Tpc = FS(Prc)) will be referred to as the set of (constant) types. In the examples below we will use the more traditional notation, writing a1 … ai−1\ai/ai+1 … an instead of (a1, …, an)i.

By a classical categorial grammar we mean a triple G = (VG, IG, sG), where: a finite set of atoms VG is called the lexicon of G; a mapping IG : VG → 2^Tp, such that IG(v) is finite for each v ∈ VG, is called the initial type-assignment of G; sG ∈ Prc is called the principal type of G. The terminal type-assignment TG : FS(VG) → 2^Tp is defined as follows:

TG(v) = IG(v), for v ∈ VG;
TG((A1, …, An)i) = {ai ∈ Tp : (∃aj ∈ TG(Aj), j ≠ i) (a1, …, an)i ∈ TG(Ai)}.

Throughout this paper we will use the expression grammar in the meaning of classical categorial grammar. A grammar G is said to be rigid iff card(IG(v)) = 1, for all v ∈ VG. For any grammar G and t ∈ Tp, we define the category of type t:

CATG(t) = {A ∈ FS(VG) : t ∈ TG(A)}.

There is no need to designate any primitive type from the point of view of our reasoning. However, in order to keep with tradition we distinguish S, the type of sentence. For the same reason we have left the notion of the principal type in the definition of a grammar. Each mapping σ : Var → Tp will be called a substitution. If σ is a substitution, then we expand it to a mapping from Tp to Tp:

(1) σ(p) = p, if p ∈ Prc;
(2) σ((a1, …, an)i) = (σ(a1), …, σ(an))i.

Since we only deal with finite substitutions, we will present them in the form ⌈x1 : a1, …, xn : an⌉. Let Σ denote the set of all substitutions (when Pr is fixed). By id we denote the identity substitution. Let T = {T1, …, Tn}, where each Ti ⊆ Tp, for i = 1, …, n, is finite. We say that a substitution σ is a unifier of T if for all i, a, b such that 1 ≤ i ≤ n, a, b ∈ Ti we have σ(a) = σ(b). A unifier σ of T is called a most general unifier (mgu) of T if for every unifier τ of T there is a substitution ρ such that τ = ρσ. Two substitutions σ and τ are said to be variants if there are substitutions α and β such that σ = ατ and τ = βσ. One can effectively check whether a given family T is unifiable and, if so, find an mgu of T (cf. [8, 5]). For a substitution σ we define an equivalence relation ker(σ) ⊆ Tp × Tp, the kernel of σ:

ker(σ)(a, b) ⟺ σ(a) = σ(b), for a, b ∈ Tp.

For any T ⊆ Tp we define:

T/ker(σ) = {T ∩ [a] : a ∈ T},

where [a] denotes the equivalence class of the type a with respect to the relation ker(σ). For T = {T1, …, Tn} we set

T/ker(σ) = T1/ker(σ) ∪ … ∪ Tn/ker(σ).

One can easily prove the following two facts:

(3) T/ker(σ) is unifiable by σ;
(4) if τ is an mgu of T/ker(σ), then T/ker(τ) = T/ker(σ).
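To make the notions above concrete, the following sketch (ours, not part of the original paper) fixes one possible encoding: a primitive type is a string, a compound type (a1, …, an)i is the pair (i, (a1, …, an)), and a substitution is a finite dict on variables. The helper names is_var, apply, unify and unify_family are assumptions of this sketch; unify_family computes an mgu of a family T = {T1, …, Tn} in the sense defined above, returning None when T is not unifiable.

```python
def is_var(t):
    # variables are the primitive types written x1, x2, ... (a convention of this sketch)
    return isinstance(t, str) and t.startswith("x")

def apply(sub, t):
    # extend a substitution (a dict on variables) to all types, as in (1)-(2)
    if isinstance(t, str):
        return sub.get(t, t)
    i, args = t
    return (i, tuple(apply(sub, a) for a in args))

def occurs(x, t):
    if isinstance(t, str):
        return x == t
    return any(occurs(x, a) for a in t[1])

def unify(a, b, sub):
    # standard unification of two types: returns an mgu of {a, b} composed
    # with `sub`, or None if the two types do not unify
    a, b = apply(sub, a), apply(sub, b)
    if a == b:
        return sub
    if is_var(a):
        if occurs(a, b):
            return None
        new = {v: apply({a: b}, t) for v, t in sub.items()}
        new[a] = b
        return new
    if is_var(b):
        return unify(b, a, sub)
    if isinstance(a, str) or isinstance(b, str):
        return None                      # distinct constants, or constant vs compound
    (i, xs), (j, ys) = a, b
    if i != j or len(xs) != len(ys):
        return None
    for x, y in zip(xs, ys):
        sub = unify(x, y, sub)
        if sub is None:
            return None
    return sub

def unify_family(family):
    # an mgu of a family T = {T1, ..., Tn}: unify all members of each Ti
    sub = {}
    for Ti in family:
        Ti = list(Ti)
        for t in Ti[1:]:
            sub = unify(Ti[0], t, sub)
            if sub is None:
                return None
    return sub
```

Substitutions are kept idempotent (every new binding is pushed into the old ones), so a single pass of apply suffices.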

A key notion in [5] is the following one. An optimal unifier of T is a substitution σ fulfilling the following conditions:

for all 1 ≤ i ≤ n and all a, b ∈ Ti, if σ(a) ≠ σ(b), then the set {σ(a), σ(b)} is not unifiable;
σ is an mgu of T/ker(σ).

An illustration of the approach is given below through simple examples. Consider the following formulas:

(5)
(John, (likes, Mary)1)2 → S,
((a, girl)1, (likes, John)1)1 → S.

The arrows are interpreted here as the assignment of the type on the right to the structure on the left. First we mark each occurrence of each atom with a different superscript:

(6)
(John1, (likes1, Mary1)1)2 → S,
((a1, girl1)1, (likes2, John2)1)1 → S.

Then we list all argument structures appearing in (6):

John1, Mary1, girl1, John2, (likes2, John2)1,

and choose different variables x1, …, x5 corresponding to the above structures. Now we add new 'arrows' to the set (6):

John1 → x1,
Mary1 → x2,
girl1 → x3,
John2 → x4,
(likes2, John2)1 → x5.

Finally, by applying the rule:

(7) if (A1, …, An)i → ai and Aj → aj, for j ≠ i, then Ai → (a1, …, an)i,

we can derive types for functor structures:

(likes1, Mary1)1 → x1\S,
likes1 → (x1\S)/x2,
(a1, girl1)1 → S/x5,
a1 → (S/x5)/x3,
likes2 → x5/x4.

Now we connect (unmarked) atoms with the types assigned to the copies of these atoms:

(8)
John → x1, x4,
Mary → x2,
likes → (x1\S)/x2, x5/x4,
girl → x3,
a → (S/x5)/x3.
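The marking-and-derivation steps (6)-(8) can be phrased as a single recursive pass. The sketch below is our own illustration, with hypothetical helper names (general_form, var_of); it assigns a fresh variable to every argument structure and applies rule (7) top-down. Atoms are assumed to arrive already marked, as in (6), and the numbering of variables may differ from (8) by a renaming.

```python
from itertools import count

def general_form(sample):
    # sample: list of (marked structure, type) pairs, e.g. the two formulas in (6);
    # structures are (i, (A1, ..., An)) tuples, atoms are strings such as "John1"
    fresh = ("x%d" % k for k in count(1))
    var_of = {}                      # argument structure -> its variable
    lexicon = {}                     # marked atom -> list of assigned types

    def var(struct):
        if struct not in var_of:
            var_of[struct] = next(fresh)
        return var_of[struct]

    def assign(struct, t):
        if isinstance(struct, str):              # an atom: record its type
            lexicon.setdefault(struct, []).append(t)
            return
        i, parts = struct                        # compound structure (A1, ..., An)_i
        # arguments receive variables, position i carries the type of the whole structure
        types = [t if j == i - 1 else var(p) for j, p in enumerate(parts)]
        for j, p in enumerate(parts):
            if j != i - 1:
                assign(p, types[j])
        assign(parts[i - 1], (i, tuple(types)))  # rule (7) for the functor

    for struct, t in sample:
        assign(struct, t)
    # collapse the marked copies to the original atoms, as in step (8); stripping
    # trailing digits is merely the crude marking convention of this sketch
    merged = {}
    for atom, types in lexicon.items():
        merged.setdefault(atom.rstrip("0123456789"), []).extend(types)
    return merged
```

Running it on the two marked formulas of (6) yields, after collapsing the marked copies, an alphabetic variant of the assignment (8).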

What we have obtained may be treated as the initial type assignment of some grammar. This grammar, however, referred to as the general form determined by (5), is not what we desire, since it involves too many primitive types. In order to simplify this grammar we use unification. We may treat (8) as a family T = {T1, …, T5}, where T1, T2, … denote the sets of types assigned to John, Mary, …, respectively. One can easily check that the following substitution is an mgu of T:

⌈x4 : x1, x2 : x1, x5 : x1\S⌉.

Applying the above substitution to (8) we obtain the final output of our algorithm:

John → x1,
Mary → x1,
likes → (x1\S)/x1,
girl → x3,
a → (S/(x1\S))/x3,

which is fully acceptable from the linguistic point of view, if we interpret the types x1 and x3 as those of a proper noun and a common noun, respectively. In the above example we formulated only positive postulates, yet we obtained quite a satisfactory solution: unique, rigid and linguistically sound. This, however, is not always the case. In general we may observe the lack of one or more of the three properties. Let us consider another example:

(9)
(John, (likes, Mary)1)2 → S,
((only, John)1, (likes, Mary)1)2 → S,
(John, ((only, likes)1, Mary)1)2 → S,
(John, (likes, (only, Mary)1)1)2 → S.

Applying the same algorithm as before we construct the general form determined by (9):

(10)
John → x1, x4, x6, x9,
Mary → x2, x5, x8, x11,
likes → (x1\S)/x2, (x3\S)/x5, x7, (x9\S)/x10,
only → x3/x4, ((x6\S)/x8)/x7, x10/x11.

The above family is not unifiable and generates several different optimal grammars, three of which we list below:

(11)
John → x1,
Mary → x1,
likes → (x1\S)/x1,
only → x1/x1, ((x1\S)/x1)/((x1\S)/x1);

(12)
John → x1,
Mary → x1,
likes → (x1\S)/x1, (((x1\S)/x1)\S)/x1, x1, (x1\S)/((x1\S)/x1),
only → ((x1\S)/x1)/x1;

(13)
John → (x6\S)/x8, (((x6\S)/x8)\S)/((x6\S)/x8), x6,
Mary → (x6\S)/x8, (((x6\S)/x8)\S)/((x6\S)/x8), x8,
likes → (((x6\S)/x8)\S)/((x6\S)/x8),
only → ((x6\S)/x8)/((((x6\S)/x8)\S)/((x6\S)/x8)).

Only (11) here has a natural linguistic interpretation (again, we interpret x1 as the type of a proper noun). Observe that both (12) and (13) assign the type S to such linguistically nonsensical structures as (John, ((only, John)1, Mary)1)2 and (John, ((only, Mary)1, Mary)1)2. In order to eliminate 'bad' outputs we enrich the initial sample (9) with two negative postulates:

(14)
(John, ((only, John)1, Mary)1)2 ↛ S,
(John, ((only, Mary)1, Mary)1)2 ↛ S.

Slashed arrows indicate types 'forbidden' for the structures on the left. It can easily be seen that (12) and (13) do not fulfil (14). As will be shown later, the role of negative postulates does not consist merely in verifying the outputs generated by the positive sample: they also participate in the construction of the solution from the very beginning. In particular, they may contribute to obtaining results which are impossible to obtain on the basis of positive information only.


1 Restricted optimal unification

Definition 1. A hereditary set of substitutions is a recursive set X ⊆ Σ satisfying the condition:

(15) (∀σ, τ ∈ Σ)(στ ∈ X → τ ∈ X).

Fact 1 For any hereditary set X:
1. id ∈ X;
2. if σ, τ ∈ Σ are variants, then σ ∈ X ⟺ τ ∈ X.

Definition 2. Let T = {T1, …, Tn}, Ti ⊆ Tp, for all 1 ≤ i ≤ n, and X ⊆ Σ. We say that the family T is unifiable in accordance with X if there exists a unifier σ of T such that σ ∈ X.

We now define one of our key notions: the restricted optimal unifier (r.o.u.).

Definition 3. Let T = {T1, …, Tn}, Ti ⊆ Tp, for all 1 ≤ i ≤ n, and X ⊆ Σ. A substitution σ is called a restricted optimal unifier of T from X (X-ou) if it fulfills the following conditions:

(16) for all 1 ≤ i ≤ n and all a, b ∈ Ti, if σ(a) ≠ σ(b), then there exists no substitution τ such that τ(σ(a)) = τ(σ(b)) and τσ ∈ X;
(17) σ is an mgu of T/ker(σ);
(18) σ ∈ X.

The results below establish some basic properties of r.o.u. Although they closely resemble their analogues in [5], we give full proofs for the sake of completeness.

Theorem 1 Let T = {T1, …, Tn} be a family unifiable in accordance with X. Then:
1. each X-ou of T is an mgu of T;
2. if X is hereditary, then each mgu of T is an X-ou of T.

Proof: Let σ be an X-ou of T. We show that σ is a unifier of T. Suppose not. Then there exist 1 ≤ i ≤ n and a, b ∈ Ti such that σ(a) ≠ σ(b). Let τ ∈ X be a unifier of T. Then τ(a) = τ(b). Since τ is a unifier of T/ker(σ) and, by (17), σ is an mgu of T/ker(σ), there exists some ρ such that τ = ρσ. Then of course ρ(σ(a)) = ρ(σ(b)) and ρσ ∈ X, which contradicts (16).

Assume now that X is hereditary and let σ be an mgu of T. Then σ obviously fulfills (16) and, because of (15), it also fulfills (18). Since T/ker(σ) = T, it also fulfills (17). □

Definition 4. Let T = {T1, …, Tn}, Ti ⊆ Tp, for i = 1, …, n, and let X be a hereditary set of substitutions. The following algorithm will be referred to as the X-ou-algorithm:

Step 0. Put σ0 = id.
Step k+1. If there exist i ∈ {1, …, n} and a, b ∈ Ti such that σk(a) ≠ σk(b), and there exists an mgu σ′k of {σk(a), σk(b)} such that σ′kσk ∈ X, then put σk+1 = σ′kσk; else terminate.

The theorem below states that in the case of hereditary sets one can effectively calculate all restricted optimal unifiers of a given family. The proof may look tedious, but it uses only standard techniques.

Theorem 2 For any input T, the outputs of the X-ou-algorithm are precisely the X-ou's of T (up to variants).

Proof: Let σ0, …, σk be a terminating run of the X-ou-algorithm. Let σ = σk. Suppose σ does not fulfill (16). Then we find i ∈ {1, …, n}, a, b ∈ Ti and τ ∈ Σ such that σ(a) ≠ σ(b) but τ(σ(a)) = τ(σ(b)) and τσ ∈ X. Since τ is a unifier of {σ(a), σ(b)}, we can find an mgu δ of the latter set. Then τ = ρδ for some ρ. Since τσ = ρδσ ∈ X, we get, by (15), δσ ∈ X, which contradicts the assumption that our algorithm terminated at step k.

Of course σ is a unifier of T/ker(σ). Let η be an arbitrary unifier of T/ker(σ). By induction on i we show that for every 0 ≤ i ≤ k there exists a substitution ηi such that η = ηiσi. For i = 0, put η0 = η. Let i = j + 1. By induction η = ηjσj. Step i is applicable, so we find 1 ≤ l ≤ n and a, b ∈ Tl such that σj(a) ≠ σj(b), and σi = σ′iσj where σ′i is an mgu of {σj(a), σj(b)} and σi ∈ X. Observe that ηj is a unifier of {σj(a), σj(b)}: since η is a unifier of T/ker(σ), and σ(a) = σ(b) (σ factors through σi = σ′iσj), we have η(a) = η(b), and consequently ηj(σj(a)) = ηj(σj(b)). There exists a substitution ηi such that ηj = ηiσ′i. Then clearly η = ηjσj = ηiσ′iσj = ηiσi, which proves our claim. For i = k we get η = ηkσ, which means that σ is an mgu of T/ker(σ). Of course σ ∈ X.

Now we show that each X-ou of T is a variant of some output of the X-ou-algorithm. Let σ be an X-ou of T. An X-ou-algorithm satisfying the condition σ(a) = σ(b) for the a and b chosen in step k+1 will be referred to as a σ-algorithm. Let τ0, …, τk be a terminating run of the σ-algorithm. By induction on i, we show that for every 0 ≤ i ≤ k there exists some ρi such that σ = ρiτi. For i = 0, put ρ0 = σ. Suppose i = j + 1 and σ = ρjτj. Choose 1 ≤ l ≤ n and a, b ∈ Tl such that τj(a) ≠ τj(b) and τi = τ′iτj, where τ′i is an mgu of {τj(a), τj(b)}, τi ∈ X, and σ(a) = σ(b). Then ρjτj(a) = ρjτj(b), which means that ρj is a unifier of {τj(a), τj(b)}. Then ρj = ρiτ′i for some ρi. Clearly, σ = ρjτj = ρiτ′iτj = ρiτi. Our claim holds for i = k. In particular

(19) σ = ρτk for some ρ.

We show that, for 1 ≤ l ≤ n and all a, b ∈ Tl, the following equivalence holds:

(20) σ(a) = σ(b) ⟺ τk(a) = τk(b).

Let a, b ∈ Tl and σ(a) = σ(b). Suppose τk(a) ≠ τk(b). Then, by (19), {τk(a), τk(b)} is unifiable by ρ and ρτk = σ ∈ X. Since X is hereditary, for any mgu δ of {τk(a), τk(b)} we have δτk ∈ X (ρ = ρ′δ for some ρ′, so σ = ρ′δτk ∈ X and (15) applies), and consequently the σ-algorithm admits step k+1, which is impossible. So the implication ⇒ of (20) holds. The converse implication follows from (19). From (20) we derive T/ker(σ) = T/ker(τk). Then τk = μσ for some μ, since σ is an mgu of T/ker(σ), τk being a unifier of it. By (19) σ is a variant of τk. It suffices to show now that τk is an output of the X-ou-algorithm. Suppose not. Then step k+1 of the X-ou-algorithm is still applicable following the run τ0, …, τk. We find some 1 ≤ l ≤ n and a, b ∈ Tl such that τk(a) ≠ τk(b) but there exists an mgu δ of the set {τk(a), τk(b)} with δτk ∈ X. By (20), also σ(a) ≠ σ(b). On the other hand, writing τk = μσ, we have δμ(σ(a)) = δμ(σ(b)) and δμσ = δτk ∈ X, which contradicts (16). □
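The X-ou-algorithm of Definition 4 is easy to phrase over the representation sketched in the Preliminaries. The code below is our own reading of it, not the paper's implementation: in_X is assumed to be a decision procedure for the hereditary set X, compose is ordinary composition of substitutions, and, since Step k+1 is nondeterministic, the function returns just one admissible output rather than the whole family described by Theorem 2.

```python
def compose(outer, inner):
    # the substitution "outer after inner": apply `inner` first, then `outer`
    comp = {v: apply(outer, t) for v, t in inner.items()}
    for v, t in outer.items():
        comp.setdefault(v, t)
    return comp

def x_ou_algorithm(family, in_X):
    sigma = {}                                   # step 0: the identity substitution
    progress = True
    while progress:
        progress = False
        for Ti in family:
            for a in Ti:
                for b in Ti:
                    sa, sb = apply(sigma, a), apply(sigma, b)
                    if sa == sb:
                        continue
                    step = unify(sa, sb, {})     # an mgu of {sigma(a), sigma(b)}
                    if step is None:
                        continue
                    candidate = compose(step, sigma)
                    if in_X(candidate):          # admit the step only inside X
                        sigma = candidate
                        progress = True
    return sigma
```

With in_X accepting every substitution the procedure degenerates to the unrestricted optimal unification of [5].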

2 Language samples

Definition 5. By a (multiple type) language family on V we mean an indexed family L = {Lt}t∈Tpc such that Lt ⊆ FS(V) is finite, for all t ∈ Tpc, and Lt = ∅ for all but finitely many t ∈ Tpc.

For any language family L = {Lt}t∈Tpc we denote

SUB(L) = SUB(∪t∈Tpc Lt).

Definition 6. By a language sample on V we mean an ordered pair LS = (L+, L−), where the positive and negative samples L+ and L− are language families.

Roughly speaking, a language sample expresses our postulates concerning some categorial grammar. We assume that each L+t consists of structures we want to have the type t, while each L−t contains structures we do not want to have the type t.

Definition 7. Let L = {Lt}t∈T, Lt ⊆ FS(V) for all t ∈ T ⊆ Tp. A grammar G is p-compatible (n-compatible) with L iff VG = V, sG = S and Lt ⊆ CATG(t) (respectively, Lt ∩ CATG(t) = ∅) for all t ∈ T. We say that a grammar G is compatible with LS = (L+, L−) if it is p-compatible with L+ and n-compatible with L−. In general, there may be no grammar at all compatible with a given language sample.

Definition 8. We say that a language sample LS is consistent if there exists at least one grammar compatible with LS. Later on we will formulate an effective way to verify whether a language sample is consistent or not.

2.1 Positive information

The aim of this section is the generalization of the algorithm introduced in [5] to the multiple type case. L stands here for a positive sample. We start with two auxiliary notions. The following definition is the formalization of the rule (7):

Definition 9. Let L = {Lt}t∈Tp. We define L^k = {L^k_t}t∈Tp (k ≥ 0):

L^0_t = Lt;
L^{k+1}_t = L^k_t ∪ {Ai ∈ FS(V) : (∃Aj (j ≠ i), t1, …, tn ∈ Tp) (t = (t1, …, tn)i, (A1, …, An)i ∈ L^k_{ti}, Aj ∈ L^k_{tj} for j ≠ i)}.

If there is l such that L^l = L^{l+1}, then the family cl(L) = L^k, where k = min{l : L^l = L^{l+1}}, will be called the closure of L.
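A direct, if naive, reading of Definition 9 is a fixpoint iteration. The sketch below is ours: a language family is assumed to be a dict from types to sets of structures, with the tuple encodings used in the earlier sketches. For the families built in Definition 10 the iteration is guaranteed to stop, so we simply loop until no new entry appears.

```python
from itertools import product

def closure(family):
    fam = {t: set(structs) for t, structs in family.items()}
    changed = True
    while changed:
        changed = False
        new = []
        for ti, structs in fam.items():
            for s in structs:
                if isinstance(s, str):
                    continue                      # atoms contain no functor to type
                i, parts = s
                # for every argument choose a type it already carries in fam;
                # position i of the derived type carries ti itself
                options = [[tj for tj, Lj in fam.items() if p in Lj]
                           if j != i - 1 else [ti]
                           for j, p in enumerate(parts)]
                for choice in product(*options):
                    t = (i, tuple(choice))        # the derived functor type
                    if parts[i - 1] not in fam.get(t, set()):
                        new.append((t, parts[i - 1]))
        for t, functor in new:
            fam.setdefault(t, set()).add(functor)
            changed = True
    return fam
```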

Definition 10. Let L = {Lt}t∈Tpc and V = {v1, …, vn}. We define a family L̄ = {L̄t}t∈Tp as follows. For each t ∈ Tpc, we obtain L̄t from Lt by marking all occurrences of atoms from V with different symbols (say superscripts). Thus, for each i, we get V̄i = {vi^1, …, vi^{ki}}, the set of all marked copies of the atom vi. Denote V̄ = V̄1 ∪ … ∪ V̄n. We choose different variables x1, …, xm corresponding to the list of all argument structures Ā1, …, Ām from ∪t∈Tpc (SUB(L̄t) − L̄t), and set L̄xi = {Āi}. It is easy to check that the closure of L̄ exists. We denote cl(L̄) = {L̄t}t∈Tp.

Observe that SUB(L̄) = ∪t∈Tp L̄t = SUB(cl(L̄)). The elements of FS(V̄) will be denoted with 'overlined' letters Ā, B̄, …. For any Ā ∈ FS(V̄), let Ā* denote the functor-argument structure on V obtained from Ā by replacing copies of atoms with the originals.

Definition 11. Let L = {Lt}t∈Tpc. The grammar GF(L), the general form determined by L, is defined as follows:

VGF(L) = V,  sGF(L) = S,  IGF(L)(vi) = {t ∈ Tp : V̄i ∩ L̄t ≠ ∅}.

According to the above definition, we can effectively construct the general form on the basis of the positive information contained in L. This grammar, however, involves at least as many primitive types as the number of different occurrences of argument structures in SUB(L). Thus, although GF(L) is p-compatible with L (see below), we regard it as an intermediate stage for further optimization, rather than the final solution.


Definition 12. For any grammar G and a substitution σ we define the grammar σ[G]:

Vσ[G] = VG,  sσ[G] = sG,  Iσ[G](v) = σ(IG(v)).

Fact 2 If G = σ[H], then σ(TH(A)) ⊆ TG(A), for all A ∈ FS(VG).

Proof: We prove the fact by structural induction. For A ∈ VG our thesis holds, since TG(A) = IG(A) = σ(IH(A)) = σ(TH(A)). Suppose A = (A1, …, An)i. Let ti ∈ σ(TH(A)). Then there exists t′i ∈ TH(A) such that ti = σ(t′i), and there exist t′j ∈ TH(Aj), j ≠ i, such that (t′1, …, t′n)i ∈ TH(Ai). By induction we have σ(t′j) ∈ TG(Aj) for j ≠ i and σ((t′1, …, t′n)i) = (σ(t′1), …, σ(t′n))i ∈ TG(Ai), which means that σ(t′i) = ti ∈ TG(A). □

Corollary 1 CATG(t) ⊆ CATσ[G](t), for each t ∈ Tpc.

Corollary 2 Let L = {Lt}t∈Tpc and σ ∈ Σ. If a grammar G is p-compatible with L, then σ[G] is also p-compatible with L.

Lemma 1 Let L = {Lt}t∈Tpc. Then Lt ⊆ CATGF(L)(t) for each t ∈ Tpc.

Proof: By structural induction we prove:

(21) Ā ∈ L̄t ⟹ Ā* ∈ CATGF(L)(t), for all t ∈ Tp and Ā ∈ SUB(L̄).

If Ā ∈ V̄, then (21) holds by the definition of IGF(L). Suppose that Ā = (Ā1, …, Ān)i and Ā ∈ L̄t for some t. By the construction of L̄, we may assume that L̄xj = {Āj} for j ≠ i. By the definition of the closure we obtain Āj ∈ L̄xj, for j ≠ i, and Āi ∈ L̄_{(x1,…,xi−1,t,xi+1,…,xn)i}. By induction we have Āj* ∈ CATGF(L)(xj), for j ≠ i, and Āi* ∈ CATGF(L)((x1, …, xi−1, t, xi+1, …, xn)i). Then xj ∈ TGF(L)(Āj*) for j ≠ i and (x1, …, xi−1, t, xi+1, …, xn)i ∈ TGF(L)(Āi*). By the definition of the terminal type-assignment we obtain t ∈ TGF(L)((Ā1*, …, Ān*)i) = TGF(L)(Ā*), which finishes the proof of (21) and the whole proof, since A ∈ Lt implies (∃B̄ ∈ L̄t)(B̄* = A). □

Corollary 3 For any L = {Lt}t∈Tpc and σ ∈ Σ, σ[GF(L)] is p-compatible with L.

Proof: Immediate consequence of Lemma 1 and Corollary 2. □

Lemma 2 Let L = {Lt}t∈Prc. Then CATGF(L)(t) ⊆ Lt, for all t ∈ Prc.

Proof: By structural induction we prove:

(22) A ∈ CATGF(L)(t) ⟹ (∃B̄ ∈ L̄t)(B̄* = A), for all t ∈ Tp and A ∈ FS(V).

For atoms (22) holds by the definition of IGF(L). Suppose A = (A1, …, An)i and ai ∈ TGF(L)(A). Then (∃aj ∈ TGF(L)(Aj), j ≠ i)((a1, …, an)i ∈ TGF(L)(Ai)). By induction there are B̄j ∈ L̄aj, for j ≠ i, and B̄i ∈ L̄_{(a1,…,an)i} such that B̄1* = A1, …, B̄n* = An. Denote B̄ = (B̄1, …, B̄n)i. Clearly, B̄* = A. We will show that B̄ ∈ L̄ai. Since L includes sets indexed by primitive types only, the contents of L̄_{(a1,…,an)i} must have been generated during the construction of the closure of L̄. Thus there exist B̄′j ∈ L̄aj, j ≠ i, such that (B̄′1, …, B̄′i−1, B̄i, B̄′i+1, …, B̄′n)i ∈ L̄ai. It follows from the construction of L̄ that aj ∈ Var and B̄j = B̄′j for j ≠ i, which finishes the proof of (22). To finish the proof of this lemma it suffices to observe that L̄t = Lt for all t ∈ Tpc. □

Definition 13. We write G ⊆ H iff VG = VH, sG = sH, and IG(v) ⊆ IH(v), for all v ∈ VG.

Fact 3 If G ⊆ H, then TG(A) ⊆ TH(A), for all A ∈ FS(VG).

Corollary 4 If G ⊆ H, then CATG(t) ⊆ CATH(t), for all t ∈ Tp.


Corollary 5 Let L = {Lt}t∈Tpc and G ⊆ H. Then:
1. if G is p-compatible with L, then H is also p-compatible with L;
2. if H is n-compatible with L, then G is also n-compatible with L.

The following lemma plays an essential part in further considerations.

Lemma 3 Let L = {Lt}t∈Tpc. For any grammar G, the following conditions are equivalent:
(i) G is p-compatible with L;
(ii) there exists τ ∈ Σ such that τ[GF(L)] ⊆ G.

Proof: Suppose G is p-compatible with L. By depth induction we define a mapping h : SUB(L̄) → Tp. We put h(Ā) = t if Ā ∈ L̄t, for t ∈ Tpc (observe that t ∈ TG(Ā*), since G is p-compatible with L). When Ā = (Ā1, …, Ān)i and h(Ā) is already defined, then, by induction, h(Ā) ∈ TG(Ā*). Suppose h(Ā) = ai. There exist aj ∈ TG(Āj*), j ≠ i, such that (a1, …, an)i ∈ TG(Āi*). We set h(Āj) = aj for j ≠ i, and h(Āi) = (a1, …, an)i. It follows from the above definition that:

(23) h(Ā) ∈ TG(Ā*) for all Ā ∈ ∪t∈Tp L̄t;

(24) if h((Ā1, …, Ān)i) = ai and h(Āj) = aj for all j ≠ i, then h(Āi) = (a1, …, an)i.

A substitution τ is defined as follows: τ(xi) = h(Āi), where L̄xi = {Āi}. By structural induction we prove:

(25) if Ā ∈ L̄t, then h(Ā) = τ(t), for all t ∈ Tp and Ā ∈ SUB(L̄).

If t ∈ Pr or Lt ≠ ∅, then (25) follows from the definition of h. Suppose now that t = (t1, …, tn)i and Āi ∈ L̄_{(t1,…,tn)i}. By Definition 9 there exist Āj ∈ L̄tj, j ≠ i, such that (Ā1, …, Ān)i ∈ L̄ti. By induction we get h((Ā1, …, Ān)i) = τ(ti) and h(Āj) = τ(tj) for all j ≠ i. By (24) we have h(Āi) = (τ(t1), …, τ(tn))i = τ(t), which finishes the proof of (25). Finally we have

Iτ[GF(L)](vi) = τ(IGF(L)(vi))         (by Def. 12)
             = {τ(t) : V̄i ∩ L̄t ≠ ∅}  (by Def. 11)
             = h(V̄i)                  (by (25))
             ⊆ TG(vi) = IG(vi),        (by (23))

which proves (i)⟹(ii). (ii)⟹(i) follows from Corollaries 3 and 5. □


2.2 Negative information

In this section we scrutinize the impact of negative postulates. Roughly speaking, their role consists in defining a hereditary set of substitutions, used later in the process of restricted unification. Each language family L defines a class of grammars n-compatible with L. From Corollaries 1 and 5 one easily derives:

Fact 4 Let N denote the class of all grammars n-compatible with some language family L. Then:
(26) if σ[G] ∈ N then G ∈ N, for all G and σ ∈ Σ;
(27) if G ⊆ H and H ∈ N then G ∈ N, for all G, H.

In order to indicate the general character of the following results, below N stands for an arbitrary class of grammars. For technical reasons we also introduce the analogue of Definitions 6 and 7:

Definition 14. By a language postulate on V we mean an ordered pair (L, N), where L denotes a language family and N is a class of grammars fulfilling conditions (26) and (27). A grammar G is compatible with (L, N) if G is p-compatible with L and G ∈ N. (L, N) is consistent if there is at least one grammar compatible with (L, N).

Definition 15. Let G be a grammar and N a class of grammars. We define the set ∆G,N:

(28) ∆G,N = {σ ∈ Σ : σ[G] ∈ N}.

One can easily prove the following:

Fact 5 For any grammar G, substitution σ, and a class of grammars N fulfilling (26), the following hold:
(29) σ[G] ∈ N iff σ ∈ ∆G,N;
(30) G ∈ N iff ∆G,N ≠ ∅;
(31) τ ∈ ∆σ[G],N iff τσ ∈ ∆G,N;
(32) ∆G,N is hereditary.
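For the concrete class N of grammars n-compatible with a negative sample, membership in the set (28) is decidable by computing the terminal type-assignment. The following sketch is our code, with hypothetical names (terminal_types, substitute_grammar, in_delta); it spells the test out, and the resulting predicate can serve as the in_X parameter of the X-ou-algorithm sketch above.

```python
def terminal_types(grammar, struct):
    # the terminal type-assignment T_G of the Preliminaries; a grammar is a
    # dict mapping each atom to a list of types
    if isinstance(struct, str):
        return set(grammar.get(struct, []))
    i, parts = struct
    arg_types = [terminal_types(grammar, p) for p in parts]
    result = set()
    for t in arg_types[i - 1]:                    # candidate functor types
        if isinstance(t, str):
            continue
        j, comps = t
        if j == i and len(comps) == len(parts) and all(
                comps[k] in arg_types[k] for k in range(len(parts)) if k != i - 1):
            result.add(comps[i - 1])
    return result

def substitute_grammar(sub, grammar):
    # the grammar sigma[G] of Definition 12
    return {atom: [apply(sub, t) for t in types] for atom, types in grammar.items()}

def in_delta(sub, grammar, negative_sample):
    # sigma belongs to the set (28) iff sigma[G] assigns no forbidden type
    g = substitute_grammar(sub, grammar)
    return all(t not in terminal_types(g, struct)
               for t, structs in negative_sample.items() for struct in structs)
```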


Theorem 3 Let (L, N) be a language postulate. The following conditions are equivalent:
(i) (L, N) is consistent;
(ii) GF(L) ∈ N.

Proof: Assume (i). Let G be a grammar compatible with (L, N). By Lemma 3, since G is p-compatible with L, there is τ ∈ Σ such that τ[GF(L)] ⊆ G. By (27) we have τ[GF(L)] ∈ N, and by Fact 5 also GF(L) ∈ N. (ii)⟹(i) is a consequence of Corollary 3. □

To simplify the notation, ∆L,N will denote the set ∆GF(L),N.

Corollary 6 Let (L, N) be a consistent language postulate and σ ∈ ∆L,N. Then σ[GF(L)] is compatible with (L, N).

The above corollary justifies the following definition:

Definition 16. Let (L, N) be a consistent language postulate. Denote TL = {IGF(L)(vi)}i=1,…,n. Each σ[GF(L)] such that σ is a ∆L,N-ou of TL will be called an optimal grammar determined by (L, N). The family of all such grammars will be denoted by G(L, N).

Corollary 7 Each optimal grammar determined by a consistent language postulate (L, N) is compatible with (L, N).
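Definition 16 suggests an overall pipeline. The fragment below is our assembly of the earlier sketches, under the same assumptions and helper names; it presupposes that the sample is consistent and, being a single nondeterministic run, produces one grammar of the form σ[GF(L+)] for a ∆LS-ou σ, rather than the whole family G(L, N).

```python
def optimal_grammar(positive_sample, negative_sample):
    # positive_sample: list of (marked structure, type) pairs, as in (6);
    # negative_sample: dict from a type to the structures forbidden for it
    gf = general_form(positive_sample)              # GF(L+), as in (10)
    family = [set(types) for types in gf.values()]  # the family T_L of Definition 16
    in_X = lambda sub: in_delta(sub, gf, negative_sample)
    sigma = x_ou_algorithm(family, in_X)            # one restricted optimal unifier
    return substitute_grammar(sigma, gf)            # sigma[GF(L+)]
```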

3 Minimal grammars

As we have already proved in the previous section, an optimal grammar determined by a language postulate (L, N) is compatible with (L, N). However, the converse of the above statement is not true (GF is here an example), unless we take sufficiently small grammars. As the measure we choose, after [5], the number of types assigned by the initial type assignment. First we recapitulate some notions from [5]:

Definition 17. For grammars G and H we will write:
G ≤ H if VG = VH and (∀v ∈ VG)(card(IG(v)) ≤ card(IH(v)));
G ≈ H if G ≤ H and H ≤ G;
G < H if G ≤ H but not G ≈ H.


Fact 6 σ[G] ≤ G for any grammar G and σ ∈ Σ.

Lemma 4 If G ∈ G(L, N) and σ ∈ ∆G,N, then G ≤ σ[G].

Proof: Let G = τ[GF(L)] for some ∆L,N-ou τ. According to Fact 6 it suffices to show G ≤ σ[G]. Suppose not. Then we find i such that card(Iσ[G](vi)) < card(IG(vi)). By Definition 12 there exist a, b ∈ IG(vi) such that a ≠ b but σ(a) = σ(b), which contradicts (16), since σ ∈ ∆G,N is equivalent to στ ∈ ∆L,N. □

Definition 18. We say that a grammar G is minimal for a language postulate (L, N) if G is compatible with (L, N) but there is no H (H < G) compatible with (L, N). Moreover, we denote:

Gmin(L, N) = {G ∈ G(L, N) : (∀H < G)(H ∉ G(L, N))}.

Fact 7 If a grammar G is minimal for a language postulate (L, N) and σ ∈ ∆G,N, then σ[G] ≈ G.

Lemma 5 If a grammar G is minimal for (L, N), then there exists a substitution σ ∈ ∆L,N such that G = σ[GF(L)].

Proof: By Lemma 3 we find σ such that σ[GF(L)] ⊆ G. Clearly σ[GF(L)] is compatible with (L, N) (compare the proof of Theorem 3), so it must equal G, since G is minimal. The required property of the substitution σ follows from Fact 5. □

Theorem 4 For any consistent language postulate (L, N) and a grammar G the following conditions are equivalent:
(i) G is minimal for (L, N);
(ii) there is a grammar H ∈ Gmin(L, N) and a substitution σ ∈ ∆H,N such that G = σ[H].


Proof: (i)⟹(ii). Assume (i). By Lemma 5, G = σ[GF(L)] for some σ ∈ ∆L,N. Let τ be an mgu of TL/ker(σ). Then there exists ρ such that σ = ρτ. We set H = τ[GF(L)]. We will show that τ is a ∆L,N-ou of TL. It follows from Fact 5 and (4) that τ fulfills (18) and (17), so it suffices to show (16). Suppose τ does not fulfill (16). Then, by (3), there exist i and a, b ∈ IGF(L)(vi) such that τ(a) ≠ τ(b), but there is δ such that δ(τ(a)) = δ(τ(b)) and δτ ∈ ∆L,N. From (4) we derive H ≈ G. Then of course δ[H] is compatible with (L, N) and δ[H] < H ≈ G, which contradicts the minimality of G. Suppose H* < H for some H* ∈ G(L, N). Then also H* < G, which contradicts (i). Thus H ∈ Gmin(L, N); moreover G = ρ[H] and, by Fact 5, ρ ∈ ∆H,N, which gives (ii).

(ii)⟹(i). Assume (ii). According to Lemma 4 it suffices to show that H is minimal for (L, N). If not, then there is a grammar G′ compatible with (L, N) such that G′ < H. We may assume that G′ is minimal for (L, N). By the first part of the proof there exist H* ∈ Gmin(L, N) and a substitution σ* ∈ ∆H*,N such that G′ = σ*[H*]. By Lemma 4 we have H* ≤ G′ and consequently H* < H, which is impossible since H ∈ Gmin(L, N). □

According to Fact 4 all the above results remain valid if we substitute:
'(L+, L−)' for '(L, N)',
'language sample' for 'language postulate',
'is n-compatible with L−' for '∈ N',
'∆LS' for '∆L,N'.

From Lemma 2, Fact 4 and Theorem 3 one easily derives:

Corollary 8 Let LS = (L+, L−) be a language sample satisfying the condition: L+t = L−t = ∅ for all t ∉ Prc. Then LS is consistent if and only if L+t ∩ L−t = ∅ for all t ∈ Prc.
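For samples whose types are all primitive constants, Corollary 8 thus reduces the consistency test to a disjointness check. A one-line sketch makes the point (our code; both samples are taken here as dicts from a type to the set of structures postulated for or against it).

```python
def consistent(positive, negative):
    # consistency in the sense of Corollary 8: for primitive-type samples the
    # positive and negative sets for each type may simply not overlap
    return all(not (set(positive.get(t, ())) & set(structs))
               for t, structs in negative.items())
```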

4 Illustration

We start this section with an illustration of the notions introduced so far. Let us reconsider the example represented by (9) and (14). Our positive sample is L+ = {L+S}, where:

L+S = { (John, (likes, Mary)1)2,
        ((only, John)1, (likes, Mary)1)2,
        (John, ((only, likes)1, Mary)1)2,
        (John, (likes, (only, Mary)1)1)2 }.

Similarly, L− = {L−S}, where:

L−S = { (John, ((only, John)1, Mary)1)2,
        (John, ((only, Mary)1, Mary)1)2 }.

According to Corollary 8, since L+S ∩ L−S = ∅, the language sample LS = (L+, L−) is consistent. Now we construct GF(L+). First we mark all occurrences of atoms with superscripts:

L̄+S = { (John1, (likes1, Mary1)1)2,
         ((only1, John2)1, (likes2, Mary2)1)2,
         (John3, ((only2, likes3)1, Mary3)1)2,
         (John4, (likes4, (only3, Mary4)1)1)2 }.

Now we add new sets:

L̄+_{x1} = {John1},
L̄+_{x2} = {Mary1},
L̄+_{x3} = {(only1, John2)1},
L̄+_{x4} = {John2},
L̄+_{x5} = {Mary2},
L̄+_{x6} = {John3},
L̄+_{x7} = {likes3},
L̄+_{x8} = {Mary3},
L̄+_{x9} = {John4},
L̄+_{x10} = {(only3, Mary4)1},
L̄+_{x11} = {Mary4}.

After calculation of cl(L̄+) we have:

L̄+_{x1\S} = {(likes1, Mary1)1},
L̄+_{(x1\S)/x2} = {likes1},
L̄+_{x3\S} = {(likes2, Mary2)1},
L̄+_{x3/x4} = {only1},
L̄+_{(x3\S)/x5} = {likes2},
L̄+_{x6\S} = {((only2, likes3)1, Mary3)1},
L̄+_{(x6\S)/x8} = {(only2, likes3)1},
L̄+_{((x6\S)/x8)/x7} = {only2},
L̄+_{x9\S} = {(likes4, (only3, Mary4)1)1},
L̄+_{(x9\S)/x10} = {likes4},
L̄+_{x10/x11} = {only3}.

It is easy to see that the initial type assignment of GF(L+) is precisely that represented by (10). Now we focus on the role of the negative sample. Let A and B denote the first and the second structure from L−S, respectively. Denote ω = ⌈x4 : x7⌉. We will show that ω ∉ ∆LS. The initial type assignment of ω[GF(L+)] has the following form:

(33)
John → x1, x7, x6, x9,
Mary → x2, x5, x8, x11,
likes → (x1\S)/x2, (x3\S)/x5, x7, (x9\S)/x10,
only → x3/x7, ((x6\S)/x8)/x7, x10/x11.

It is clear that Tω[GF(L+)](A) = {S} and A ∈ L−S, which means that ω[GF(L+)] is not n-compatible with L−; hence ω ∉ ∆LS. Now let τ = ⌈x3 : (x6\S)/x8⌉. Then of course τω is an mgu of the set {x3/x4, ((x6\S)/x8)/x7}. Since ∆LS is hereditary, we have τω ∉ ∆LS (by (15), τω ∈ ∆LS would imply ω ∈ ∆LS). In a similar way one can check that no unifier of {x10/x11, ((x6\S)/x8)/x7} is a member of ∆LS. According to what we have just shown, the restricted unification algorithm does not admit any substitution unifying IGF(L+)(only). It is easy to see that (11) is the only output of this algorithm.

The above example concerns the situation in which the negative sample contributes to the reduction of the number of outputs. In this case the set of outputs generated by the full sample is a subset of the set of outputs generated by the positive sample only. One might argue that instead of

formulating negative postulates it is enough to verify the outputs based on the positive ones. As shown below, this is not always the case. The next example shows that the negative sample can sometimes cause an increase in the number of outputs. Consider the following positive sample:

(34)
(Mary, (likes, John)1)2 → S,
(John, (knows, Mary)1)2 → S,
(John, (likes, Susan)1)2 → S,
(Mary, (knows, (John, (likes, Susan)1)2)1)2 → S.

We generate the general form first:

(35)
Mary → x1, x4, x7,
John → x2, x3, x5, x9,
Susan → x6, x10,
likes → (x1\S)/x2, (x5\S)/x6, (x9\x8)/x10,
knows → (x3\S)/x4, (x7\S)/x8.

Since the above family is unifiable, we get only one output:

(36)
Mary → S,
John → S,
Susan → S,
likes → (S\S)/S,
knows → (S\S)/S,

which is nonsense from the linguistic point of view. Let us now enrich our sample with one negative postulate:

(37) Mary ↛ S.

This leads to the following outputs:

(38)
Mary → x1,
John → x1,
Susan → x1,
likes → (x1\S)/x1, (x1\x1)/x1,
knows → (x1\S)/x1;


(39)
Mary → x1,
John → x1,
Susan → x1,
likes → (x1\S)/x1,
knows → (x1\S)/x1, (x1\S)/S.

Again, interpreting x1 as the type of a proper noun (PN), we can accept the output (39). However, (38) under the same interpretation assigns the type PN to the structure (John, (likes, Susan)1)2, which is not something we can agree with. Unfortunately, there is no way of eliminating (38) by formulating new negative postulates. Consider the grammar:

(40)
Mary → x1, x4, x3,
John → x2, x3, x5, x9,
Susan → x6, x10,
likes → (x1\S)/x2, (x5\S)/x6, (x9\x4)/x10,
knows → (x3\S)/x4.

Let G1 and G2 denote (35) and (40), respectively. Then G2 = ψ[G1], where ψ = ⌈x7 : x3, x8 : x4⌉. It is easy to see that for all A ∈ FS(V) we have TG1(A) ∩ Tpc = TG2(A) ∩ Tpc. Thus G2 does not generate any 'wrong' sentence we could add to a negative sample.

Buszkowski and Penn in [5] briefly sketched a unification algorithm sensitive to negative information. They considered constraints of the form x ≠ y, where x, y ∈ Tp, allowing only those substitutions σ for which σ(x) ≠ σ(y). For instance, a constraint x4 ≠ x8 applied to our last example would successfully eliminate the output (38). However, such a constraint does not have any linguistic meaning of its own and can only be formulated after the general form is already known. Our aim was to work out a system in which all postulates carry linguistic information and are formulated prior to making all the calculations. We started with very general notions of a language sample and compatibility on account of their connection with language learning problems (cf. [1]). Below we sketch one possible variation of the system under consideration:

Definition 19. Let L = {Lt}t∈T, Lt ⊆ FS(V) for all t ∈ T ⊆ Tpc. A grammar G is strongly n-compatible with L iff VG = V, sG = S and CATG(TG(Lt)) ∩ CATG(t) = ∅ for all t ∈ T. We say that a grammar G is strongly compatible with LS = (L+, L−) if it is p-compatible with L+ and strongly n-compatible with L−.

According to the above definition, a postulate A ↛ t establishes a constraint not only for the structure A, but also for all structures interchangeable in Ajdukiewicz's sense with the structure A. It is clear that only (39) is strongly compatible with ((34), (37)). The reader is asked to prove that the class of all grammars strongly n-compatible with some language family fulfills (26) and (27).

Let us now illustrate the connection between the complexity of the language sample and the (number of) grammars it generates. We have already illustrated the influence of the negative sample, so let us now mention another example concerning the positive sample. The extension of (34) with the following postulate:

(41) Mary → PN

results in an increase in the number of outputs. Below we list four of the possibilities:

(42)
Mary → PN,
John → PN,
Susan → PN,
likes → (PN\S)/PN, (PN\PN)/PN,
knows → (PN\S)/PN;

(43)
Mary → PN, S,
John → PN,
Susan → PN,
likes → (PN\S)/PN,
knows → (PN\S)/S;

(44)
Mary → PN, S,
John → S,
Susan → S,
likes → (S\S)/S,
knows → (S\S)/S;

(45)
Mary → PN,
John → PN,
Susan → PN,
likes → (PN\S)/PN,
knows → (PN\S)/PN, (PN\S)/S.

It is easy to see that (42) and (45) are analogues of (38) and (39), respectively. Clearly, the above considerations do not exhaust the subject. The aim of this paper was to provide a tool for generating grammars on the basis of both positive and negative data. We generalized Buszkowski's notion of optimal unification, preserving its basic properties. We hope that the possibility of formulating, in a strict way, various postulates concerning grammar construction contributes to limiting the role of such vague notions as linguistic plausibility.

References

[1] P. W. Adriaans, Language Learning from a Categorial Perspective, Universiteit van Amsterdam, Amsterdam, 1992.
[2] J. van Benthem, Language in Action. Categories, Lambdas and Dynamic Logic, North-Holland, Amsterdam, 1991.
[3] W. Buszkowski, Solvable Problems for Classical Categorial Grammars, Bull. Pol. Acad. Sci. Math. 35 (1987), pp. 373-382.
[4] W. Buszkowski, Discovery Procedures for Categorial Grammars, in [7].
[5] W. Buszkowski and G. Penn, Categorial Grammars Determined from Linguistic Data by Unification, Studia Logica XLIX, 4 (1990), pp. 431-454.
[6] M. Kanazawa, Identification in the Limit of Categorial Grammars, manuscript.
[7] E. Klein and J. van Benthem (eds), Categories, Polymorphism and Unification, Universiteit van Amsterdam, Amsterdam, 1987.
[8] J. W. Lloyd, Foundations of Logic Programming, Springer-Verlag, Berlin, 1987.