Remarks on Generalized Post Correspondence Problem
T. HARJU
Department of Mathematics, University of Turku,
FIN-20500 Turku, Finland
J. KARHUMÄKI
Department of Mathematics, University of Turku,
FIN-20500 Turku, Finland
D. KROB
Institute Blaise Pascal (LITP), CNRS, Université Paris VII
2, place Jussieu, F-75252 Paris, France
Turku Centre for Computer Science, TUCS Technical Report No 73, April 1996, ISBN 951-650-913-4, ISSN 1239-1891
Abstract
It is shown that Post Correspondence Problem remains undecidable even in the case where one of the morphisms is fixed. Accordingly, the generalized PCP is undecidable even in the case where both of the morphisms are fixed and, moreover, the cardinality of their domain alphabet is 7. In particular, GPCP(7) is undecidable. On the other hand, GPCP(2) is not only decidable but, as we show here, all of its solutions can be effectively found.
Research supported by the Academy of Finland, project 4077
1 Introduction
Post Correspondence Problem, PCP for short, is one of the oldest simply formulated combinatorial problems that is algorithmically undecidable. The undecidability of PCP was proved by E. Post in his seminal paper [12] of 1946. Since then, and in particular during the last two decades, a number of refinements of this important result have appeared.

In modern terms PCP as a decision problem asks whether for two given morphisms $h, g\colon \Sigma^* \to \Delta^*$ over finitely generated word monoids $\Sigma^*$ and $\Delta^*$ there exists a nonempty word $w \in \Sigma^+ = \Sigma^* \setminus \{1\}$ such that $h$ and $g$ agree on $w$, i.e., $h(w) = g(w)$. In other words, PCP asks for two morphisms $h, g$ whether the set of solutions $E(h,g) = \{w \in \Sigma^* \mid h(w) = g(w)\}$ is nontrivial, i.e., whether $E(h,g) \neq \{1\}$. Here the set $E(h,g)$ is called the equality set of the morphisms $h, g$.

PCP has offered many intriguing questions during its lifetime. One such problem was revealed when it was observed that even in the binary case, where $\Sigma$ contains just two generators (letters), the problem is rather complicated. However, as was expected, PCP turned out to be decidable in this case, see [4] and [11].

Further evidence of the challenging nature of PCP was revealed when it was proved that although in certain classes of instances $(h,g)$ of morphisms the sets of solutions $E(h,g)$ are regular, the nontriviality of $E(h,g)$ still remains an undecidable property. An example of such a case is the class of prefix codes, or more generally codes of bounded delay in the same direction, see [1] and [14]. Of course, in these cases the equality set of two morphisms cannot be found effectively (although it is known to be a regular set).

For yet another surprising feature of PCP we return to the binary case of the problem. As already mentioned, PCP is decidable in this case and the known proofs of this are rather complicated, demanding some 15 pages. However, the proof of decidability does not give any easy way to construct the equality sets effectively.
There is a short proof, see [5], which shows that in the binary case the equality sets $E(h,g)$ are very simple sets: Either $E(h,g)$ consists of all words containing the two letters in a fixed ratio, or $E(h,g)$ is a regular set of the form $\{u,v\}^*$ or $(uw^*v)^*$ for some words $u$, $v$ and $w$. No algorithm for finding effectively these regular sets has been published.

In this paper we fill in the above gap. Namely, as a consequence of a more general result, we conclude that in the binary case the equality set of two morphisms can be effectively found. The proof of this result uses a generalization of PCP. In this generalized Post Correspondence Problem (or GPCP
for short) the instances of the problem are of the form $(h, g, u_1, u_2, v_1, v_2)$, where $h, g\colon \Sigma^* \to \Delta^*$ are morphisms and $u_1, u_2, v_1, v_2$ are words in $\Delta^*$. A solution of such an instance is a word $w$ for which $u_1h(w)u_2 = v_1g(w)v_2$. It was shown in [4] that GPCP is decidable in the binary case (from which it follows as a special case that PCP is decidable in the binary case). On the other hand, by [2] and [10], if the morphisms are defined on an alphabet of nine letters, then PCP is again undecidable.

Using the construction of [2] we show that GPCP is undecidable if $\Sigma$ has seven letters, and, in fact, it remains undecidable even for instances $(h, g, u, 1, 1, v)$, where both of the morphisms $h$ and $g$ are fixed. Moreover, when the cardinality of the alphabet is increased (to 21) we find two specific morphisms $h$ and $g$ such that it is undecidable whether for a given word $v$ there exists a word $w$ so that $vh(w) = g(w)$. Finally, the ordinary PCP remains undecidable if one of the morphisms is fixed.
2 Preliminaries
For a finite set $\Sigma$ of letters let $\Sigma^*$ be the free monoid and $\Sigma^+$ the free semigroup generated by $\Sigma$. We denote by 1 the empty word of $\Sigma^*$, i.e., 1 is the identity of the monoid $\Sigma^*$, and $\Sigma^+ = \Sigma^* \setminus \{1\}$. A word $u \in \Sigma^*$ is a prefix (suffix, resp.) of a word $v \in \Sigma^*$, if $v = uw$ ($v = wu$, resp.) for some $w \in \Sigma^*$. In this case we write $u = vw^{-1}$ ($u = w^{-1}v$, respectively).

We consider here morphisms $h\colon \Sigma^* \to \Delta^*$. Such a morphism is called binary, if $\Sigma$ consists of two letters, i.e., $|\Sigma| = 2$. A pair $(h,g)$ of morphisms $h, g\colon \Sigma^* \to \Delta^*$ is called an instance of PCP on $\Sigma$. The size of an instance $(h,g)$ is defined to be the cardinality $|\Sigma|$ of the alphabet $\Sigma$. We say that a nonempty word $w \in \Sigma^*$ is a (nontrivial) solution of the instance $(h,g)$, if $h(w) = g(w)$. (The empty word is the trivial solution of $(h,g)$.) PCP asks for a given instance whether it has a solution or not. We denote by PCP($n$) the subproblem of PCP restricted to instances of size at most $n$.

An equality set $E(h,g)$ of two morphisms $h, g\colon \Sigma^* \to \Delta^*$ consists of the solutions, including the trivial solution, of the instance $(h,g)$,
$$E(h,g) = \{w \in \Sigma^* \mid h(w) = g(w)\}.$$
It is straightforward to prove that the equality set $E(h,g)$ is always a (possibly infinitely generated) free monoid, and, in fact, its base
$$e(h,g) = (E(h,g) \setminus \{1\}) \setminus (E(h,g) \setminus \{1\})^2$$
is a prefix code. The elements of $e(h,g)$ will be called minimal solutions of the instance $(h,g)$.

In the generalized Post Correspondence Problem, or GPCP for short, the instances are of the form $(h, g, u_1, u_2, v_1, v_2)$, where $h, g\colon \Sigma^* \to \Delta^*$ are morphisms and $u_1, u_2, v_1, v_2$ are words over the range alphabet $\Delta$. A solution of such an instance is a word $w$ for which $u_1h(w)u_2 = v_1g(w)v_2$. The set of solutions of an instance $(h, g, u_1, u_2, v_1, v_2)$ of GPCP is the generalized equality set
$$E(h, g, u_1, u_2, v_1, v_2) = \{w \in \Sigma^* \mid u_1h(w)u_2 = v_1g(w)v_2\}.$$
Let GPCP($n$) denote the problem: Does a given instance of GPCP of size $n$ have a solution? Here $n$ is again the size of the domain alphabet.

Notice that if an instance $(h, g, u_1, u_2, v_1, v_2)$ of GPCP has a solution, then one of the words $u_1$ or $v_1$ ($u_2$ or $v_2$, resp.) is a prefix (suffix, resp.) of the other. Therefore, it can be assumed that $u_1 = 1$ or $v_1 = 1$ and $u_2 = 1$ or $v_2 = 1$, since $\Delta^*$ is cancellative.
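The definitions of this section can be tried out on small instances. The following is a minimal Python sketch, not part of the original report: morphisms are represented as dictionaries from letters to words, and the function names (`image`, `is_solution`, `normalize`, `minimal_solutions`) as well as the toy instance in the usage note are assumptions made for illustration. The brute-force enumeration only inspects words up to a given length, so it illustrates the base $e(h,g)$ but decides nothing.

```python
from itertools import product

def image(m, w):
    """Image of the word w under the morphism m (a dict: letter -> word)."""
    return "".join(m[c] for c in w)

def is_solution(h, g, w, u1="", u2="", v1="", v2=""):
    """True if u1 h(w) u2 == v1 g(w) v2; with empty borders this is h(w) == g(w)."""
    return u1 + image(h, w) + u2 == v1 + image(g, w) + v2

def normalize(u1, u2, v1, v2):
    """Cancel the shorter of u1, v1 on the left and of u2, v2 on the right.

    Returns the reduced borders (u1, u2, v1, v2), or None when the borders
    are incomparable, in which case the instance has no solution at all.
    """
    if not (u1.startswith(v1) or v1.startswith(u1)):
        return None
    if not (u2.endswith(v2) or v2.endswith(u2)):
        return None
    k = min(len(u1), len(v1))
    m = min(len(u2), len(v2))
    return u1[k:], u2[:len(u2) - m], v1[k:], v2[:len(v2) - m]

def minimal_solutions(h, g, alphabet, max_len):
    """Elements of the base e(h, g) of length at most max_len (brute force)."""
    sols = []
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if is_solution(h, g, w):
                sols.append(w)
    found = set(sols)
    # w is minimal if it is not a product of two shorter nonempty solutions
    return [w for w in sols
            if not any(w[:i] in found and w[i:] in found
                       for i in range(1, len(w)))]
```

For the hypothetical binary instance `h = {"a": "ab", "b": "b"}`, `g = {"a": "a", "b": "bb"}`, the call `minimal_solutions(h, g, "ab", 4)` returns `["ab"]`, and `normalize("ab", "c", "a", "dc")` returns `("b", "", "", "d")`.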
3 The word problem for semi-Thue systems
The word problem for semigroups was the first undecidability result in algebra. That there are (presentations of) semigroups with an undecidable word problem was proved independently by Post [13] and Markov [8]. The first concrete example of a finitely presented semigroup with an undecidable word problem was given by Markov in 1947. This example has 13 generators and 33 relators. This was later improved by Tzeitin [15], who proved that the semigroup $S_0 = \langle a, b, \bar{a}, \bar{b}, e \mid R_0 \rangle$ with five generators and seven relators
$$\begin{array}{ll}
a\bar{a} = \bar{a}a & e\bar{a}a = \bar{a}e \\
a\bar{b} = \bar{b}a & e\bar{b}b = \bar{b}e \\
b\bar{a} = \bar{a}b & \\
b\bar{b} = \bar{b}b & \bar{a}\bar{a}a = \bar{a}\bar{a}ae
\end{array}$$
has an undecidable word problem. Matijacevic [9] modified the presentation of $S_0$ and obtained a semigroup
$$S_1 = \langle a, b \mid u_1 = u_2,\ u_1 = u_3,\ v = w \rangle$$
with only two generators and three relators such that $S_1$ has an undecidable word problem. In the presentation of this semigroup one of the relators has more than 900 occurrences of generators. Notice that in $S_1$ two of the relators have a common word $u_1$; this will be useful in our later considerations. The following was proved by Matijacevic [9].
Theorem 3.1. The semigroup $S_1$ has an undecidable word problem.

In the individual word problem for a presentation of a semigroup $S$ we are given a fixed word $w_0$, and we ask for words $w$ whether $w = w_0$ in $S$. Tzeitin [15] constructed from $S_0$ a presentation of a semigroup $S_2 = \langle a, b, c, d, e \mid R_2 \rangle$ with an undecidable individual word problem. The semigroup $S_2$ has five generators $a, b, c, d, e$ and nine relators:
$$ac = ca,\quad ad = da,\quad bc = cb,\quad bd = db,\quad eca = ce,\quad edb = de,\quad cdca = cdcae,\quad caaa = aaa,\quad daaa = aaa.$$
For $S_2$ we have the following theorem due to Tzeitin [15].

Theorem 3.2. The individual word problem for $w = aaa$ is undecidable in $S_2$.

From the undecidability of the word problem for semigroups it follows that the word problem for semi-Thue systems (see [7]) is also undecidable, see e.g. [2]. Indeed, for each relator $u = v$ of a semigroup $S = \langle \Sigma \mid R \rangle$ the corresponding semi-Thue system $T = (\Sigma, R')$ has the rules $u \to v$ and $v \to u$.

As observed by Pansiot [10] the Matijacevic semigroup $S_1$ can be represented as a semi-Thue system $T_1$ with only five rules in a 2-letter alphabet: $u_1 \to u_2$, $u_2 \to u_3$, $u_3 \to u_1$, $v \to w$ and $w \to v$. This means that $u = v$ in $S_1$ if and only if $u \to^* v$ in $T_1$. Hence we have, see Pansiot [10],

Theorem 3.3. There is a semi-Thue system $T = (\Sigma, R)$ with $|\Sigma| = 2$ and $|R| = 5$ such that $T$ has an undecidable word problem.
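To make the passage from semigroup relators to semi-Thue rules concrete, here is a small Python sketch. It is an illustration only: the function names and the length bound are assumptions of this sketch. Since the word problem is undecidable in general, the search is cut off at words of length `max_len`, so a `False` answer only means that no derivation was found within that bound.

```python
from collections import deque

def symmetric_rules(relators):
    """Turn each relator u = v into the two rules u -> v and v -> u."""
    return list(relators) + [(v, u) for (u, v) in relators]

def one_step(word, rules):
    """Yield every word obtained from `word` by one rule application."""
    for lhs, rhs in rules:
        start = word.find(lhs)
        while start != -1:
            yield word[:start] + rhs + word[start + len(lhs):]
            start = word.find(lhs, start + 1)

def derives(u, v, rules, max_len):
    """Breadth-first search for a derivation of v from u.

    Only intermediate words of length <= max_len are explored, so this
    is a bounded semi-decision sketch, not a decision procedure.
    """
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for x in one_step(w, rules):
            if len(x) <= max_len and x not in seen:
                seen.add(x)
                queue.append(x)
    return False
```

As a usage example, the four commutation relators of $S_2$ give `rules = symmetric_rules([("ac", "ca"), ("ad", "da"), ("bc", "cb"), ("bd", "db")])`; with these rules `derives("abc", "cab", rules, 3)` holds, while `derives("abc", "bac", rules, 3)` does not, because $a$ and $b$ do not commute.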
4 Restrictive cases of generalized PCP
We shall now give a construction due to Claus [2] which reduces the word problem for semi-Thue systems to Post Correspondence Problem. Let $T = (\Sigma, R)$ be an arbitrary semi-Thue system with $\Sigma = \{a_1, a_2, \dots, a_n\}$ and $R = \{r_1, r_2, \dots, r_m\}$, where $r_i$ denotes a rule $u_i \to v_i$. We shall consider $R$ also as an alphabet. Clearly, $w_1 \to^* w_2$ holds in $T$ if and only if $a_0w_1a_0 \to^* a_0w_2a_0$ holds in the semi-Thue system $T_0 = (\Sigma \cup \{a_0\}, R)$, where $a_0$ is a new symbol. We code $(\Sigma \cup \{a_0\})^*$ into the binary alphabet $\{a, b\}$ by the injective morphism
$$\varphi(a_i) = ab^{i+1} \quad (i = 0, 1, \dots, n).$$
Now, as is easy to see, $a_0w_1a_0 \to^* a_0w_2a_0$ holds in $T_0$ if and only if $\varphi(a_0w_1a_0) \to^* \varphi(a_0w_2a_0)$ holds in the semi-Thue system
$$T_0' = (\{a, b\}, \{\varphi(u_i) \to \varphi(v_i) \mid i = 1, 2, \dots, m\}).$$
Next define the morphisms $\lambda, \rho\colon \{a, b\}^* \to (\{a, b\} \cup \{d\})^*$ by $\lambda(x) = dx$ and $\rho(x) = xd$ for both $x = a, b$. Finally for each $u, v \in \Sigma^*$ define the morphisms $h_u, g_v\colon (R \cup \{a, b, e, d\})^* \to \{a, b, d\}^*$ as follows:
$$\begin{array}{lll}
h_u(x) = \lambda(x), & g_v(x) = \rho(x) & \text{for } x \in \{a, b\}, \\
h_u(r_i) = \lambda\varphi(v_i), & g_v(r_i) = \rho\varphi(u_i) & \text{for } i = 1, 2, \dots, m, \\
h_u(e) = \rho\varphi(a_0)\,\lambda\varphi(ua_0), & g_v(e) = \rho\varphi(a_0)\,d, & \\
h_u(d) = d\,\lambda\varphi(a_0), & g_v(d) = \rho\varphi(a_0v)\,\lambda\varphi(a_0). &
\end{array}$$
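As a sanity check of a construction of this shape, the following Python sketch builds two such morphisms for a toy semi-Thue system and turns a derivation into a word $ewd$ on which they agree. Everything concrete here is an assumption of the sketch rather than a statement of the original text: the names `phi`, `lam`, `rho` for the binary coding and the two desynchronizing morphisms, the character `"#"` standing for the new symbol $a_0$, the tuples `("r", i)` standing for the rule letters $r_i$, and in particular the exact choice of the start and end markers in the images of $e$ and $d$, which is one consistent choice among several.

```python
def phi(word, index):
    """Binary coding a_i -> a b^(i+1); `index` gives i for each letter."""
    return "".join("a" + "b" * (index[c] + 1) for c in word)

def lam(w):
    """Left desynchronizing morphism x -> dx on binary words."""
    return "".join("d" + c for c in w)

def rho(w):
    """Right desynchronizing morphism x -> xd on binary words."""
    return "".join(c + "d" for c in w)

def claus_instance(sigma, rules, u, v, sep="#"):
    """Return (h, g, code) for the system (sigma, rules) and words u, v.

    h and g are dicts from domain symbols to words over {a, b, d};
    `sep` plays the role of the new symbol a_0 (index 0).
    """
    index = {sep: 0}
    index.update({c: i + 1 for i, c in enumerate(sigma)})
    code = lambda w: phi(w, index)
    h = {"a": "da", "b": "db",
         "e": rho(code(sep)) + lam(code(u + sep)),
         "d": "d" + lam(code(sep))}
    g = {"a": "ad", "b": "bd",
         "e": rho(code(sep)) + "d",
         "d": rho(code(sep + v)) + lam(code(sep))}
    for i, (lhs, rhs) in enumerate(rules):
        h[("r", i)] = lam(code(rhs))   # h writes the right-hand side
        g[("r", i)] = rho(code(lhs))   # g reads the left-hand side
    return h, g, code

def apply(m, word):
    """Image of a list of domain symbols under the morphism m."""
    return "".join(m[s] for s in word)

def encode_derivation(steps, code, sep="#"):
    """Build e w d from steps (left, i, right): left u_i right -> left v_i right."""
    word = ["e"]
    for k, (left, i, right) in enumerate(steps):
        if k > 0:
            word += list(code(sep))    # copy the separator between configurations
        word += list(code(left)) + [("r", i)] + list(code(right))
    word.append("d")
    return word
```

For the hypothetical system with the single rule $xy \to yx$ over $\{x, y\}$, the derivation $xxy \to xyx \to yxx$ is described by `steps = [("x", 0, ""), ("", 0, "x")]`; with `h, g, code = claus_instance("xy", [("xy", "yx")], "xxy", "yxx")` the word `encode_derivation(steps, code)` has equal images under `h` and `g`. The sketch checks only this direction; it does not verify that every solution arises from a derivation.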
It is rather straightforward, see [2], to show that the instance $(h_u, g_v)$ has a solution if and only if $u \to^* v$ holds in $T$. Moreover, each minimal solution of $(h_u, g_v)$ is necessarily of the form $ewd$ for some word $w \in (\{a, b\} \cup R)^*$, which does not contain either of the letters $d$ and $e$. Hence we have derived the following theorem.

Theorem 4.1. For each semi-Thue system $T = (\Sigma, R)$ and words $u, v \in \Sigma^*$, one can effectively construct two morphisms $h_u, g_v\colon \Gamma^* \to \Delta^*$ such that $|\Gamma| = |R| + 4$ and $u \to^* v$ holds in $T$ if and only if $E(h_u, g_v) \neq \{1\}$.

In particular, when Theorem 4.1 is applied to the semi-Thue system of Theorem 3.3 we obtain, see [2] and [10], that

Theorem 4.2. PCP(9) is undecidable.

The above constructions yield also the following theorem.
Theorem 4.3. There are two morphisms $h, g$ defined on an alphabet $\Gamma'$ with $|\Gamma'| = 7$ such that it is undecidable for words $w_1, w_2 \in \Delta^*$ whether or not there exists a word $w$ for which $w_1h(w) = g(w)w_2$. In particular, GPCP(7) is undecidable for instances $(h, g, w_1, 1, 1, w_2)$.

Proof. Consider the semi-Thue system $T$ of Theorem 3.3, and the morphisms $h_u, g_v$ obtained in the constructions of Theorem 4.1. Let $\Gamma = \{a, b, r_1, \dots, r_5, d, e\}$, and $\Gamma' = \{a, b, r_1, \dots, r_5\}$. Hence $|\Gamma'| = 7$. We observe that $h_u(x)$ and $g_v(x)$ depend only on $T$ for all $x \in \Gamma'$, i.e., for any words $u_1$ and $u_2$, $h_{u_1}|_{\Gamma'} = h_{u_2}|_{\Gamma'}$, and for any words $v_1$ and $v_2$, $g_{v_1}|_{\Gamma'} = g_{v_2}|_{\Gamma'}$. Let $h, g\colon \Gamma'^* \to \Delta^*$ be defined by $h = h_u|_{\Gamma'}$ and $g = g_v|_{\Gamma'}$. As noticed above the minimal solutions of the instances $(h_u, g_v)$ are of the form $ewd$, where $w \in \Gamma'^*$, i.e., $w$ does not contain the letters $d, e$. It follows that it is undecidable for words $u, v \in \Sigma^*$ whether or not there exists a word $w \in \Gamma'^*$ such that $h_u(e)h(w)h_u(d) = g_v(e)g(w)g_v(d)$. Furthermore, here $g_v(e)$ is a prefix of $h_u(e)$ and $h_u(d)$ is a suffix of $g_v(d)$. Consequently, $h_u(e)h(w)h_u(d) = g_v(e)g(w)g_v(d)$ just in case $g_v(e)^{-1}h_u(e)h(w) = g(w)g_v(d)h_u(d)^{-1}$. The claim follows from this, when we let $w_1$ vary over the words $g_v(e)^{-1}h_u(e)$ and $w_2$ vary over the words $g_v(d)h_u(d)^{-1}$.

The same proof when applied to the semigroup $S_2$ with an undecidable individual word problem gives the following improvement of Theorem 4.3 (at the cost of increasing the cardinality of the alphabet of the two morphisms).

Theorem 4.4. (1) There exists a morphism $g$ such that it is undecidable for morphisms $h$ whether there exists a word $w$ such that $h(w) = g(w)$. (2) There exist two morphisms $h$ and $g$ such that it is undecidable for a word $v$ whether $vh(w) = g(w)$ for some word $w$.

Proof. The semigroup $S_2 = \langle a, b, c, d, e \mid u_i = v_i\ (i = 1, 2, \dots, 9) \rangle$ can be represented as a semi-Thue system $T_2$ in the alphabet $\{a, b, r_1, \dots, r_9, s_1, \dots, s_9, d, e\}$ with the rules $r_i = u_i \to v_i$, $s_i = v_i \to u_i$ for $i = 1, 2, \dots, 9$.
Let $g = g_v$, where the morphism $g_v$ is obtained from the constructions of Theorem 4.1 for the word $v = aaa$, and let $h$ be the morphism $h_u$ restricted to $R \cup \{a, b, d\}$. Now, $u \to^* aaa$ in $T_2$ if and only if there exists a word $w$ such that $h_u(e)h(wd) = g(ewd)$. Case (1) of the claim follows from this. Case (2) follows when we observe that $h_u(e)h(wd) = g(ewd)$ if and only if $g(e)^{-1}h_u(e)h(wd) = g(wd)$.

Symmetrically one can conclude that for the same morphisms $h$ and $g$ it is undecidable for a word $v$ whether there exists a word $w$ such that $h(w)v = g(w)$. One should notice that in these results the cardinality of the domain alphabet of $h$ and $g$ is already 22 in Case (1) and 21 in Case (2) of Theorem 4.4.
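The step used in the proofs of Theorems 4.3 and 4.4, moving the images of the border letters $e$ and $d$ into fixed border words, is plain cancellation in a free monoid. A minimal Python sketch follows; the function names and the sample strings in the usage note are hypothetical and serve only to illustrate the quotient notation $x^{-1}y$ and $yx^{-1}$.

```python
def left_quotient(x, y):
    """Return x^{-1} y, i.e. y with its prefix x removed."""
    if not y.startswith(x):
        raise ValueError("x is not a prefix of y")
    return y[len(x):]

def right_quotient(y, x):
    """Return y x^{-1}, i.e. y with its suffix x removed."""
    if not y.endswith(x):
        raise ValueError("x is not a suffix of y")
    return y[:len(y) - len(x)]

def border_words(h_e, h_d, g_e, g_d):
    """Border words (w1, w2) with
    h_e + H + h_d == g_e + G + g_d  iff  w1 + H == G + w2
    for all words H, G, assuming g_e is a prefix of h_e and
    h_d is a suffix of g_d.
    """
    return left_quotient(g_e, h_e), right_quotient(g_d, h_d)
```

For example, `border_words("adbddadb", "ddadb", "adbdd", "adbdddadb")` returns `("adb", "adbd")`: the first word is what remains of the image of $e$ under $h_u$ after cancelling the image under $g_v$, and the second is what remains of the image of $d$ under $g_v$ after cancelling the image under $h_u$.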
5 An effective construction of regular equality sets
In this section we shall consider the problem of constructing a finite automaton for a regular equality set $E(h,g)$. The equality set $E(h,g)$ is known to be regular for instances $(h,g)$, where $h$ and $g$ are bounded delay codes, see [1]. In particular, $E(h,g)$ is regular whenever $h$ and $g$ are prefix codes. On the other hand, it was shown by Ruohonen [14] that PCP is undecidable already for (bi)prefix codes.

We say that a pair $(h,g)$ satisfies the solvability condition, if there exists an algorithm which decides for a given $u \in \Delta^*$ whether there exists a nonempty word $w \in \Sigma^*$ such that $uh(w) = g(w)$, i.e., if GPCP is decidable for the instances $(h, g, u, 1, 1, 1)$ with $u \in \Delta^*$. For each word $u \in \Delta^*$ let
$$E(h, g, u) = \{w \mid uh(w) = g(w)\}$$
be the generalized equality set of the triple $(h, g, u)$. A family $H$ of morphisms satisfies the solvability condition, if for all $h, g \in H$, the pair $(h,g)$ satisfies the condition.

Let $E = E(h,g)$ for two morphisms $h, g\colon \Sigma^* \to \Delta^*$. At the moment we suppose neither that $E(h,g)$ is regular nor that the solvability condition holds. We construct for $(h,g)$ an (infinite state) automaton $A(h,g)$ that accepts the equality set $E(h,g)$. In order to do this consider the set $u^{-1}E = \{w \mid uw \in E\}$, where $u, w \in \Sigma^*$. Clearly,
$$u^{-1}E = \{w \mid h(u)h(w) = g(u)g(w)\},$$
from which we deduce that
$$u^{-1}E = \begin{cases}
E(h, g, g(u)^{-1}h(u)), & \text{if } g(u) \text{ is a prefix of } h(u), \\
E(g, h, h(u)^{-1}g(u)), & \text{if } h(u) \text{ is a prefix of } g(u).
\end{cases}$$
Therefore we obtain

Lemma 5.1. $u^{-1}E(h,g)$ is a generalized equality set for all $h$, $g$ and $u$. In particular, the solvability condition implies that one can decide whether $u^{-1}E \neq \emptyset$.

Note also that by Nerode's theorem, see [6], $E$ is regular if and only if the family $\{u^{-1}E \mid u \in \Sigma^*\}$ is finite. Let now
$$Q = \{u^{-1}E \mid u \in \Sigma^*\},$$
and define an automaton $A(h,g) = (Q, \Sigma, \delta, E, E)$ with the set $E$ as its unique initial and final state, and
$$\delta(u^{-1}E, a) = (ua)^{-1}E \quad \text{for } a \in \Sigma,\ u \in \Sigma^*.$$
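Lemma 5.1 says that a state $u^{-1}E$ is determined by the overhang of one of the images $h(u)$, $g(u)$ over the other. The Python sketch below makes this explicit and checks it by brute force on short words. It is an illustration only: the function names and the representation of a state as a pair `(side, z)` are assumptions of this sketch, and when neither image is a prefix of the other the state is empty and is represented by `None`.

```python
from itertools import product

def image(m, w):
    """Image of the word w under the morphism m (a dict: letter -> word)."""
    return "".join(m[c] for c in w)

def quotient_descriptor(h, g, u):
    """Describe u^{-1}E(h, g) by the overhang between h(u) and g(u).

    ("h", z): h(u) = g(u) z, and the state is {w : z h(w) = g(w)}.
    ("g", z): g(u) = h(u) z, and the state is {w : z g(w) = h(w)}.
    None    : the images are incomparable, so the state is empty.
    """
    hu, gu = image(h, u), image(g, u)
    if hu.startswith(gu):
        return ("h", hu[len(gu):])
    if gu.startswith(hu):
        return ("g", gu[len(hu):])
    return None

def in_described_set(h, g, descriptor, w):
    """Membership of w in the generalized equality set named by descriptor."""
    if descriptor is None:
        return False
    side, z = descriptor
    first, second = (h, g) if side == "h" else (g, h)
    return z + image(first, w) == image(second, w)

def check_lemma(h, g, alphabet, max_len):
    """Compare 'uw in E' with the descriptor-based test for all short u, w."""
    words = ["".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n)]
    return all((image(h, u + w) == image(g, u + w))
               == in_described_set(h, g, quotient_descriptor(h, g, u), w)
               for u in words for w in words)
```

With the hypothetical instance `h = {"a": "ab", "b": "b"}`, `g = {"a": "a", "b": "bb"}` one gets `quotient_descriptor(h, g, "a") == ("h", "b")`, and `check_lemma(h, g, "ab", 3)` confirms the agreement on all words of length at most 3.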
The above construction implies immediately that $A(h,g)$ accepts $E$:

Theorem 5.2. $L(A(h,g)) = E(h,g)$.

We need also

Lemma 5.3. For all words $u, v \in \Sigma^*$ either $u^{-1}E = v^{-1}E$ or $u^{-1}E \cap v^{-1}E = \emptyset$.

Proof. Indeed, if $w \in u^{-1}E \cap v^{-1}E$, then $h(u)h(w) = g(u)g(w)$ and $h(v)h(w) = g(v)g(w)$. If here $h(u) = g(u)z$ for a word $z \in \Delta^*$, then $zh(w) = g(w)$, which implies that also $h(v) = g(v)z$. From this it follows that $u^{-1}E = v^{-1}E$. Symmetrically, the same conclusion follows if $h(u)$ is a prefix of $g(u)$, i.e., $g(u) = h(u)z$.

Notice that for all words $u$ either $u^{-1}E$ is infinite or empty, or $u^{-1}E = \{1\}$ (in the case when $u = 1$ and $E = \{1\}$). Finally, we observe that $w \in u^{-1}E$ if and only if $uw \in E$, and hence it is decidable for two words $u, w \in \Sigma^*$ whether or not $w \in u^{-1}E$, and therefore, by Lemma 5.3, we have the next result.

Lemma 5.4. Suppose $u^{-1}E \neq \emptyset$ for a word $u \in \Sigma^*$. It is decidable for words $v \in \Sigma^*$ whether or not $u^{-1}E = v^{-1}E$.

Proof. By exhaustive search we can find a word $w \in \Sigma^*$ such that $w \in u^{-1}E$, because, by assumption, $u^{-1}E \neq \emptyset$. Now, $w \in v^{-1}E$ if and only if $u^{-1}E = v^{-1}E$ by Lemma 5.3.

We are now ready for the main theorem of this section.

Theorem 5.5. Let $H$ be a family of morphisms that satisfies the solvability condition, and let $h, g \in H$. If $E(h,g)$ is regular then $E(h,g)$ can be effectively found.

Proof. Let $h, g\colon \Sigma^* \to \Delta^*$ be two morphisms in $H$, and assume that $E(h,g)$ is regular. Consider the automaton $A = A(h,g)$ as defined above. By Nerode's theorem, $E(h,g)$ is regular if and only if $A(h,g)$ is a finite automaton. Using the solvability condition for the instance $(h,g)$ we can check effectively whether the Nerode condition holds:
(1) Check whether $E \neq \{1\}$. If $E = \{1\}$, then output the finite automaton having only the initial state and no transitions. Suppose then that $E \neq \{1\}$.
(2) For $n \geq 0$ suppose we have already found all nonempty states $u^{-1}E$ with $|u| = n$. Let the set of these be $Q_n$. Set $Q_{n+1} = Q_n$. Check for each $a \in \Sigma$ and $u^{-1}E \in Q_n$ whether $(ua)^{-1}E \neq \emptyset$. This is effective by our assumption. If the answer is positive, then check, effectively by Lemma 5.4, whether $(ua)^{-1}E \notin Q_n$. If so, then add $(ua)^{-1}E$ to $Q_{n+1}$.
(3) When the first $n$ for which $Q_{n+1} = Q_n$ is reached, output the subautomaton of $A(h,g)$ having the state set $Q_n$.
Case (3) must eventually be reached, because by assumption $E$ is regular, and hence $A(h,g)$ does have a finite subautomaton which accepts $E$.
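The procedure in steps (1) to (3) can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the names are hypothetical, states $u^{-1}E$ are represented by a representative word $u$, and the solvability condition enters as a callable `witness(u)` that returns some $w$ with $uw \in E$ or `None` if $u^{-1}E$ is empty. The helper `bounded_witness` replaces the decision algorithm by a search over words of bounded length; that is an assumption adequate only for toy instances and is not a decision procedure.

```python
from itertools import product

def image(m, w):
    """Image of the word w under the morphism m (a dict: letter -> word)."""
    return "".join(m[c] for c in w)

def in_E(h, g, w):
    return image(h, w) == image(g, w)

def bounded_witness(h, g, alphabet, bound):
    """Stand-in oracle: search w with |w| <= bound and uw in E (assumption)."""
    def witness(u):
        for n in range(bound + 1):
            for letters in product(alphabet, repeat=n):
                w = "".join(letters)
                if in_E(h, g, u + w):
                    return w
        return None
    return witness

def build_automaton(h, g, alphabet, witness):
    """Explore the nonempty states u^{-1}E level by level.

    Returns (reps, delta): representatives of the states found and the
    partial transition function on representatives. The state "" is E,
    which is both initial and final.
    """
    reps = [""]
    delta = {}
    frontier = [""]
    while frontier:
        new_states = []
        for u in frontier:
            for a in alphabet:
                w = witness(u + a)
                if w is None:          # (ua)^{-1}E is empty: no transition
                    continue
                # Lemma 5.4: (ua)^{-1}E = r^{-1}E iff the witness w lies in r^{-1}E
                target = next((r for r in reps if in_E(h, g, r + w)), None)
                if target is None:
                    target = u + a
                    reps.append(target)
                    new_states.append(target)
                delta[(u, a)] = target
        frontier = new_states
    return reps, delta

def accepts(delta, word):
    """Run the automaton from the initial state "" and test for state ""."""
    state = ""
    for c in word:
        state = delta.get((state, c))
        if state is None:
            return False
    return state == ""
```

For the hypothetical instance `h = {"a": "ab", "b": "b"}`, `g = {"a": "a", "b": "bb"}` and the oracle `bounded_witness(h, g, "ab", 4)`, the construction stops with the two states `""` and `"a"` and the transitions `("", "a") -> "a"` and `("a", "b") -> ""`, an automaton for $(ab)^*$. The loop terminates only when the set of nonempty states is finite, which mirrors the regularity assumption of Theorem 5.5.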
Corollary 5.6. Let $H$ be a family of morphisms for which GPCP is decidable, and let $h, g \in H$. If $E(h,g)$ is regular then $E(h,g)$ can be effectively found.

Since GPCP(2) is decidable, see [4], and the equality sets $E(h,g)$ are regular in the binary case, when either one of the morphisms $h$ or $g$ is nonperiodic, see [5], we have immediately the following corollary.

Corollary 5.7. In the binary case the equality set of two morphisms can be effectively found.
Secondly we observe that the proof of Theorem 5.5 extends immediately to the generalized equality sets, when regularity is demanded on the generalized equality set, and the solvability condition is changed to the strong solvability condition: Given morphisms $h, g\colon \Sigma^* \to \Delta^*$ in a family $H$, and four words $u_1, u_2, v_1, v_2$, it is decidable whether $E(h, g, u_1, u_2, v_1, v_2)$ contains a nonempty word, i.e., GPCP is decidable for $H$.

Theorem 5.8. Let $H$ be a family of morphisms for which GPCP is decidable. If $E(h, g, u_1, u_2, v_1, v_2)$ is regular, then it can be effectively found.

As above, also Theorem 5.8 has an interesting consequence.

Corollary 5.9. In the binary case the generalized equality set of two morphisms can be effectively found.
Proof. We need here the fact that for two periodic morphisms the corollary holds, and that in other cases the generalized equality set is always regular. Both of these facts can be easily proved as in the case of ordinary equality sets, see [4].

Finally, we notice that, as in the proof of decidability of PCP(2) in [4], the use of generalized instances is necessary in order to obtain results for nongeneralized instances. Indeed, from the assumptions (1) $E(h,g)$ is regular, and (2) it is decidable whether $E(h,g) \neq \{1\}$, we cannot conclude Theorem 5.5. To see this concretely let $h, g\colon \Sigma^* \to \Delta^*$ be any two prefix codes, and define $h', g'\colon (\Sigma \cup \{d\})^* \to (\Delta \cup \{d\})^*$, with $d \notin \Sigma$, by
$$h'(d) = d = g'(d)$$
and
$$h'(x) = h(x), \quad g'(x) = g(x) \quad (x \neq d).$$
Then we have
$$E(h', g') = (E(h,g) \cup \{d\})^*,$$
and hence $E(h', g')$ is regular, since $E(h,g)$ is regular, see [5]. However, $E(h', g') \neq \{1\}$ by definition, and hence if $E(h', g')$ were effectively computable, we would be able to decide whether $E(h,g) \neq \{1\}$, which would contradict the undecidability of PCP for prefix codes, [14].
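The identity for the extended equality set can be checked by brute force on a small instance. In the Python sketch below the function names and the toy prefix codes in the usage note are assumptions for illustration, and the marker letter is additionally assumed not to occur in any image of the original morphisms, so that the marker splits both images into matching blocks.

```python
from itertools import product

def image(m, w):
    """Image of the word w under the morphism m (a dict: letter -> word)."""
    return "".join(m[c] for c in w)

def extend_with_marker(h, g, marker="d"):
    """Return (h', g') with h'(marker) = marker = g'(marker), else unchanged."""
    if marker in h or marker in g:
        raise ValueError("marker must be a new letter")
    if any(marker in word for m in (h, g) for word in m.values()):
        raise ValueError("marker must not occur in the images (assumption)")
    h2, g2 = dict(h), dict(g)
    h2[marker] = marker
    g2[marker] = marker
    return h2, g2

def check_identity(h, g, alphabet, max_len, marker="d"):
    """Check, for all words up to max_len, that w is a solution of (h', g')
    iff every marker-free block of w is a solution of (h, g)."""
    h2, g2 = extend_with_marker(h, g, marker)
    for n in range(max_len + 1):
        for letters in product(alphabet + marker, repeat=n):
            w = "".join(letters)
            blockwise = all(image(h, x) == image(g, x) for x in w.split(marker))
            if (image(h2, w) == image(g2, w)) != blockwise:
                return False
    return True
```

For the hypothetical prefix codes `h = {"a": "ab", "b": "b"}` and `g = {"a": "a", "b": "bb"}`, the call `check_identity(h, g, "ab", 5)` succeeds, and the one-letter word `"d"` is always a nontrivial solution of the extended instance.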
References
[1] C. Choffrut and J. Karhumäki, Test sets for morphisms with bounded delay, Discrete Appl. Math. 12 (1985), 93-101.
[2] V. Claus, Some remarks on PCP(k) and related problems, Bull. EATCS 12 (1980), 54-61.
[3] K. Culik II and J. Karhumäki, On the equality sets for homomorphisms on free monoids with two generators, RAIRO Theoret. Informatics 14 (1980), 349-369.
[4] A. Ehrenfeucht, J. Karhumäki and G. Rozenberg, The (generalized) Post Correspondence Problem with lists consisting of two words is decidable, Theoret. Comput. Sci. 21 (1982), 119-144.
[5] A. Ehrenfeucht, J. Karhumäki and G. Rozenberg, On binary equality languages and a solution to the test set conjecture in the binary case, J. Algebra 85 (1983), 76-85.
[6] S. Eilenberg, Automata, Languages, and Machines, Vol. A, Academic Press, New York, 1974.
[7] M. Jantzen, Confluent String Rewriting, Springer-Verlag, 1988.
[8] A.A. Markov, On the impossibility of certain algorithms in the theory of associative systems, Dokl. Akad. Nauk 55 (1947), 353-356 (Russian).
[9] J. Matijacevic, Simple examples of unsolvable associative calculi, Dokl. Akad. Nauk 173 (1967), 1264-1266 (Russian).
[10] J.J. Pansiot, A note on Post's Correspondence Problem, Inform. Process. Lett. 12 (1981), 233.
[11] V.A. Pavlenko, Post combinatorial problem with two pairs of words, Dokl. Akad. Nauk. Ukr. SSR (1981), 9-11.
[12] E. Post, A variant of a recursively unsolvable problem, Bulletin of Amer. Math. Soc. 52 (1946), 264-268.
[13] E. Post, Recursive unsolvability of a problem of Thue, J. Symb. Logic 12 (1947), 1-11.
[14] K. Ruohonen, Reversible machines and Post's correspondence problem for biprefix morphisms, J. Inform. Process. Cybernet. EIK 21 (1985), 579-595.
[15] G.C. Tzeitin, Associative calculus with an unsolvable equivalence problem, Tr. Mat. Inst. Akad. Nauk 52 (1958), 172-189 (Russian).
Turku Centre for Computer Science
Lemminkäisenkatu 14, FIN-20520 Turku, Finland
http://www.tucs.abo.
University of Turku: Department of Mathematical Sciences
Åbo Akademi University: Department of Computer Science; Institute for Advanced Management Systems Research
Turku School of Economics and Business Administration: Institute of Information Systems Science