Technion - Computer Science Department - M.Sc. Thesis MSC-2006-24 - 2006
Look-ahead finite-memory automata
Research Thesis
Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science
Daniel Zeitlin
Submitted to the Senate of the Technion - Israel Institute of Technology
Tamuz, 5766    Haifa    July, 2006
The research was done under the supervision of Assoc. Prof. Michael Kaminski in the Department of Computer Science. I would like to thank Michael for the way he guided me: he always knew when to give me freedom and peace of mind to work, when I needed instructions and help in finding a rope, and when to put me under the pressure of a schedule so that I would stop wasting time and start doing something. Thanks to my family, my parents, also known as father and mother, who, since my twin brother Eli and I were 12, always wanted us to study Mathematics and Computer Science, pushed us to study, and made us realize the importance of a higher degree. I would like to thank my best friend Raviv Brüeller for “resonating” with me, and thereby contributing in various and sundry ways to this thesis. Finally, I would like to give special thanks to my beautiful and lovely fiancée, Elena. Your patience, loving support, and encouraging words got me through many long hours of writing this thesis. I love you.
The generous financial help of the Technion is gratefully acknowledged.
Contents
Abstract
Notation and Abbreviations
1 Introduction
  1.1 Basic definitions
  1.2 Previous works
    1.2.1 Finite-Memory Automaton (FMA)
    1.2.2 Two-Way Finite-Memory Automaton (2WFMA)
    1.2.3 Context-free grammars over infinite alphabets
2 Look-ahead Finite-Memory Automaton (LFMA)
  2.1 Guessing instead of reassignment
  2.2 The definition of LFMA
  2.3 Examples
  2.4 Basic properties
  2.5 Closure properties
3 Regular expressions for LQR languages (LQRE)
  3.1 The definition of LQRE
  3.2 Languages defined by instances of LQRE
  3.3 Equivalence of the languages defined by LQRE and LFMA
  3.4 Languages recursively defined by LQRE
  3.5 Equivalence of the definitions
4 Regular grammars for LQR languages (QRG)
  4.1 The definition of QRG
  4.2 Equivalence of the languages defined by QRG and LFMA
5 Summary
Bibliography
List of Figures
1.1 An FMA that accepts $L = \{\sigma\boldsymbol{\sigma} : |\boldsymbol{\sigma}| \geq 1,\ \sigma \in \Sigma \setminus [\boldsymbol{\sigma}]\}$
2.1 An LFMA that accepts $L = \{\boldsymbol{\sigma}\sigma : \sigma \in \Sigma \setminus [\boldsymbol{\sigma}]\}$
2.2 An LFMA that accepts $L = \{\boldsymbol{\sigma}_1 \sigma \boldsymbol{\sigma}_2 \sigma \boldsymbol{\sigma}_3 : \sigma \in \Sigma\}$
2.3 The FMA to LFMA translation scheme
2.4 An LFMA that accepts $L_s = \{\boldsymbol{u}\sigma\boldsymbol{v} : |\boldsymbol{u}|, |\boldsymbol{v}| \geq 1,\ \sigma \notin [\boldsymbol{u}] \cup [\boldsymbol{v}]\}$
2.5 A 2WFMA that accepts $L_s^*$
2.6 The diagram of the $A^{t}_{x,y,z}$ component of $A$
3.1 The scope of the $i$th symbol in $\boldsymbol{w}$
3.2 Scopes $S_i^{x}$ and $S_i^{y'}$ in $\boldsymbol{v} = \boldsymbol{v}_1 \cdot \boldsymbol{v}_2$
3.3 Scopes $S_i^{x}$ and $S_i^{y'}$ in v = v1 · ·z · v2
4.1 The G to A translation scheme
4.2 The A to G translation scheme
Abstract

A new model of finite-state automata dealing with infinite alphabets, called finite-state datalog automata (FSDA), was introduced by Shemesh and Francez in 1994. FSDA were intended for the abstract study of relational languages. Since the character of relational languages requires the use of infinite alphabets of names of variables, FSDA are equipped, in addition to a finite set of states, with a finite set of “registers” capable of retaining a variable name (out of an infinite set of names). The equality test between the input symbol and the edge label, which is performed in ordinary finite-state automata (FA), was replaced with unification, which is a crucial element of relational languages.

Later, FSDA were extended by Kaminski and Francez to a more general model dealing with infinite alphabets, called finite-memory automata (FMA). FMA were designed to accept the infinite alphabet counterpart of the ordinary regular languages. Similarly to FSDA, FMA are equipped with a finite set of registers, which are either empty or contain a symbol from the infinite alphabet, but, contrary to FSDA, registers in FMA cannot contain values currently stored in other registers. By restricting the power of the automaton to copying a symbol to a register and comparing the content of a register with an input symbol only, without the ability to perform any functions, the automaton is only able to “remember” a finite set of input symbols. Thus, the languages accepted by FMA possess many of the properties of regular languages, but they are not closed under reversing.

In 1998 Cheng and Kaminski extended FMA to pushdown automata over infinite alphabets. Their model of computation is called infinite-alphabet infinite-stack-alphabet pushdown automata (IIPDA). It allows both infinite input and stack alphabets. IIPDA aim at recognizing only the “natural” analog of the ordinary context-free languages. Similarly to FMA, IIPDA are equipped with a finite set of registers, but unlike FMA, IIPDA can nondeterministically change symbols stored in their registers.

Recently Neven, Schwentick, and Vianu introduced pebble automata (PA) over infinite alphabets. PA can remember a finite set of positions by means of pebbles obeying the stack discipline, and allow equality tests between the symbols at these positions. They have shown that FMA are incomparable with the logics FO* and MSO*, while PA lie between FO* and MSO*.

In our thesis, we present an extension of FMA, called look-ahead finite-memory automaton (LFMA). This model is similar in many ways to FMA, but is able to make a look-ahead reassignment by “guessing” the content of some register. We show that FMA can be simulated by LFMA and that LFMA languages are closed under reversing (while FMA languages are not). We also introduce a notion of a regular expression over an infinite alphabet and show that a language is definable by an infinite alphabet regular expression if and only if it is acceptable by an LFMA. Finally, we identify a class of context-free grammars (left- and right-linear) over infinite alphabets which generate LFMA languages.
Notation and Abbreviations

Throughout this thesis we shall use the following conventions.
• $\Sigma = \{\sigma_1, \sigma_2, \ldots\}$ is an infinite input alphabet.
• Words are always denoted by boldface letters (possibly indexed or primed).
• Bold lower-case Greek letters $\boldsymbol{\alpha}$, $\boldsymbol{\sigma}$, and $\boldsymbol{\tau}$ denote words over $\Sigma$.
• Words of fixed length over $\Sigma$ are denoted by $\boldsymbol{u}$, $\boldsymbol{v}$, $\boldsymbol{w}$, and $\boldsymbol{x}$.
• Elements of $\Sigma$ are denoted by $\sigma$, $\tau$, $u$, $v$, $w$, $x$, and $y$, usually indexed.
• Symbols that appear in a word denoted by a boldface letter are always denoted by the same non-boldface letter with some subscript. That is, symbols that appear in $\boldsymbol{\sigma}$ are denoted by $\sigma_i$, symbols that appear in $\boldsymbol{w}$ are denoted by $w_i$, symbols that appear in a word $\boldsymbol{X}$ are denoted by $X_i$, etc.
• The symbol $\#$ does not belong to $\Sigma$. This symbol is reserved to denote an empty register.
• $|\boldsymbol{\sigma}| = |\sigma_1 \sigma_2 \cdots \sigma_n| = n$ is the length of the word $\boldsymbol{\sigma}$.
• $[\boldsymbol{w}] = \{w_i \neq \# : i = 1, 2, \ldots, n\}$ is the content of $\boldsymbol{w} \in (\Sigma \cup \{\#\})^*$. It consists of all symbols from $\Sigma$ that appear in the word.
• FA is the “classic” model of finite state automata, defined in [4, 5].
• FMA is an abbreviation for Finite-Memory Automaton, defined in [7]. (In our thesis we extend FMA with the ability to “guess” an assignment.)
• LFMA is an abbreviation for Look-ahead Finite-Memory Automaton. (This extension of FMA is the main subject of this thesis.)
1 Introduction

A new model of finite-state automata dealing with infinite alphabets, called finite-state datalog automata (FSDA), was introduced in [12]. FSDA were intended for the abstract study of relational languages. Words in such a language are formed by composing base relations, and have the general form $r_{i_1}(x_{j_1}, x_{k_1}) \cdots r_{i_n}(x_{j_n}, x_{k_n})$ for some $n$. Since the character of relational languages requires the use of infinite alphabets of names of variables, FSDA are equipped, in addition to a finite set of states, with a finite set of “registers” capable of retaining a variable name (out of an infinite set of names). The equality test between the input symbol and the edge label, which is performed in ordinary finite-state automata (FA), was replaced with unification, which is a crucial element of relational languages.

Later, FSDA were extended in [6] to a more general model dealing with infinite alphabets, called finite-memory automata (FMA). FMA were designed to accept the infinite alphabet counterpart of the ordinary regular languages. Similarly to FSDA, FMA are equipped with a finite set of registers, which are either empty or contain a symbol from the infinite alphabet, but, contrary to FSDA, registers in FMA cannot contain values currently stored in other registers. By restricting the power of the automaton to copying a symbol to a register and comparing the content of a register with an input symbol only, without the ability to perform any functions, the automaton is only able to “remember” a finite set of input symbols. Thus, the languages accepted by FMA possess many of the properties of regular languages, but they are not closed under reversing. Also, whereas deciding emptiness and containment for FMA- (and, consequently, for FSDA-) languages is relatively simple, the problem of inclusion for FMA-languages is undecidable, see [11].

An extension of FSDA to a general infinite alphabet, called finite-state unification based automata (FSUBA), was proposed in [13]. These automata are similar in many ways to FMA, but are a bit weaker, because a register of FSUBA may contain values currently stored in other registers. It was shown in [13] that FSDA can be simulated by FSUBA and that the problem of inclusion for FSUBA languages is decidable.

While the study of finite automata over infinite alphabets started as purely theoretical, since the appearance of [6] and [7] it seems to have become more practically oriented. The key idea for the applicability is finding practical interpretations of the infinite alphabet and of the languages over it.
• In [10, 11], members of the (infinite) alphabet Σ are interpreted as records of communication actions, “send” and “receive” of messages during inter-process communication; words in a language L over this alphabet are MSCs (message sequence charts), capturing behaviors of the communication network.
• In [1], members of Σ are interpreted as URLs (addresses of internet sites), and a word in L is interpreted as a “navigation path” in the internet, the result of some finite sequence of clicks.
• In [2] there is another internet-oriented interpretation of Σ, namely, XML mark-ups of pages in a site.

In our thesis, we present an extension of FMA, called look-ahead finite-memory automaton (LFMA). This model is similar in many ways to FMA, but is able to make a look-ahead reassignment by “guessing” the content of some register. We show that FMA can be simulated by LFMA and that LFMA languages are closed under reversing (whereas FMA languages are not). We also introduce a notion of a regular expression over an infinite alphabet and show that a language is definable by an infinite alphabet regular expression if and only if it is acceptable by an LFMA. Finally, we identify a class of context-free grammars (left- and right-linear) over infinite alphabets which generate LFMA languages.

The thesis is organized as follows. Below we recall the definition of FMA from [7] and some other models of computation over infinite alphabets. In the next section we define LFMA, the main subject of our thesis, and prove its closure properties. In Section 3 we present regular expressions over infinite alphabets for LFMA languages and prove their equivalence to LFMA. In Section 4 we present regular grammars over infinite alphabets and prove their equivalence to LFMA. We end the thesis with a short summary of the obtained results and indicate a direction for possible further research.

1.1 Basic definitions
Let $\Sigma$ be an infinite alphabet and let $\#$ be a symbol not belonging to $\Sigma$; this symbol is reserved to denote an empty register. An assignment is a word $w_1 w_2 \cdots w_r \in (\Sigma \cup \{\#\})^*$ such that if $w_i \neq \#$, then $w_i \neq w_j$ for any $j \neq i$. That is, an assignment is a word over $\Sigma \cup \{\#\}$ where each symbol from $\Sigma$ appears at most once. We denote the set of all assignments of length $r$ by $\Sigma^{r\neq}$.

For a word $\boldsymbol{w} = w_1 w_2 \cdots w_n \in (\Sigma \cup \{\#\})^*$, we define the content of $\boldsymbol{w}$, denoted $[\boldsymbol{w}]$, by $[\boldsymbol{w}] = \{w_i \neq \# : i = 1, 2, \ldots, n\}$. That is, $[\boldsymbol{w}]$ consists of all symbols from $\Sigma$ which appear in the word $\boldsymbol{w}$.

Throughout this thesis we shall use the following conventions.
• Words are always denoted by boldface letters (possibly indexed or primed).
• Bold lower-case Greek letters $\boldsymbol{\alpha}$, $\boldsymbol{\sigma}$, and $\boldsymbol{\tau}$ denote words over $\Sigma$.
• Assignments and words of fixed length over $\Sigma$ are denoted by $\boldsymbol{u}$, $\boldsymbol{v}$, $\boldsymbol{w}$, and $\boldsymbol{x}$.
• Elements of $\Sigma$ are denoted by $\sigma$, $\tau$, $u$, $v$, $w$, $x$, and $y$, usually indexed.
• Symbols which appear in a word denoted by a boldface letter are always denoted by the same non-boldface letter with some subscript. That is, symbols which appear in $\boldsymbol{\sigma}$ are denoted by $\sigma_i$, symbols which appear in $\boldsymbol{w}$ are denoted by $w_i$, symbols which appear in a word $\boldsymbol{X}$ are denoted by $X_i$, etc.
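The two notions just defined, assignments and the content of a word, can be made concrete with a minimal Python sketch. The names `EMPTY`, `is_assignment`, and `content` are illustrative choices and not part of the thesis; words are modelled as Python strings or sequences of symbols, and the string "#" plays the role of the empty-register symbol.

```python
EMPTY = "#"  # illustrative stand-in for the symbol '#', assumed not to be in the alphabet


def is_assignment(word):
    """Return True if every symbol other than EMPTY occurs at most once in `word`."""
    symbols = [s for s in word if s != EMPTY]
    return len(symbols) == len(set(symbols))


def content(word):
    """Return the content [w]: the set of alphabet symbols occurring in `word`."""
    return {s for s in word if s != EMPTY}
```

For instance, under these assumptions the word `"a#b#"` is an assignment of length 4 with content `{"a", "b"}`, whereas `"aba"` is a word but not an assignment.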
1.2 Previous works

In this section we recall the definitions of finite-memory automata, two-way finite-memory automata, and context-free grammars over an infinite alphabet.

1.2.1 Finite-Memory Automaton (FMA)
Definition 1 [7, Definition 1] A finite-memory automaton (FMA) is a system $A = (Q, s_0, \boldsymbol{u}, \rho, \mu, F)$, where
• $Q$ is a finite set of states.
• $s_0 \in Q$ is the initial state.
• $\boldsymbol{u} = u_1 u_2 \cdots u_r \in \Sigma^{r\neq}$ is the initial assignment to the $r$ registers of $A$.
• $\rho : Q \to \{1, 2, \ldots, r\}$ is a partial function from $Q$ to $\{1, 2, \ldots, r\}$ called the reassignment. Intuitively, if $A$ is in state $q$, $\rho(q)$ is defined, and the input symbol appears in no register, then $A$ “forgets” the contents of the $\rho(q)$th register and copies the input symbol into that register.
• $\mu \subseteq Q \times \{1, 2, \ldots, r\} \times Q$ is the transition relation. Intuitively, if $A$ is in state $p$, the input symbol is equal to the content of the $k$th register, and $(p, k, q) \in \mu$, then $A$ may enter state $q$ and pass to the next input symbol. In addition, if the input symbol appears in no register and is placed into the $k$th register ($k = \rho(p)$), then in order to enter state $q$ the transition relation must contain $(p, k, q)$. That is, the reassignment is made prior to a transition.
• $F \subseteq Q$ is the set of final states.

Similar to the case of finite automata, $A$ can be represented by its initial assignment and a finite directed graph whose nodes are states. There is an edge from $p$ to $q$ if $(p, k, q) \in \mu$ for some $k = 1, 2, \ldots, r$. Such an edge is labeled $k$. Also, if for a node $q$ the value of $\rho$ is defined, then $q$ is labeled $\rho(q)$, and if $q \in F$, it is labeled as such. For the graph representation of finite-memory automata, see Example 1 below.

An instantaneous description of $A$ is a member of $Q \times \Sigma^{r\neq} \times \Sigma^*$. The first component of an instantaneous description is the (current) state of the automaton, the second one is the assignment consisting of the contents of the registers (in the increasing order of their indices), and the third component is the portion of the input yet to be read.

Next, we define the relation $\vdash$ (yielding in one step) between two instantaneous descriptions $(p, v_1 v_2 \cdots v_r, \sigma\boldsymbol{\sigma})$ and $(q, w_1 w_2 \cdots w_r, \boldsymbol{\sigma})$, $\sigma \in \Sigma$. We write
$$(p, \underbrace{v_1 v_2 \cdots v_r}_{\boldsymbol{v}}, \sigma\boldsymbol{\sigma}) \vdash (q, \underbrace{w_1 w_2 \cdots w_r}_{\boldsymbol{w}}, \boldsymbol{\sigma})$$
if the following holds.
• If $\sigma = v_k \in [\boldsymbol{v}]$, then $\boldsymbol{w} = \boldsymbol{v}$ and $(p, k, q) \in \mu$.
• If $\sigma \notin [\boldsymbol{v}]$, then $\rho(p)$ is defined, $w_{\rho(p)} = \sigma$, $w_l = v_l$ for each $l \neq \rho(p)$, and $(p, \rho(p), q) \in \mu$.

We denote the reflexive and transitive closure of $\vdash$ by $\vdash^*$ and say that $A$ accepts a word $\boldsymbol{\sigma} \in \Sigma^*$ if $(s_0, \boldsymbol{u}, \boldsymbol{\sigma}) \vdash^* (f, \boldsymbol{v}, \varepsilon)$ for some $f \in F$ and $\boldsymbol{v} \in \Sigma^{r\neq}$.
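The acceptance condition of Definition 1 can be illustrated by a small simulator. The following Python sketch is only an illustration under stated assumptions: the function name `fma_accepts`, the encoding of $\rho$ as a dictionary and of $\mu$ as a set of triples, and the use of the string "#" for the empty register are all choices made here, and the input word is assumed not to contain "#". The sketch follows the one-step relation directly by tracking every reachable pair (state, assignment).

```python
def fma_accepts(word, s0, u, rho, mu, final):
    """Decide whether an FMA accepts `word` (sketch of Definition 1).

    rho:   dict mapping a state to a 1-based register index (partial function).
    mu:    set of triples (p, k, q).
    final: set of final states.
    """
    configs = {(s0, tuple(u))}
    for symbol in word:
        next_configs = set()
        for state, regs in configs:
            if symbol in regs:
                # The input symbol equals the content of the k-th register.
                k = regs.index(symbol) + 1
                new_regs = regs
            elif state in rho:
                # "New" symbol: reassign register rho(state) before the transition.
                k = rho[state]
                new_regs = regs[:k - 1] + (symbol,) + regs[k:]
            else:
                continue  # no move is possible from this configuration
            for (p, j, q) in mu:
                if p == state and j == k:
                    next_configs.add((q, new_regs))
        configs = next_configs
    return any(state in final for state, _ in configs)


# The 2-register automaton A = ({s, q, f}, s, ##, rho, mu, {f}) of Example 1.
EXAMPLE1 = dict(
    s0="s",
    u=("#", "#"),
    rho={"s": 1, "q": 2},
    mu={("s", 1, "q"), ("q", 2, "q"), ("q", 2, "f")},
    final={"f"},
)
```

With these definitions, `fma_accepts("abcb", **EXAMPLE1)` explores the runs of the automaton of Example 1 on the word abcb; because the set of configurations is carried along explicitly, nondeterministic choices in $\mu$ need no backtracking.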
The set of all words accepted by $A$ is denoted by $L(A)$ and is referred to as a quasi-regular language. The set of all quasi-regular languages is denoted by $L_{QR}$.

Example 1 Consider a 2-register finite-memory automaton $A = (\{s, q, f\}, s, \#\#, \rho, \mu, \{f\})$, where
• $\rho(s) = 1$, $\rho(q) = 2$, and
• $\mu = \{(s, 1, q), (q, 2, q), (q, 2, f)\}$.
Alternatively, $A$ can be described by the following diagram.

[Diagram: states s (labeled 1), q (labeled 2), and final state f; an edge from s to q labeled 1, a loop on q labeled 2, and an edge from q to f labeled 2; initial assignment u = ##.]
Figure 1.1: An FMA that accepts $L = \{\sigma\boldsymbol{\sigma} : |\boldsymbol{\sigma}| \geq 1,\ \sigma \in \Sigma \setminus [\boldsymbol{\sigma}]\}$

Obviously, $A$ accepts exactly the words of length at least two whose first symbol differs from all the others.

The following property of quasi-regular languages will be used in the sequel. It reflects the fact that a finite-memory automaton can only sense “new” input symbols, i.e., ones not appearing in the contents of the most recent assignment. It cannot, however, distinguish between different “new” symbols. This property of finite-memory automata is a useful tool for proving that a language is not quasi-regular. Note that an occurrence of an input symbol that belonged to a previous assignment and was “forgotten” in a reassignment is also “new”.

Proposition 1 [7, Proposition 3] (Indistinguishability property of FMA). Let $A = (Q, s_0, \boldsymbol{u}, \rho, \mu, F)$ be an $r$-register FMA. If $\boldsymbol{x}\boldsymbol{y} \in L(A)$, then there exists a subset $\Sigma' \subseteq [\boldsymbol{x}]$ such that the cardinality of $\Sigma'$ does not exceed $r$ and the following holds. For any $\sigma \notin \Sigma'$ and any $\tau \notin [\boldsymbol{y}] \cup \Sigma'$, the word $\boldsymbol{x}(\boldsymbol{y}(\sigma|\tau))$ obtained from $\boldsymbol{x}\boldsymbol{y}$ by the substitution of $\tau$ for each occurrence of $\sigma$ in $\boldsymbol{y}$ belongs to $L(A)$.
1.2.2 Two-Way Finite-Memory Automaton (2WFMA)

Definition 2 [7, Definition 5] A two-way non-deterministic finite-memory automaton (2WFMA) is a system $A = (Q, s_0, \boldsymbol{u}, \rho, \mu, F)$, where $Q$, $s_0$, $\boldsymbol{u}$, $\rho$, and $F$ are as in a finite-memory automaton, and the transition $\mu$ is a subset of $Q \times \{1, 2, \ldots, r\} \times Q \times \{-1, 1\}$.

The meaning of $\mu$ is as follows. If $\mu(s, k) = (t, -1)$ or $\mu(s, k) = (t, 1)$, then in state $s$, scanning the input symbol stored in the $k$th register, the automaton enters state $t$ and moves left or right, respectively. The first and the second components of $\mu(s, k)$ are denoted by $\mu_1(s, k)$ and $\mu_2(s, k)$, respectively. That is, $\mu_1 : Q \times \{1, 2, \ldots, r\} \to Q$, $\mu_2 : Q \times \{1, 2, \ldots, r\} \to \{-1, 1\}$, and $\mu = (\mu_1, \mu_2)$.

A configuration of $A$ is a pair $(s, \boldsymbol{w})$, where $s \in Q$ and $\boldsymbol{w}$ is an assignment of length $r$. We denote the set of configurations by $Q^c$. The transition function $\mu$ induces the function $\mu^c : Q^c \times \Sigma \to Q^c \times \{-1, 1\}$ that is defined as follows. Let $c = (s, \boldsymbol{w})$.
• If $\sigma = w_k \in [\boldsymbol{w}]$, then $\mu^c(c, \sigma) = ((\mu_1(s, k), \boldsymbol{w}), \mu_2(s, k))$.
• If $\sigma \notin [\boldsymbol{w}]$, then $\mu^c(c, \sigma) = ((\mu_1(s, \rho(s)), \boldsymbol{v}), \mu_2(s, \rho(s)))$, where $\boldsymbol{v}$ results from $\boldsymbol{w}$ by replacing the $\rho(s)$th symbol of $\boldsymbol{w}$ with $\sigma$.
The first and the second components of $\mu^c(c, \sigma)$ are denoted by $\mu^c_1(c, \sigma)$ and $\mu^c_2(c, \sigma)$, respectively.

As in the case of a two-way deterministic finite automaton, the future behavior of a two-way deterministic finite-memory automaton on a given input depends only on the automaton configuration and the head position. The formal definition of this combination is as follows. An instantaneous description of $A$ on the input word $\boldsymbol{\sigma}$ is a pair $(c, i)$, where $c \in Q^c$ and $i$ is a positive integer not exceeding $|\boldsymbol{\sigma}| + 1$. The instantaneous description $(c, i)$ is intended to represent the facts that $c$ is the current automaton configuration during the computation on the input $\boldsymbol{\sigma}$, and the automaton head is scanning the $i$th symbol of $\boldsymbol{\sigma}$ if $i \leq |\boldsymbol{\sigma}|$, or has fallen off the right end of $\boldsymbol{\sigma}$ if $i = |\boldsymbol{\sigma}| + 1$.

Next we define the successor relation $\vdash_{A,\boldsymbol{\sigma}}$ on the set of instantaneous descriptions of $A$ on $\boldsymbol{\sigma}$. Let $\boldsymbol{\sigma} = \sigma_1 \sigma_2 \cdots \sigma_n$. If $i = 1$ and $\mu^c_2(c, \sigma_1) = -1$, or $i = n + 1$, then $(c, i)$ has no successor. Otherwise $(c, i) \vdash_{A,\boldsymbol{\sigma}} (c', i')$ if and only if $c' = \mu^c_1(c, \sigma_i)$ and $i' = i + \mu^c_2(c, \sigma_i)$. The requirement $i > 1$ for $\mu^c_2(c, \sigma_i) = -1$ prevents any action in the event that the automaton head would move off the left end of the input, and the requirement $i \leq n$ prevents any action in the event that the automaton head would move off the right end of the input.

Let $\vdash^*_{A,\boldsymbol{\sigma}}$ be the transitive and reflexive closure of $\vdash_{A,\boldsymbol{\sigma}}$. We say that $A$ accepts $\boldsymbol{\sigma} \in \Sigma^*$ if for some final configuration $f^c \in F^c$, $(s_0^c, 1) \vdash^*_{A,\boldsymbol{\sigma}} (f^c, |\boldsymbol{\sigma}| + 1)$, where $s_0^c = (s_0, \boldsymbol{u})$ is the initial configuration and $F^c$ is the set of configurations whose state belongs to $F$. That is, $\boldsymbol{\sigma}$ is accepted by $A$ if, starting in the state $s_0$ with the head on the first symbol of $\boldsymbol{\sigma}$, $A$ eventually enters a final state at the same time it falls off the right end of the input. As usual, the set of all words accepted by $A$ is denoted by $L(A)$.

As we are often dealing with two-way automata, we delimit input strings by two special symbols $\triangleright, \triangleleft \notin \Sigma$ for the left and the right ends of the string. Hence, automata always work on extended strings of the form $\boldsymbol{\sigma}' = \triangleright\boldsymbol{\sigma}\triangleleft$, where $\boldsymbol{\sigma} \in \Sigma^*$. The positions of $\triangleright$ and $\triangleleft$ are $0$ and $|\boldsymbol{\sigma}| + 1$, respectively.

1.2.3 Context-free grammars over infinite alphabets
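Definition 3 [3, Definition 1] An infinite alphabet context-free grammar is a system $G = (V, \boldsymbol{u}, R, S)$, where
• $V$ is a finite set of variables, $V \cap \Sigma = \emptyset$.
• $\boldsymbol{u} = u_1 u_2 \cdots u_r \in \Sigma^{r\neq}$ is the initial assignment.
• $R \subseteq V \times \{1, 2, \ldots, r\} \times (V \cup \{1, 2, \ldots, r\})^*$ is a set of productions. For $A \in V$, $i = 1, 2, \ldots, r$, and $\boldsymbol{a} \in (V \cup \{1, 2, \ldots, r\})^*$, we write the triple $(A, i, \boldsymbol{a})$ as $(A, i) \to \boldsymbol{a}$.
• $S \in V$ is the start symbol.

For $A \in V$, $\boldsymbol{w} = w_1 w_2 \cdots w_r \in \Sigma^{r\neq}$, and $\boldsymbol{X} = X_1 X_2 \cdots X_n \in (\Sigma \cup (V \times \Sigma^{r\neq}))^*$, we write $(A, \boldsymbol{w}) \Rightarrow \boldsymbol{X}$ if there exist a production $(A, i) \to \boldsymbol{a} \in R$, $\boldsymbol{a} = a_1 a_2 \cdots a_n \in (V \cup \{1, 2, \ldots, r\})^*$, and $\sigma \notin [\boldsymbol{w}] \setminus \{w_i\}$ such that the condition below is satisfied. Let $\boldsymbol{w}' \in \Sigma^{r\neq}$ be obtained from $\boldsymbol{w}$ by replacing $w_i$ with $\sigma$. Then, for $j = 1, 2, \ldots, n$ the following holds.
• If $a_j = k$ for some $k = 1, 2, \ldots, r$, then $X_j = w'_k$.
• If $a_j = B$ for some $B \in V$, then $X_j = (B, \boldsymbol{w}')$.

For two words $\boldsymbol{X}$ and $\boldsymbol{Y}$ over $\Sigma \cup (V \times \Sigma^{r\neq})$, we write $\boldsymbol{X} \Rightarrow \boldsymbol{Y}$ if there exist words $\boldsymbol{X}_1$, $\boldsymbol{X}_2$, and $\boldsymbol{X}_3$ over $\Sigma \cup (V \times \Sigma^{r\neq})$, and $(A, \boldsymbol{w}) \in V \times \Sigma^{r\neq}$ such that $\boldsymbol{X} = \boldsymbol{X}_1 (A, \boldsymbol{w}) \boldsymbol{X}_2$, $\boldsymbol{Y} = \boldsymbol{X}_1 \boldsymbol{X}_3 \boldsymbol{X}_2$, and $(A, \boldsymbol{w}) \Rightarrow \boldsymbol{X}_3$. As usual, the reflexive and transitive closure of $\Rightarrow$ is denoted by $\Rightarrow^*$. The language $L(G)$ generated by $G$ is defined by $L(G) = \{\boldsymbol{\sigma} \in \Sigma^* : (S, \boldsymbol{u}) \Rightarrow^* \boldsymbol{\sigma}\}$ and is referred to as a quasi-context-free language.

Example 2 [3, Example 1] Let $G = (\{S\}, \boldsymbol{u}, R, S)$ be an infinite-alphabet context-free grammar, where $\boldsymbol{u} = u_1 \in \Sigma^{1\neq}$ (that is, $\boldsymbol{u}$ is a word of length one), and $R$ consists of two productions $(S, 1) \to 1S1 \mid \varepsilon$. One can easily verify that $L(G) = \{\boldsymbol{\sigma}\boldsymbol{\sigma}^R : \boldsymbol{\sigma} \in \Sigma^*\}$, where $\boldsymbol{\sigma}^R$ denotes the reversal of $\boldsymbol{\sigma}$. For example, the word $\sigma_1 \sigma_2 \sigma_3 \sigma_3 \sigma_2 \sigma_1$ is derived as follows.
$$(S, \boldsymbol{u}) \Rightarrow \sigma_1 (S, \sigma_1) \sigma_1 \Rightarrow \sigma_1 \sigma_2 (S, \sigma_2) \sigma_2 \sigma_1 \Rightarrow \sigma_1 \sigma_2 \sigma_3 (S, \sigma_3) \sigma_3 \sigma_2 \sigma_1 \Rightarrow \sigma_1 \sigma_2 \sigma_3 \sigma_3 \sigma_2 \sigma_1.$$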
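The derivation step of Definition 3 can be sketched in a few lines of Python. This is a minimal illustration, not the formalism of [3]: the names `derive_step` and `derive`, the encoding of a production $(A, i) \to \boldsymbol{a}$ as a triple `(A, i, a)`, and the convention that terminals are strings while pairs (variable, assignment) are tuples are all assumptions made here. Each step chooses a production together with the guessed symbol $\sigma$ that replaces the $i$th register.

```python
def derive_step(var, w, production, sigma):
    """One step (A, w) => X for the production (A, i) -> a and the chosen symbol sigma.

    production: triple (A, i, a); a is a tuple of register indices (int) and variables (str).
    Returns X as a list of terminals and (variable, assignment) pairs.
    """
    head, i, body = production
    if head != var:
        raise ValueError("production does not rewrite this variable")
    others = {s for j, s in enumerate(w, start=1) if j != i and s != "#"}
    if sigma in others:
        raise ValueError("sigma must not occur in any other register")
    w_new = w[:i - 1] + (sigma,) + w[i:]  # w' : w with its i-th symbol replaced by sigma
    return [w_new[a - 1] if isinstance(a, int) else (a, w_new) for a in body]


def derive(start, u, steps):
    """Apply the given (production, sigma) pairs to the leftmost variable in turn."""
    form = [(start, tuple(u))]
    for production, sigma in steps:
        pos = next(n for n, x in enumerate(form) if isinstance(x, tuple))
        var, w = form[pos]
        form[pos:pos + 1] = derive_step(var, w, production, sigma)
    return form


# The two productions (S, 1) -> 1 S 1 | epsilon of Example 2.
P_WRAP = ("S", 1, (1, "S", 1))
P_STOP = ("S", 1, ())
```

Under these assumptions, calling `derive("S", ("#",), [(P_WRAP, "s1"), (P_WRAP, "s2"), (P_WRAP, "s3"), (P_STOP, "s3")])` replays the derivation of Example 2, with the strings `"s1"`, `"s2"`, `"s3"` standing for $\sigma_1$, $\sigma_2$, $\sigma_3$.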
2 Look-ahead Finite-Memory Automaton

In this section we introduce the look-ahead finite-memory automaton (LFMA) over an infinite alphabet. This extended version of FMA is the main subject of our research.

2.1 Guessing instead of reassignment

The model we present here is similar in many ways to FMA, but is able to make a look-ahead reassignment by “guessing” the content of some register. The motivation behind this extension is closure under reversing. Closure under reversing is crucial, since it allows us to introduce regular expressions and regular grammars over infinite alphabets.

2.2 The definition of LFMA
Definition 4 A look-ahead finite-memory automaton (LFMA) is a system $A = (Q, s_0, \boldsymbol{u}, \mu, \rho, F)$, where
• $Q$ is a finite set of states.
• $s_0 \in Q$ is the initial state.
• $\boldsymbol{u} = u_1 u_2 \cdots u_r \in \Sigma^{r\neq}$ is the initial assignment to the $r$ registers of $A$.
• $\mu \subseteq Q \times (\{1, 2, \ldots, r\} \cup \{\varepsilon\}) \times Q$ is the transition relation. Intuitively, if $A$ is in state $p$, the input symbol is equal to the content of the $k$th register, and $(p, k, q) \in \mu$, then $A$ may enter state $q$ and pass to the next input symbol. Similarly, if $(p, \varepsilon, q) \in \mu$, then $A$ may make a look-ahead reassignment using the function $\rho$, i.e., replace the content of the $\rho(p, q)$th register with any element of $\Sigma$ that does not appear in any other register, and enter state $q$ without reading the next input symbol.
• $\rho : \{(p, q) : (p, \varepsilon, q) \in \mu\} \to (\{1, 2, \ldots, r\} \cup \{nil\})$ is a function called the look-ahead reassignment. Intuitively, if $A$ is in state $p$, $(p, \varepsilon, q) \in \mu$, and $\rho(p, q) = k$, then for $k \neq nil$, $A$ can non-deterministically replace the content of the $k$th register with an element of $\Sigma$ not appearing in any other register and enter state $q$. Note that, unlike in Definition 1, we define $\rho$ on edges and allow $A$ to “guess” the replacement.
• $F \subseteq Q$ is the set of final states.

An instantaneous description of $A$ is a member of $Q \times \Sigma^{r\neq} \times \Sigma^*$. The first component of an instantaneous description is the (current) state of the automaton, the second one is the assignment consisting of the contents of the registers (in the increasing order of their indices), and the third component is the portion of the input yet to be read.

Next, we define the relation $\vdash$ (yielding in one step) between two instantaneous descriptions $(p, v_1 v_2 \cdots v_r, \sigma\boldsymbol{\sigma})$ and $(q, w_1 w_2 \cdots w_r, \boldsymbol{\sigma})$, $\sigma \in \Sigma \cup \{\varepsilon\}$. We write
$$(p, \underbrace{v_1 v_2 \cdots v_r}_{\boldsymbol{v}}, \sigma\boldsymbol{\sigma}) \vdash (q, \underbrace{w_1 w_2 \cdots w_r}_{\boldsymbol{w}}, \boldsymbol{\sigma})$$
if the following holds.
• If $\sigma = \varepsilon$ and $\rho(p, q) = k$, then for $k \neq nil$, $w_k \in \Sigma \setminus \{v_1, \ldots, v_{k-1}, v_{k+1}, \ldots, v_r\}$ and for each $l \neq k$, $w_l = v_l$. Otherwise, i.e., $\sigma \neq \varepsilon$ or $k = nil$, $\boldsymbol{w} = \boldsymbol{v}$.
• If $\sigma \neq \varepsilon$, then for some $k$, $\sigma = v_k$ and $(p, k, q) \in \mu$.

We denote the reflexive and transitive closure of $\vdash$ by $\vdash^*$ and say that $A$ accepts a word $\boldsymbol{\sigma} \in \Sigma^*$ if $(s_0, \boldsymbol{u}, \boldsymbol{\sigma}) \vdash^* (f, \boldsymbol{v}, \varepsilon)$ for some $f \in F$ and $\boldsymbol{v} \in \Sigma^{r\neq}$. The set of all words accepted by $A$ is denoted by $L(A)$ and is referred to as a look-ahead quasi-regular language. The set of all look-ahead quasi-regular languages is denoted by $L_{LQR}$.
2.3 Examples

Before considering some general properties of LFMA, we present several examples which show the differences and similarities between finite automata over finite and infinite alphabets.

Example 3 Consider a 2-register look-ahead finite-memory automaton $A = (\{s, q, f\}, s, \#\#, \mu, \rho, \{f\})$, where
• $\mu = \{(s, \varepsilon, q), (q, \varepsilon, q), (q, 2, q), (q, 1, f)\}$,
• $\rho(s, q) = 1$, and $\rho(q, q) = 2$.

[Diagram: states s, q, and final state f; an edge from s to q labeled ε:1, loops on q labeled ε:2 and 2, and an edge from q to f labeled 1; initial assignment u = ##.]
Figure 2.1: An LFMA that accepts $L = \{\boldsymbol{\sigma}\sigma : \sigma \in \Sigma \setminus [\boldsymbol{\sigma}]\}$

Alternatively, $A$ can be described by the diagram above. It can easily be seen that $L(A)$ consists exactly of all words whose last symbol differs from all the others. That is, $L(A) = \{\sigma_1 \sigma_2 \cdots \sigma_n : \sigma_i \neq \sigma_n,\ 1 \leq i < n\}$.
Example 4 Consider a 2-register look-ahead finite-memory automaton $A = (\{s, q_0, q_1, f\}, s, \#\#, \mu, \rho, \{f\})$, such that
• $\mu = \mu_1 \cup \mu_2$, where
  – $\mu_1 = \{(s, \varepsilon, q_0), (q_0, \varepsilon, q_0), (q_1, \varepsilon, q_1), (f, \varepsilon, f)\}$ and
  – $\mu_2 = \{(q_0, 1, q_0), (q_0, 2, q_1), (q_1, 1, q_1), (q_1, 2, f), (f, 1, f), (f, 2, f)\}$.
• $\rho(q_0, q_0) = \rho(q_1, q_1) = \rho(f, f) = 1$ and $\rho(s, q_0) = 2$.
Alternatively, $A$ can be described by the diagram in Figure 2.2.

[Diagram: states s, q0, q1, and final state f; an edge from s to q0 labeled ε:2; loops on q0 and q1 labeled ε:1 and 1; an edge from q0 to q1 labeled 2; an edge from q1 to f labeled 2; loops on f labeled ε:1 and 1, 2; initial assignment u = ##.]
Figure 2.2: An LFMA that accepts $L = \{\boldsymbol{\sigma}_1 \sigma \boldsymbol{\sigma}_2 \sigma \boldsymbol{\sigma}_3 : \sigma \in \Sigma\}$

One can easily verify that the language $L = L(A)$ consists exactly of those words where some element of $\Sigma$ appears twice or more. That is, $L = \{\sigma_1 \sigma_2 \cdots \sigma_n : \text{there exist } 1 \leq i < j \leq n \text{ such that } \sigma_i = \sigma_j\}$.

The behavior of $A$ on such words is as follows. In the initial state, the automaton guesses the symbol which appears twice, stores it in the second register, and changes the state to $q_0$. In state $q_0$, the automaton guesses the next input symbol, stores it in the first register, and reads it. When reading the symbol that appears twice (the symbol stored in the second register), $A$ changes the state to $q_1$. In state $q_1$, the automaton guesses the next input symbol, stores it in the first register, and reads it, or, if the input is the symbol stored in the second register, $A$ changes the state to the final state $f$.

For example, $\boldsymbol{\sigma} = abcbd \in L$, because $b$ appears twice in $\boldsymbol{\sigma}$. An accepting run of $A$ on $\boldsymbol{\sigma}$ is
$$(s, \#\#, abcbd) \vdash (q_0, \#b, abcbd) \vdash (q_0, ab, abcbd) \vdash (q_0, ab, bcbd) \vdash (q_1, ab, cbd) \vdash (q_1, cb, cbd) \vdash (q_1, cb, bd) \vdash (f, cb, d) \vdash (f, db, d) \vdash (f, db, \varepsilon).$$

We shall see in the sequel that the complement of $L$ (with respect to $\Sigma^*$) is not look-ahead quasi-regular.
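Examples 3 and 4 can be replayed with a small simulator for Definition 4. The following Python sketch is an illustration only, under assumptions that are not part of the thesis: the names `lfma_accepts`, `EPS`, and `NIL`, the encoding of $\mu$ as a set of triples and $\rho$ as a dictionary on pairs of states, and in particular the restriction of guesses to a finite pool (the symbols of the input and of the initial assignment plus $r$ fresh placeholders). The pool is justified only informally: symbols that never occur in the input are interchangeable, and at most $r$ of them can be stored at once. The input is assumed not to contain "#".

```python
from collections import deque

EPS = "eps"  # label of epsilon-transitions (illustrative encoding)
NIL = None   # encodes rho(p, q) = nil


def lfma_accepts(word, s0, u, mu, rho, final):
    """Breadth-first search over instantaneous descriptions (state, registers, position)."""
    word = tuple(word)
    r = len(u)
    # Finite pool of candidate guesses (a simplification of this sketch, see lead-in).
    pool = list(set(word) | {x for x in u if x != "#"})
    pool += [("fresh", n) for n in range(r)]
    start = (s0, tuple(u), 0)
    seen = {start}
    queue = deque([start])
    while queue:
        p, regs, pos = queue.popleft()
        if pos == len(word) and p in final:
            return True
        successors = []
        for (p1, label, q) in mu:
            if p1 != p:
                continue
            if label == EPS:
                k = rho[(p, q)]
                if k is NIL:
                    successors.append((q, regs, pos))
                else:
                    # Guess a new content for register k, distinct from all other registers.
                    others = set(regs[:k - 1] + regs[k:])
                    for g in pool:
                        if g not in others:
                            successors.append((q, regs[:k - 1] + (g,) + regs[k:], pos))
            elif pos < len(word) and regs[label - 1] == word[pos]:
                # Read the input symbol stored in register `label`.
                successors.append((q, regs, pos + 1))
        for config in successors:
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False


# The automaton of Example 3.
EXAMPLE3 = dict(
    s0="s", u=("#", "#"),
    mu={("s", EPS, "q"), ("q", EPS, "q"), ("q", 2, "q"), ("q", 1, "f")},
    rho={("s", "q"): 1, ("q", "q"): 2},
    final={"f"},
)

# The automaton of Example 4.
EXAMPLE4 = dict(
    s0="s", u=("#", "#"),
    mu={("s", EPS, "q0"), ("q0", EPS, "q0"), ("q1", EPS, "q1"), ("f", EPS, "f"),
        ("q0", 1, "q0"), ("q0", 2, "q1"), ("q1", 1, "q1"), ("q1", 2, "f"),
        ("f", 1, "f"), ("f", 2, "f")},
    rho={("q0", "q0"): 1, ("q1", "q1"): 1, ("f", "f"): 1, ("s", "q0"): 2},
    final={"f"},
)
```

With these definitions, `lfma_accepts("abcbd", **EXAMPLE4)` searches for an accepting run such as the one displayed in Example 4. The search terminates because the pool of guesses, and hence the set of reachable instantaneous descriptions, is finite.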
2.4
Basic properties
In this section we show some basic properties of look-ahead quasi-regular languages. Namely, we prove the “normalization” lemma for look-ahead finitememory automaton, show that FMA can be simulated by LFMA, and establish some decidability results. Lemma 1 Let A = (Q, s0 , u, µ, ρ, F ) be an r-register LFMA. Then L(A) is rc 6= for accepted by an LFMA A0 = (Q0 , s00 , u0 , µ0 , ρ0 , F 0 ) such that u0 ∈ #rl · Σ
15
Technion - Computer Science Department - M.Sc. Thesis MSC-2006-24 - 2006
some $r_l$, $r_c$, and $\rho'$ is a function into $\{1, 2, \ldots, r_l\} \cup \{nil\}$.²

² That is, the registers of $A'$ can be partitioned into “look-ahead” registers (the first $r_l$ registers) and “constant” registers (the last $r_c$ registers).

Proof: Let $A = (Q, s_0, u, \mu, \rho, F)$ be an $r$-register LFMA. Informally, we construct a $2r$-register LFMA $A'$, with initial assignment $u' = \#^r \cdot u$, that simulates $A$ and makes no look-ahead reassignments for the last $r$ registers. An assignment $\mathbf{w} = w_1 w_2 \cdots w_r$ of $A$ is represented by an assignment $\mathbf{w}' = w'_1 w'_2 \cdots w'_{2r}$ of $A'$ such that $[\mathbf{w}] \subseteq [\mathbf{w}']$, and an injective “pointer” function $f : \{1, 2, \ldots, r\} \to \{1, 2, \ldots, 2r\}$ such that $(p, k', q)$ is a transition of $A'$ if and only if $(p, k, q) \in \mu$ and $f(k) = k'$.³ The intuition behind this construction is that $\mathbf{w}$ can be read off from $\mathbf{w}'$ at any stage of the computation of $A$, i.e., $w_i = w'_{f(i)}$, $i = 1, 2, \ldots, r$.

³ If $k$ is $\varepsilon$, we assume that $k' = \varepsilon$.

A formal description of $A'$ is as follows. Let $r' = 2r$, let $R_l = \{1, 2, \ldots, r\}$ and $R_c = \{r + 1, r + 2, \ldots, r'\}$ be the sets of the first (“look-ahead”) and the last (“constant”) $r$ indices, respectively, and let $R = R_l \cup R_c$. Let $\mathcal{F}_{r'}$ denote the set of all injective (“pointer”) functions $f : R_l \to R$. In other words, no two look-ahead registers point to the same register. For two functions $f, f' \in \mathcal{F}_{r'}$ and $k \in R_l$, we write $f \cong_k f'$ if and only if $f(i) = f'(i)$ for all $i \in R_l \setminus \{k\}$. Finally, for a register index $i \in R$ and a set of registers $T \subseteq R_l$, we say that $i$ is a $T$-free register (with respect to $f$) if $i \notin \mathrm{Im}(f|_T)$, i.e., no register from $T$ points to $i$. Now, $A' = (Q', s'_0, u', \mu', \rho', F')$ is an $r'$-register LFMA such that

• $Q' = Q \times \mathcal{F}_{r'}$.
• $s'_0 = (s_0, f_0)$, where $f_0(i) = r + i$, $i = 1, 2, \ldots, r$.
• $u' = \#^r \cdot u$.
• $\mu' = \mu'_1 \cup \mu'_2$, where
  – $\mu'_1$ consists of all triples $((p, f), k', (q, f))$ such that for some $(p, k, q) \in \mu$, $k \neq \varepsilon$ and $f(k) = k'$.
  – $\mu'_2$ consists of all triples $((p, f), \varepsilon, (q, f'))$ such that $(p, \varepsilon, q) \in \mu$ and $f'$ is as in the definition of $\rho'$ below.
• $\rho'((p, f), (q, f')) = k'$ if and only if one of the following conditions is satisfied. Let $\rho(p, q) = k$.
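  1. If $k = nil$, then $f' = f$ and $k' = nil$. Otherwise assume $k \neq nil$ below.
  2. If $j \in R$ is an $(R_l \setminus \{k\})$-free register, then $f' \cong_k f$, $f'(k) = j$, and $k' = nil$.
  3. If $k' \in R_l$ is an $(R_l \setminus \{k\})$-free register, then $f' \cong_k f$ and $f'(k) = k'$.
• $F' = F \times \mathcal{F}_{r'}$.

We contend that $L(A) = L(A')$. Let $\boldsymbol{\sigma} = \sigma_1 \sigma_2 \cdots \sigma_n \in L(A)$ and let $R = (q_0, \mathbf{w}_0, \boldsymbol{\sigma}_0), (q_1, \mathbf{w}_1, \boldsymbol{\sigma}_1), \ldots, (q_k, \mathbf{w}_k, \varepsilon)$ be an accepting run of $A$ on $\boldsymbol{\sigma}$. That is, $q_0 = s_0$, $\mathbf{w}_0 = u$, $\boldsymbol{\sigma}_0 = \boldsymbol{\sigma}$ and $q_k \in F$. We transform it into an accepting run $R' = (q'_0, \mathbf{w}'_0, \boldsymbol{\sigma}_0), (q'_1, \mathbf{w}'_1, \boldsymbol{\sigma}_1), \ldots, (q'_k, \mathbf{w}'_k, \varepsilon)$, $q'_i = (q_i, f_i)$, $i = 0, 1, \ldots, k$, of $A'$ on $\boldsymbol{\sigma}$ by induction as follows. Let $q'_0 = s'_0$, $\mathbf{w}'_0 = u'$ and assume that $f_i$ has been defined such that the condition below is satisfied.

$$\text{For all } k \in R_l:\quad w_{i,k} = w'_{i,f_i(k)}. \qquad (1)$$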
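Note that $[\mathbf{w}_i] \subseteq [\mathbf{w}'_i]$ follows from the above and that, by the definition of $q'_0$ and $\mathbf{w}'_0$, this condition is satisfied for $i = 0$. Let $i \geq 0$ and let $(q_i, \mathbf{w}_i, \sigma\boldsymbol{\sigma}_{i+1}) \vdash (q_{i+1}, \mathbf{w}_{i+1}, \boldsymbol{\sigma}_{i+1})$. We distinguish between the cases $\sigma \neq \varepsilon$ and $\sigma = \varepsilon$.

Let $\sigma \neq \varepsilon$ and assume that for some $k$, $\sigma = w_{i,k}$, $(q_i, k, q_{i+1}) \in \mu$, and $\mathbf{w}_{i+1} = \mathbf{w}_i$. By the induction hypothesis, condition (1) is satisfied, implying $\sigma = w'_{i,f_i(k)}$ and, by the definition of $\mu'$, $((q_i, f_i), f_i(k), (q_{i+1}, f_i)) \in \mu'$. Thus,
$$(q'_i, \mathbf{w}'_i, \sigma\boldsymbol{\sigma}_{i+1}) \vdash^{\sigma} (q'_{i+1}, \mathbf{w}'_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $q'_{i+1} = (q_{i+1}, f_i)$ and $\mathbf{w}'_{i+1} = \mathbf{w}'_i$, i.e., the induction hypothesis holds for $i + 1$.

Let $\sigma = \varepsilon$, $(q_i, \varepsilon, q_{i+1}) \in \mu$, $\rho(q_i, q_{i+1}) = k$, $w_{i+1,k} = \delta \in \Sigma \setminus \{w_{i,1}, \ldots, w_{i,k-1}, w_{i,k+1}, \ldots, w_{i,r}\}$, and for each $l \neq k$, $w_{i+1,l} = w_{i,l}$.

Assume $\delta \in [\mathbf{w}'_i]$ and let $\delta = w'_{i,j}$ for some $j \in R$. Note that, since $\delta \notin ([\mathbf{w}_i] \setminus \{w_{i,k}\})$, $j$ is an $(R_l \setminus \{k\})$-free register (otherwise, by condition (1), we would have a contradiction with $\delta \notin ([\mathbf{w}_i] \setminus \{w_{i,k}\})$). Let $f_{i+1} \cong_k f_i$ and $f_{i+1}(k) = j$. By the definition of $\mu'$ and $\rho'$, $\rho'((q_i, f_i), (q_{i+1}, f_{i+1})) = nil$. Thus,
$$(q'_i, \mathbf{w}'_i, \boldsymbol{\sigma}_{i+1}) \vdash^{\varepsilon} (q'_{i+1}, \mathbf{w}'_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $q'_{i+1} = (q_{i+1}, f_{i+1})$ and $\mathbf{w}'_{i+1} = \mathbf{w}'_i$, i.e., the induction hypothesis holds for $i + 1$.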
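Now, assume $\delta \notin [\mathbf{w}'_i]$ and let $k' \in R_l$ be an $(R_l \setminus \{k\})$-free register. Note that such a register always exists, because $f_i$ is injective and hence $|\mathrm{Im}(f_i|_{R_l \setminus \{k\}})| = r - 1 < r = |R_l|$. Let $f_{i+1} \cong_k f_i$ and $f_{i+1}(k) = k'$. By the definition of $\mu'$ and $\rho'$, $\rho'((q_i, f_i), (q_{i+1}, f_{i+1})) = k'$. Thus,
$$(q'_i, \mathbf{w}'_i, \boldsymbol{\sigma}_{i+1}) \vdash^{\varepsilon} (q'_{i+1}, \mathbf{w}'_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $q'_{i+1} = (q_{i+1}, f_{i+1})$ and $\mathbf{w}'_{i+1} = w'_{i,1} w'_{i,2} \cdots w'_{i,k'-1}\, \delta\, w'_{i,k'+1} \cdots w'_{i,r'}$, i.e., the induction hypothesis holds for $i + 1$, which completes the proof of the inclusion $L(A) \subseteq L(A')$.

Conversely, let $\boldsymbol{\sigma} \in L(A')$ and let $R' = (q'_0, \mathbf{w}'_0, \boldsymbol{\sigma}_0), (q'_1, \mathbf{w}'_1, \boldsymbol{\sigma}_1), \ldots, (q'_k, \mathbf{w}'_k, \varepsilon)$, where $q'_i = (q_i, f_i)$, $i = 0, 1, \ldots, k$, be an accepting run of $A'$ on $\boldsymbol{\sigma}$. That is, $q'_0 = s'_0$, $\mathbf{w}'_0 = u'$, $\boldsymbol{\sigma}_0 = \boldsymbol{\sigma}$ and $q'_k \in F'$. We transform it into an accepting run $R = (q_0, \mathbf{w}_0, \boldsymbol{\sigma}_0), (q_1, \mathbf{w}_1, \boldsymbol{\sigma}_1), \ldots, (q_k, \mathbf{w}_k, \varepsilon)$ of $A$ on $\boldsymbol{\sigma}$ by induction as follows. Assume that $f_i$ has been defined such that condition (1) above is satisfied. Note that, by the definition of $q'_0$ and $\mathbf{w}'_0$, this condition is satisfied for $i = 0$. Let $i \geq 0$ and let $(q'_i, \mathbf{w}'_i, \sigma\boldsymbol{\sigma}_{i+1}) \vdash (q'_{i+1}, \mathbf{w}'_{i+1}, \boldsymbol{\sigma}_{i+1})$. As above, we distinguish between the cases $\sigma \neq \varepsilon$ and $\sigma = \varepsilon$.

Let $\sigma \neq \varepsilon$. Then, for some $k'$, $\sigma = w'_{i,k'}$, $(q'_i, k', q'_{i+1}) \in \mu'$, and $\mathbf{w}'_{i+1} = \mathbf{w}'_i$. By the definition of $\mu'_1$, for some $k \neq \varepsilon$, $(q_i, k, q_{i+1}) \in \mu$ and $f_i(k) = k'$. By the induction hypothesis, condition (1) is satisfied, implying $\sigma = w_{i,k}$. Thus,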
$$(q_i, \mathbf{w}_i, \sigma\boldsymbol{\sigma}_{i+1}) \vdash^{\sigma} (q_{i+1}, \mathbf{w}_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $\mathbf{w}_{i+1} = \mathbf{w}_i$, i.e., the induction hypothesis holds for $i + 1$.

Let $\sigma = \varepsilon$, $(q'_i, \varepsilon, q'_{i+1}) \in \mu'$, $\rho'(q'_i, q'_{i+1}) = k'$, and, for some $k$, $\rho(q_i, q_{i+1}) = k$.

Assume that we are in the first case of the definition of $\rho'$, i.e., $k' = k = nil$. Since $A'$ can make an $\varepsilon$-move if and only if $A$ can make an $\varepsilon$-move, the induction hypothesis holds for $i + 1$.

Now, assume that we are in the second case of the definition of $\rho'$, i.e., $k' = nil$, $f_{i+1} \cong_k f_i$ and $f_{i+1}(k) = j$ for some $(R_l \setminus \{k\})$-free register $j \in R$. Let $\delta = w'_{i,j}$; then, by the induction hypothesis and the definition of $j$, $\delta \notin ([\mathbf{w}_i] \setminus \{w_{i,k}\})$. Thus,
$$(q_i, \mathbf{w}_i, \boldsymbol{\sigma}_{i+1}) \vdash^{\varepsilon} (q_{i+1}, \mathbf{w}_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $\mathbf{w}_{i+1} = w_{i,1} w_{i,2} \cdots w_{i,k-1}\, \delta\, w_{i,k+1} \cdots w_{i,r}$, i.e., the induction hypothesis holds for $i + 1$.

Finally, assume that we are in the third case of the definition of $\rho'$, i.e., $k' \in R_l$ is an $(R_l \setminus \{k\})$-free register, $f_{i+1} \cong_k f_i$ and $f_{i+1}(k) = k'$. Then,
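$w'_{i+1,k'} = \delta \in \Sigma \setminus \{w'_{i,1}, \ldots, w'_{i,k'-1}, w'_{i,k'+1}, \ldots, w'_{i,r'}\}$ and, for each $l \neq k'$, $w'_{i+1,l} = w'_{i,l}$. Then, by the induction hypothesis and the definition of $k'$, $\delta \notin [\mathbf{w}_i]$. Thus,
$$(q_i, \mathbf{w}_i, \boldsymbol{\sigma}_{i+1}) \vdash^{\varepsilon} (q_{i+1}, \mathbf{w}_{i+1}, \boldsymbol{\sigma}_{i+1})$$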
with $\mathbf{w}_{i+1} = w_{i,1} w_{i,2} \cdots w_{i,k-1}\, \delta\, w_{i,k+1} \cdots w_{i,r}$, i.e., the induction hypothesis holds for $i + 1$. This proves the inclusion $L(A') \subseteq L(A)$ and completes the proof of the lemma. □

Theorem 1 Each quasi-regular language is also look-ahead quasi-regular. In other words, $L_{QR} \subseteq L_{LQR}$.

Proof: We show how an FMA can be simulated by an LFMA. Let $A = (Q, s_0, u, \mu, \rho, F)$ be an $r$-register FMA. Let $\tilde{Q} = \{\tilde{q} : q \in Q\}$ be a set of states disjoint from $Q$. We construct an $r$-register LFMA $A_L = (Q \cup \tilde{Q}, s_0, u, \mu_L, \rho_L, F)$ such that

• $\mu_L = \mu_1 \cup \mu_2 \cup \mu_3$, where
  − $\mu_1 = \{(\tilde{p}, k, q_1) : (p, k, q_1) \in \mu \text{ and } \rho(p) = k\}$,
  − $\mu_2 = \{(p, k, q_2) : (p, k, q_2) \in \mu \text{ and } \rho(p) \neq k\}$, and
  − $\mu_3 = \{(p, \varepsilon, \tilde{p}) : p \in Q\}$.
• $\rho_L(p, \tilde{p}) = k$ if and only if $\rho(p) = k$.

In other words, we use the translation scheme from the FMA $A$ to the LFMA $A_L$ described by Figure 2.3. A straightforward induction on the length of the accepting run shows that
$$R = (q_0, \mathbf{w}_0, \boldsymbol{\sigma}_0), (q_1, \mathbf{w}_1, \boldsymbol{\sigma}_1), \ldots, (q_n, \mathbf{w}_n, \varepsilon)$$
is an accepting run of $A$ on $\boldsymbol{\sigma}_0 = \sigma_1 \sigma_2 \cdots \sigma_n$ if and only if
$$R_L = (q_0, \mathbf{w}_0, \boldsymbol{\sigma}_0), (\tilde{q}_0, \tilde{\mathbf{w}}_0, \boldsymbol{\sigma}_0), (q_1, \mathbf{w}_1, \boldsymbol{\sigma}_1), (\tilde{q}_1, \tilde{\mathbf{w}}_1, \boldsymbol{\sigma}_1), \ldots, (q_n, \mathbf{w}_n, \varepsilon)$$
is an accepting run of $A_L$ on $\boldsymbol{\sigma}_0$, where $R_L$ contains the instantaneous description $(\tilde{q}_i, \tilde{\mathbf{w}}_i, \sigma_{i+1}\sigma_{i+2} \cdots \sigma_n)$, $i = 1, 2, \ldots, n - 1$, if and only if $\sigma_{i+1} \notin [\mathbf{w}_i]$. Therefore $L(A) = L(A_L)$. □

Corollary 1 The containment and emptiness problems for look-ahead quasi-regular languages are decidable.
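[Figure 2.3: The FMA to LFMA translation scheme. Left: a state $p$ of the FMA with $\rho(p) = k$, a $k$-transition to $q_1$ and an $l$-transition ($l \neq k$) to $q_2$. Right: in the LFMA, the move $\varepsilon : k$ leads from $p$ to $\tilde{p}$, the $k$-transition leads from $\tilde{p}$ to $q_1$, and the $l$-transition from $p$ to $q_2$ is kept.]

Proof: The proofs follow from the inclusion $L_{LQR} \subset L_{CFG}$, see Section 4, and the fact that the containment and emptiness problems for context-free grammars are decidable, see [3, Propositions 4 and 5]. □

Corollary 2 The inclusion problem for look-ahead quasi-regular languages is undecidable.

Proof: The proof follows from Theorem 1 and the fact that the inclusion problem for quasi-regular languages is undecidable, see [11, Section 5]. □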
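The translation scheme of Theorem 1 can be sketched in code. The following is only a minimal illustration under assumed encodings that are not part of the thesis: states are strings, registers are 1-based integers, `None` stands both for $\varepsilon$ and for $nil$, and the copy $\tilde{p}$ of a state `p` is named `p + "~"`. Setting $\rho_L(p, \tilde{p})$ to $nil$ when $\rho(p)$ is undefined is likewise an assumption of this sketch.

```python
def fma_to_lfma(states, mu, rho):
    """Sketch of the FMA -> LFMA translation (mu_1, mu_2, mu_3 and rho_L).

    states : set of state names (strings)
    mu     : set of FMA transitions (p, k, q), k a 1-based register index
    rho    : dict p -> k, the (partial) reassignment function of the FMA
    """
    # Hypothetical naming of the fresh copies; assumes no state already ends in "~".
    tilde = {p: p + "~" for p in states}
    mu_L, rho_L = set(), {}
    for (p, k, q) in mu:
        if rho.get(p) == k:
            # mu_1: comparisons with the reassigned register leave from the copy
            mu_L.add((tilde[p], k, q))
        else:
            # mu_2: all other comparisons are kept unchanged
            mu_L.add((p, k, q))
    for p in states:
        # mu_3: epsilon-move from p to its copy, performing the look-ahead guess
        mu_L.add((p, None, tilde[p]))
        # Assumption: nil (None) when rho(p) is undefined
        rho_L[(p, tilde[p])] = rho.get(p)
    return set(states) | set(tilde.values()), mu_L, rho_L
```

For instance, a state `p` with $\rho(p) = 1$ and transitions on registers 1 and 2 keeps its transition on register 2, while the transition on register 1 is moved to `p~`.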
2.5 Closure properties
In this section we consider closure properties of look-ahead quasi-regular languages. We prove that look-ahead quasi-regular languages are closed under intersection, union, concatenation, the Kleene star, and reversal. The proofs are based on the standard constructions, adapted to a slightly modified version of look-ahead finite-memory automata that allows relating the input symbol to two registers simultaneously.

Definition 5 A look-ahead finite-memory automaton with double assignment, or LFMAD for short, is a system $A = (Q, s_0, u^a, u^b, \mu, \rho, F)$, where

• $Q$ is a finite set of states.
• $s_0 \in Q$ is the initial state.
• $u^a = u^a_1 u^a_2 \cdots u^a_{r_a} \in \Sigma^{r_a \neq}$ is the initial assignment to the first tuple of $r_a$ registers of $A$.
• $u^b = u^b_1 u^b_2 \cdots u^b_{r_b} \in \Sigma^{r_b \neq}$ is the initial assignment to the second tuple of $r_b$ registers of $A$.
• $\mu \subseteq Q \times (\{1, 2, \ldots, r_a\} \cup \{\varepsilon\}) \times (\{1, 2, \ldots, r_b\} \cup \{\varepsilon\}) \times Q$ is the transition relation. The intuition behind $\mu$ is as follows. Let $A$ be in state $p$.
  – If the input symbol is equal to the content of the $k$th register in the first tuple of registers, and $(p, k, \varepsilon, q) \in \mu$, then $A$ may enter state $q$ and pass to the next input symbol.
  – If the input symbol is equal to the content of the $l$th register in the second tuple of registers, and $(p, \varepsilon, l, q) \in \mu$, then $A$ may enter state $q$ and pass to the next input symbol.
  – If the input symbol is equal to the contents of the $k$th and $l$th registers in the first and second tuples of registers, respectively, and $(p, k, l, q) \in \mu$, then $A$ may enter state $q$ and pass to the next input symbol.
  – If $(p, \varepsilon, \varepsilon, q) \in \mu$, then $A$ may make a double look-ahead reassignment using the function $\rho$, i.e., replace the contents of the registers indexed by $\rho(p, q)$ and enter state $q$ without reading the next input symbol.
• $\rho : \{(p, q) : (p, \varepsilon, \varepsilon, q) \in \mu\} \to (\{1, 2, \ldots, r_a\} \cup \{nil\}) \times (\{1, 2, \ldots, r_b\} \cup \{nil\})$ is a function called the double look-ahead reassignment. Intuitively, if $A$ is in state $p$, $(p, \varepsilon, \varepsilon, q) \in \mu$, and $\rho(p, q) = (k, l)$, then for $k \neq nil$, $A$ can non-deterministically replace the content of the $k$th register in the first tuple with an element of $\Sigma$ not appearing in any other register of the first tuple and enter state $q$. Similarly, for $l \neq nil$, $A$ may replace the content of the $l$th register in the second tuple with an element of $\Sigma$ not appearing in any other register of the second tuple and enter state $q$.
• $F \subseteq Q$ is the set of final states.
An instantaneous description of $A$ is a member of $Q \times \Sigma^{r_a \neq} \times \Sigma^{r_b \neq} \times \Sigma^*$. The first component of an instantaneous description is the (current) state of the automaton, the second and third are the assignments consisting of the contents of the first and second tuples of registers (in increasing order of their indices), and the fourth component is the portion of the input yet to be read.

Next, we define the relation $\vdash$ (yielding in one step) between two instantaneous descriptions $(p, \mathbf{v}^a, \mathbf{v}^b, \sigma\boldsymbol{\sigma})$ and $(q, \mathbf{w}^a, \mathbf{w}^b, \boldsymbol{\sigma})$, where $\mathbf{v}^a = v^a_1 v^a_2 \cdots v^a_{r_a}$, $\mathbf{v}^b = v^b_1 v^b_2 \cdots v^b_{r_b}$, $\mathbf{w}^a = w^a_1 w^a_2 \cdots w^a_{r_a}$, $\mathbf{w}^b = w^b_1 w^b_2 \cdots w^b_{r_b}$, and $\sigma \in \Sigma \cup \{\varepsilon\}$. We write
$$(p, \mathbf{v}^a, \mathbf{v}^b, \sigma\boldsymbol{\sigma}) \vdash (q, \mathbf{w}^a, \mathbf{w}^b, \boldsymbol{\sigma})$$
if the following holds.

• If $\sigma = \varepsilon$ and $\rho(p, q) = (k, l)$, then for $k \neq nil$, $w^a_k \in \Sigma \setminus \{v^a_1, \ldots, v^a_{k-1}, v^a_{k+1}, \ldots, v^a_{r_a}\}$ and for each $m \neq k$, $w^a_m = v^a_m$. Otherwise $\mathbf{w}^a = \mathbf{v}^a$. Similarly, for $l \neq nil$, $w^b_l \in \Sigma \setminus \{v^b_1, \ldots, v^b_{l-1}, v^b_{l+1}, \ldots, v^b_{r_b}\}$ and for each $m \neq l$, $w^b_m = v^b_m$. Otherwise $\mathbf{w}^b = \mathbf{v}^b$.
• If $\sigma \neq \varepsilon$, then for some $k$, $\sigma = v^a_k$ and $(p, k, \varepsilon, q) \in \mu$; or for some $l$, $\sigma = v^b_l$ and $(p, \varepsilon, l, q) \in \mu$; or for some $k$ and $l$, $\sigma = v^a_k = v^b_l$ and $(p, k, l, q) \in \mu$.
We denote the reflexive and transitive closure of $\vdash$ by $\vdash^*$ and say that $A$ accepts a word $\boldsymbol{\sigma} \in \Sigma^*$ if $(s_0, u^a, u^b, \boldsymbol{\sigma}) \vdash^* (f, \mathbf{v}^a, \mathbf{v}^b, \varepsilon)$ for some $f \in F$, $\mathbf{v}^a \in \Sigma^{r_a \neq}$ and $\mathbf{v}^b \in \Sigma^{r_b \neq}$.

Remark 1 Let $A = (Q, s_0, u^a, u^b, \mu, \rho, F)$ be an $(r_a, r_b)$-register LFMAD. Similarly to Lemma 1, without loss of generality, we may assume that $A$'s registers can be partitioned into “look-ahead” registers (the first $r_{a,l}$ and $r_{b,l}$ registers of the respective tuples) and “constant” registers (the last $r_{a,c}$ and $r_{b,c}$ registers), where $r_{a,l} + r_{a,c} = r_a$ and $r_{b,l} + r_{b,c} = r_b$. Also, by introducing new states if necessary, we may assume that for all $p, q \in Q$ such that $(p, \varepsilon, \varepsilon, q) \in \mu$, $\rho(p, q) = (k, nil)$ or $\rho(p, q) = (nil, l)$ for some $k, l$, i.e., $A$ never reassigns a register of each tuple in a single move. Indeed, let $p, q \in Q$ be such that $(p, \varepsilon, \varepsilon, q) \in \mu$ and $\rho(p, q) = (k, l)$ with $k, l \neq nil$. Then, by introducing a new state $[pq] \notin Q$, extending $\mu$ by $(p, \varepsilon, \varepsilon, [pq])$ and $([pq], \varepsilon, \varepsilon, q)$, and defining $\rho(p, [pq]) = (k, nil)$ and $\rho([pq], q) = (nil, l)$, we can split a double look-ahead reassignment into two single look-ahead reassignments. In other words, we may assume that
• $u^a \in \#^{r_{a,l}} \cdot \Sigma^{r_{a,c} \neq}$,
• $u^b \in \#^{r_{b,l}} \cdot \Sigma^{r_{b,c} \neq}$, and
• $\rho$ is a function into $(\{1, 2, \ldots, r_{a,l}\} \times \{nil\}) \cup (\{nil\} \times \{1, 2, \ldots, r_{b,l}\}) \cup \{(nil, nil)\}$.

Theorem 2 A language is look-ahead quasi-regular if and only if it is accepted by an LFMAD.

Proof: We start with the proof of the “only if” part of the theorem. Let $A = (Q, s_0, u, \mu, \rho, F)$ be an $r$-register LFMA. We construct an $(r, 0)$-register LFMAD $A^D$ that simulates $A$. At each stage of the computation, the first tuple of registers of $A^D$ is in one-to-one correspondence with the $r$ registers of $A$. A formal description of $A^D$ is as follows. Let $A^D = (Q, s_0, u, \varepsilon, \mu^D, \rho^D, F)$ be an $(r, 0)$-register LFMAD, where

• $\mu^D = \{(p, k, \varepsilon, q) : (p, k, q) \in \mu\}$ and
• $\rho^D(p, q) = (\rho(p, q), nil)$.

A straightforward induction on the run length shows that $R = (q_0, \mathbf{v}_0, \boldsymbol{\sigma}_0), (q_1, \mathbf{v}_1, \boldsymbol{\sigma}_1), \ldots, (q_n, \mathbf{v}_n, \varepsilon)$ is an accepting run of $A$ on $\boldsymbol{\sigma}_0$ if and only if $R^D = (q_0, \mathbf{v}_0, \varepsilon, \boldsymbol{\sigma}_0), (q_1, \mathbf{v}_1, \varepsilon, \boldsymbol{\sigma}_1), \ldots, (q_n, \mathbf{v}_n, \varepsilon, \varepsilon)$ is an accepting run of $A^D$ on $\boldsymbol{\sigma}_0$. Therefore $L(A) = L(A^D)$.

Conversely, let $A^D = (Q^D, s^D_0, u^a, u^b, \mu^D, \rho^D, F^D)$ be an $(r_a, r_b)$-register LFMAD. We construct an $(r = r_a + r_b)$-register LFMA $A$ that simulates $A^D$. A double assignment $(\mathbf{w}^a = w^a_1 w^a_2 \cdots w^a_{r_a},\ \mathbf{w}^b = w^b_1 w^b_2 \cdots w^b_{r_b})$ of $A^D$ is represented by an assignment $\mathbf{w} = w_1 w_2 \cdots w_r$ of $A$ such that $[\mathbf{w}^a] \cup [\mathbf{w}^b] \subseteq [\mathbf{w}]$, and a “pointer” function $f : \{1, 2, \ldots, r\} \to \{1, 2, \ldots, r\}$ such that $(p, m, q)$ is a transition of $A$ if and only if $(p, k, l, q) \in \mu^D$ and $f(k) = f(l) = m$.⁴

⁴ If $k$ or $l$ is $\varepsilon$, we assume that $f(k) = m$ or $f(l) = m$, respectively, and if both $k$ and $l$ are $\varepsilon$, we assume that $m = \varepsilon$.
A formal description of $A$ is as follows. Let $U = [u^a \cdot u^b] = \{u_1, u_2, \ldots, u_{r_c}\}$, where $u_i \neq u_j$ for $i \neq j$. We denote by $U^a$, $U^b$, $U^{ab}$ the parts of the partition of $U$ given by $U^{ab} = [u^a] \cap [u^b]$, $U^a = [u^a] \setminus U^{ab}$, and $U^b = [u^b] \setminus U^{ab}$. Let $s_x$ be the cardinality of $U^x$, $x \in \{a, b, ab\}$. By Remark 1, without loss of generality, we may assume that $u^a = \#^{r_{a,l}} \cdot U^a \cdot U^{ab}$ and $u^b = \#^{r_{b,l}} \cdot U^b \cdot U^{ab}$ for some $r_{a,l}$, $r_{b,l}$. Also, we may assume that $A^D$ never reassigns a register of each tuple in a single move.

Now, let $r = r_a + r_b$, let $R_a = \{1, 2, \ldots, r_a\}$ and $R_b = \{r_a + 1, r_a + 2, \ldots, r\}$ be the sets of the first $r_a$ and the last $r_b$ indices, respectively, and let $R = R_a \cup R_b$. Let $\mathcal{F}_r$ denote the set of all (“pointer”) functions $f : R \to R$ such that the restrictions $f|_{R_a}$ and $f|_{R_b}$ are injective. In other words, no two registers from the same tuple point to the same register. For two functions $f, f' \in \mathcal{F}_r$ and $k \in R$, we write $f \cong_k f'$ if and only if $f(i) = f'(i)$ for all $i \in R \setminus \{k\}$. Finally, for a register index $i \in R$ and a set of registers $T \subseteq R$, we say that $i$ is a $T$-free register (with respect to $f$) if $i \notin \mathrm{Im}(f|_T)$, i.e., no register from $T$ points to $i$. Now, $A = (Q, s_0, u, \mu, \rho, F)$ is an $r$-register LFMA such that

• $Q = Q^D \times \mathcal{F}_r$.
• $s_0 = (s^D_0, f_0)$, where $f_0$ is defined by the following table.

$$\begin{array}{l|ccccccc}
\text{argument} & 1 & 2 & \cdots & r_a + r_{b,l} + s_b & r_a + r_{b,l} + s_b + 1 & \cdots & r \\ \hline
\text{value} & 1 & 2 & \cdots & r_a + r_{b,l} + s_b & r_{a,l} + s_a + 1 & \cdots & r_a
\end{array}$$

• $u = \underbrace{\#^{r_{a,l}} \cdot U^a \cdot U^{ab}}_{r_a} \cdot \underbrace{\#^{r_{b,l}} \cdot U^b \cdot \#^{s_{ab}}}_{r_b}$.
• $\mu = \mu_1 \cup \mu_2$, where
  – $\mu_1$ consists of all triples $((p, f), m, (q, f))$ such that one of the following conditions is satisfied.
    1. For some $(p, k, l, q) \in \mu^D$, $k, l \neq \varepsilon$ and $f(k) = f(r_a + l) = m$.
    2. For some $(p, k, \varepsilon, q) \in \mu^D$, $k \neq \varepsilon$ and $f(k) = m$.
    3. For some $(p, \varepsilon, l, q) \in \mu^D$, $l \neq \varepsilon$ and $f(r_a + l) = m$.
  – $\mu_2$ consists of all triples $((p, f), \varepsilon, (q, f'))$ such that $(p, \varepsilon, \varepsilon, q) \in \mu^D$ and $f'$ is as in the definition of $\rho$ below.
• $\rho((p, f), (q, f')) = m$ if and only if either $\rho^D(p, q) = (k, nil)$ and one of the following conditions is satisfied.
  1. If $k = nil$, then $f' = f$ and $m = nil$. Otherwise assume $k \neq nil$ below.
  2. If $j \in R$ is an $(R_a \setminus \{k\})$-free register, then $f' \cong_k f$, $f'(k) = j$, and $m = nil$.
  3. If $m$ is an $(R \setminus \{k\})$-free register, then $f' \cong_k f$ and $f'(k) = m$;
  or (symmetrically) $\rho^D(p, q) = (nil, l)$ and one of the following conditions is satisfied.
  1. If $l = nil$, then $f' = f$ and $m = nil$. Otherwise let $l' = r_a + l$ below.
  2. If $j \in R$ is an $(R_b \setminus \{l'\})$-free register, then $f' \cong_{l'} f$, $f'(l') = j$, and $m = nil$.
  3. If $m$ is an $(R \setminus \{l'\})$-free register, then $f' \cong_{l'} f$ and $f'(l') = m$.
• $F = F^D \times \mathcal{F}_r$.

We contend that $L(A) = L(A^D)$. Let $\boldsymbol{\sigma} = \sigma_1 \sigma_2 \cdots \sigma_n \in L(A^D)$ and let $R^D = (q_0, \mathbf{w}^a_0, \mathbf{w}^b_0, \boldsymbol{\sigma}_0), (q_1, \mathbf{w}^a_1, \mathbf{w}^b_1, \boldsymbol{\sigma}_1), \ldots, (q_k, \mathbf{w}^a_k, \mathbf{w}^b_k, \varepsilon)$ be an accepting run of $A^D$ on $\boldsymbol{\sigma}$. That is, $q_0 = s^D_0$, $\mathbf{w}^a_0 = u^a$, $\mathbf{w}^b_0 = u^b$, $\boldsymbol{\sigma}_0 = \boldsymbol{\sigma}$ and $q_k \in F^D$. We transform it into an accepting run $R = (q'_0, \mathbf{w}_0, \boldsymbol{\sigma}_0), (q'_1, \mathbf{w}_1, \boldsymbol{\sigma}_1), \ldots, (q'_k, \mathbf{w}_k, \varepsilon)$, $q'_i = (q_i, f_i)$, $i = 0, 1, \ldots, k$, of $A$ on $\boldsymbol{\sigma}$ by induction as follows. Let $q'_0 = s_0$, $\mathbf{w}_0 = u$ and assume that $f_i$ has been defined such that conditions (1) – (3) below are satisfied.

$$\text{For all } k \in R_a:\quad w^a_{i,k} = w_{i,f_i(k)}. \qquad (1)$$
$$\text{For all } l \in R_b:\quad w^b_{i,l-r_a} = w_{i,f_i(l)}. \qquad (2)$$
$$\text{For all } k \in R_a,\ l \in R_b:\quad w^a_{i,k} = w^b_{i,l-r_a} \text{ if and only if } f_i(k) = f_i(l). \qquad (3)$$
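Note that $[\mathbf{w}^a_i] \cup [\mathbf{w}^b_i] \subseteq [\mathbf{w}_i]$ follows from the above and that, by the definition of $q'_0$ and $\mathbf{w}_0$, these conditions are satisfied for $i = 0$. Let $i \geq 0$ and let $(q_i, \mathbf{w}^a_i, \mathbf{w}^b_i, \sigma\boldsymbol{\sigma}_{i+1}) \vdash (q_{i+1}, \mathbf{w}^a_{i+1}, \mathbf{w}^b_{i+1}, \boldsymbol{\sigma}_{i+1})$. We distinguish between the cases $\sigma \neq \varepsilon$ and $\sigma = \varepsilon$.

Let $\sigma \neq \varepsilon$ and assume that for some $k, l \neq \varepsilon$, $\sigma = w^a_{i,k} = w^b_{i,l-r_a}$, $(q_i, k, l - r_a, q_{i+1}) \in \mu^D$, $\mathbf{w}^a_{i+1} = \mathbf{w}^a_i$ and $\mathbf{w}^b_{i+1} = \mathbf{w}^b_i$. By the induction hypothesis, conditions (1) – (3) are satisfied, implying $\sigma = w_{i,f_i(k)} = w_{i,f_i(l)}$, $f_i(k) = f_i(l)$, and, by the definition of $\mu$, $((q_i, f_i), f_i(k), (q_{i+1}, f_i)) \in \mu$. Thus,
$$(q'_i, \mathbf{w}_i, \sigma\boldsymbol{\sigma}_{i+1}) \vdash^{\sigma} (q'_{i+1}, \mathbf{w}_{i+1}, \boldsymbol{\sigma}_{i+1})$$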
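with $q'_{i+1} = (q_{i+1}, f_i)$ and $\mathbf{w}_{i+1} = \mathbf{w}_i$, i.e., the induction hypothesis holds for $i + 1$. The proof of the case in which one of $k, l$ is $\varepsilon$ is similar to the above and is omitted.

Let $\sigma = \varepsilon$, $(q_i, \varepsilon, \varepsilon, q_{i+1}) \in \mu^D$, $\rho^D(q_i, q_{i+1}) = (k, nil)$, $w^a_{i+1,k} = \delta \in \Sigma \setminus \{w^a_{i,1}, \ldots, w^a_{i,k-1}, w^a_{i,k+1}, \ldots, w^a_{i,r_a}\}$, and for each $l \neq k$, $w^a_{i+1,l} = w^a_{i,l}$.

Assume $\delta \in [\mathbf{w}_i]$ and let $\delta = w_{i,j}$ for some $j \in R$. Note that, since $\delta \notin ([\mathbf{w}^a_i] \setminus \{w^a_{i,k}\})$, $j$ is an $(R_a \setminus \{k\})$-free register (otherwise, by condition (1), we would have a contradiction with $\delta \notin ([\mathbf{w}^a_i] \setminus \{w^a_{i,k}\})$). Let $f_{i+1} \cong_k f_i$ and $f_{i+1}(k) = j$. By the definition of $\mu$ and $\rho$, $\rho((q_i, f_i), (q_{i+1}, f_{i+1})) = nil$. Thus,
$$(q'_i, \mathbf{w}_i, \boldsymbol{\sigma}_{i+1}) \vdash^{\varepsilon} (q'_{i+1}, \mathbf{w}_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $q'_{i+1} = (q_{i+1}, f_{i+1})$ and $\mathbf{w}_{i+1} = \mathbf{w}_i$, i.e., the induction hypothesis holds for $i + 1$.

Now, assume $\delta \notin [\mathbf{w}_i]$ and let $m$ be an $(R \setminus \{k\})$-free register. Note that such a register always exists, because $|\mathrm{Im}(f_i|_{R \setminus \{k\}})| \leq r - 1 < r = |R|$. Let $f_{i+1} \cong_k f_i$ and $f_{i+1}(k) = m$. By the definition of $\mu$ and $\rho$, $\rho((q_i, f_i), (q_{i+1}, f_{i+1})) = m$. Thus,
$$(q'_i, \mathbf{w}_i, \boldsymbol{\sigma}_{i+1}) \vdash^{\varepsilon} (q'_{i+1}, \mathbf{w}_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $q'_{i+1} = (q_{i+1}, f_{i+1})$ and $\mathbf{w}_{i+1} = w_{i,1} w_{i,2} \cdots w_{i,m-1}\, \delta\, w_{i,m+1} \cdots w_{i,r}$, i.e., the induction hypothesis holds for $i + 1$,⁵ which completes the proof of the inclusion $L(A^D) \subseteq L(A)$.

⁵ For the case of $\rho^D(q_i, q_{i+1}) = (nil, l)$ the proof is symmetric.

Conversely, let $\boldsymbol{\sigma} \in L(A)$ and let $R = (q'_0, \mathbf{w}_0, \boldsymbol{\sigma}_0), (q'_1, \mathbf{w}_1, \boldsymbol{\sigma}_1), \ldots, (q'_k, \mathbf{w}_k, \varepsilon)$, $q'_i = (q_i, f_i)$, $i = 0, 1, \ldots, k$, be an accepting run of $A$ on $\boldsymbol{\sigma}$. That is, $q'_0 = s_0$, $\mathbf{w}_0 = u$, $\boldsymbol{\sigma}_0 = \boldsymbol{\sigma}$ and $q'_k \in F$. We transform it into an accepting run $R^D = (q_0, \mathbf{w}^a_0, \mathbf{w}^b_0, \boldsymbol{\sigma}_0), (q_1, \mathbf{w}^a_1, \mathbf{w}^b_1, \boldsymbol{\sigma}_1), \ldots, (q_k, \mathbf{w}^a_k, \mathbf{w}^b_k, \varepsilon)$ of $A^D$ on $\boldsymbol{\sigma}$ by induction as follows. Assume that $f_i$ has been defined such that conditions (1) – (3) above are satisfied. Note that, by the definition of $q'_0$ and $\mathbf{w}_0$, these conditions are satisfied for $i = 0$. Let $i \geq 0$ and let $(q'_i, \mathbf{w}_i, \sigma\boldsymbol{\sigma}_{i+1}) \vdash (q'_{i+1}, \mathbf{w}_{i+1}, \boldsymbol{\sigma}_{i+1})$. As above, we distinguish between the cases $\sigma \neq \varepsilon$ and $\sigma = \varepsilon$.

Let $\sigma \neq \varepsilon$. Then, for some $m$, $\sigma = w_{i,m}$, $(q'_i, m, q'_{i+1}) \in \mu$, and $\mathbf{w}_{i+1} = \mathbf{w}_i$. Assume that we are in the first case of the definition of $\mu_1$, i.e., for some $k, l \neq \varepsilon$, $(q_i, k, l - r_a, q_{i+1}) \in \mu^D$ and $f_i(k) = f_i(l) = m$. By the induction hypothesis, conditions (1) – (3) are satisfied, implying $\sigma = w^a_{i,k} = w^b_{i,l-r_a}$. Thus,
$$(q_i, \mathbf{w}^a_i, \mathbf{w}^b_i, \sigma\boldsymbol{\sigma}_{i+1}) \vdash^{\sigma} (q_{i+1}, \mathbf{w}^a_{i+1}, \mathbf{w}^b_{i+1}, \boldsymbol{\sigma}_{i+1})$$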
with $\mathbf{w}^a_{i+1} = \mathbf{w}^a_i$ and $\mathbf{w}^b_{i+1} = \mathbf{w}^b_i$, i.e., the induction hypothesis holds for $i + 1$. The treatment of the second and the third cases of the definition of $\mu_1$ is similar and is omitted.

Let $\sigma = \varepsilon$, $(q'_i, \varepsilon, q'_{i+1}) \in \mu$, $\rho(q'_i, q'_{i+1}) = m$, and, for some $k$, $\rho^D(q_i, q_{i+1}) = (k, nil)$.

Assume that we are in the first case of the definition of $\rho$, i.e., $m = k = nil$. Since $A$ can make an $\varepsilon$-move if and only if $A^D$ can make an $\varepsilon$-move, the induction hypothesis holds for $i + 1$.

Now, assume that we are in the second case of the definition of $\rho$, i.e., $m = nil$, $f_{i+1} \cong_k f_i$ and $f_{i+1}(k) = j$ for some $(R_a \setminus \{k\})$-free register $j \in R$. Let $\delta = w_{i,j}$; then, by the induction hypothesis and the definition of $j$, $\delta \notin ([\mathbf{w}^a_i] \setminus \{w^a_{i,k}\})$. Thus,
$$(q_i, \mathbf{w}^a_i, \mathbf{w}^b_i, \boldsymbol{\sigma}_{i+1}) \vdash^{\varepsilon} (q_{i+1}, \mathbf{w}^a_{i+1}, \mathbf{w}^b_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $\mathbf{w}^a_{i+1} = w^a_{i,1} w^a_{i,2} \cdots w^a_{i,k-1}\, \delta\, w^a_{i,k+1} \cdots w^a_{i,r_a}$ and $\mathbf{w}^b_{i+1} = \mathbf{w}^b_i$, i.e., the induction hypothesis holds for $i + 1$.

Finally, assume that we are in the third case of the definition of $\rho$, i.e., $m$ is an $(R \setminus \{k\})$-free register, $f_{i+1} \cong_k f_i$ and $f_{i+1}(k) = m$. Then, $w_{i+1,m} = \delta \in \Sigma \setminus \{w_{i,1}, \ldots, w_{i,m-1}, w_{i,m+1}, \ldots, w_{i,r}\}$ and for each $l \neq m$, $w_{i+1,l} = w_{i,l}$. Then, by the induction hypothesis and the definition of $m$, $\delta \notin [\mathbf{w}^a_i]$. Thus,
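$$(q_i, \mathbf{w}^a_i, \mathbf{w}^b_i, \boldsymbol{\sigma}_{i+1}) \vdash^{\varepsilon} (q_{i+1}, \mathbf{w}^a_{i+1}, \mathbf{w}^b_{i+1}, \boldsymbol{\sigma}_{i+1})$$
with $\mathbf{w}^a_{i+1} = w^a_{i,1} w^a_{i,2} \cdots w^a_{i,k-1}\, \delta\, w^a_{i,k+1} \cdots w^a_{i,r_a}$ and $\mathbf{w}^b_{i+1} = \mathbf{w}^b_i$, i.e., the induction hypothesis holds for $i + 1$.⁶ This proves the inclusion $L(A) \subseteq L(A^D)$ and completes the proof of the theorem. □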
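Two of the simpler steps used around Theorem 2, the embedding of an LFMA into an $(r, 0)$-register LFMAD and the splitting of double look-ahead reassignments from Remark 1, can be sketched as follows. This is an illustration only, under assumed encodings: transitions are tuples, `None` stands for $\varepsilon$ and for $nil$, and the fresh state $[pq]$ is represented by the hypothetical tuple `("split", p, q)`. Unlike the wording of Remark 1, the sketch drops the original double move once it has been split.

```python
def embed_lfma(mu, rho):
    """'Only if' part of Theorem 2: view an LFMA as an (r, 0)-register LFMAD.

    mu  : set of LFMA transitions (p, k, q), with k = None for epsilon
    rho : dict (p, q) -> k (or None for nil), defined on epsilon-moves
    """
    mu_D = {(p, k, None, q) for (p, k, q) in mu}
    rho_D = {(p, q): (k, None) for (p, q), k in rho.items()}
    return mu_D, rho_D


def split_double_reassignments(mu_D, rho_D):
    """Remark 1: replace each move with rho_D(p, q) = (k, l), k and l both
    non-nil, by two consecutive single reassignments via a fresh state."""
    mu_new, rho_new = set(mu_D), dict(rho_D)
    for (p, q), (k, l) in rho_D.items():
        if k is not None and l is not None:
            mid = ("split", p, q)  # hypothetical encoding of the new state [pq]
            # Assumption of this sketch: the original double move is removed.
            mu_new.discard((p, None, None, q))
            del rho_new[(p, q)]
            mu_new |= {(p, None, None, mid), (mid, None, None, q)}
            rho_new[(p, mid)] = (k, None)   # first reassign in the first tuple
            rho_new[(mid, q)] = (None, l)   # then reassign in the second tuple
    return mu_new, rho_new
```

After `split_double_reassignments`, every value of the reassignment function has at least one $nil$ component, which is the normal form assumed in the proof above.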
⁶ For the case of $\rho^D(p, q) = (nil, l)$ the proof is symmetric.

Now, in view of Theorem 2, when dealing with closure properties we may consider LFMAD.

Theorem 3 The look-ahead quasi-regular languages are closed under intersection, union, concatenation, and the Kleene star.

Proof: Let $A^a = (Q^a, s^a_0, u^a, \mu^a, \rho^a, F^a)$ and $A^b = (Q^b, s^b_0, u^b, \mu^b, \rho^b, F^b)$ be an $r_a$-register and an $r_b$-register LFMA, respectively. Without loss of generality, we may assume that $Q^a$ and $Q^b$ are disjoint. Let $s_0$ be a new state (not in $Q^a$ or $Q^b$).
Closure under intersection. The proof is based on the standard product construction. Let $A^\cap = (Q^a \times Q^b, (s^a_0, s^b_0), u^a, u^b, \mu^\cap, \rho^\cap, F^a \times F^b)$ be an $(r_a, r_b)$-register LFMAD such that

• $\mu^\cap = \mu_1 \cup \mu_2 \cup \mu_3$, where
  − $\mu_1 = \{((p^a, p^b), k, l, (q^a, q^b)) : (p^a, k, q^a) \in \mu^a,\ (p^b, l, q^b) \in \mu^b\}$,
  − $\mu_2 = \{((p^a, p^b), \varepsilon, \varepsilon, (q^a, p^b)) : (p^a, \varepsilon, q^a) \in \mu^a,\ p^b \in Q^b\}$, and
  − $\mu_3 = \{((p^a, p^b), \varepsilon, \varepsilon, (p^a, q^b)) : (p^b, \varepsilon, q^b) \in \mu^b,\ p^a \in Q^a\}$.
• $\rho^\cap((p^a, p^b), (q^a, q^b)) = (\rho^a(p^a, q^a), \rho^b(p^b, q^b))$.⁷

⁷ If $\rho^x(p^x, q^x)$ is undefined, we assume that $\rho^x(p^x, q^x) = nil$, for $x \in \{a, b\}$.

The automaton $A^\cap$ simultaneously simulates $A^a$ using the first tuple of registers and $A^b$ using the second tuple of registers. Thus, $L(A^a) \cap L(A^b) = L(A^\cap)$.

Closure under union. Let $A^\cup = (\{s_0\} \cup Q^a \cup Q^b, s_0, u^a, u^b, \mu^\cup, \rho^\cup, F^a \cup F^b)$ be an $(r_a, r_b)$-register LFMAD such that

• $\mu^\cup = \mu_1 \cup \mu_2 \cup \mu_3$, where
  − $\mu_1 = \{(s_0, \varepsilon, \varepsilon, s^a_0), (s_0, \varepsilon, \varepsilon, s^b_0)\}$,
  − $\mu_2 = \{(p^a, k, \varepsilon, q^a) : (p^a, k, q^a) \in \mu^a\}$, and
  − $\mu_3 = \{(p^b, \varepsilon, l, q^b) : (p^b, l, q^b) \in \mu^b\}$.
• $\rho^\cup(s_0, s^a_0) = \rho^\cup(s_0, s^b_0) = (nil, nil)$, $\rho^\cup(p^a, q^a) = (\rho^a(p^a, q^a), nil)$, for $(p^a, \varepsilon, q^a) \in \mu^a$, and $\rho^\cup(p^b, q^b) = (nil, \rho^b(p^b, q^b))$, for $(p^b, \varepsilon, q^b) \in \mu^b$.

The automaton $A^\cup$ non-deterministically chooses to enter either the initial state of $A^a$ or the initial state of $A^b$, and thereafter $A^\cup$ simulates $A^a$ or $A^b$, respectively. Thus, $L(A^a) \cup L(A^b) = L(A^\cup)$.

Closure under concatenation. Let $A^\bullet = (Q^a \cup Q^b, s^a_0, u^a, u^b, \mu^\bullet, \rho^\bullet, F^b)$ be an $(r_a, r_b)$-register LFMAD such that

• $\mu^\bullet = \mu_1 \cup \mu_2 \cup \mu_3$, where
  − $\mu_1 = \{(p^a, k, \varepsilon, q^a) : (p^a, k, q^a) \in \mu^a\}$,
  − $\mu_2 = \{(p^a, \varepsilon, \varepsilon, s^b_0) : p^a \in F^a\}$, and
  − $\mu_3 = \{(p^b, \varepsilon, l, q^b) : (p^b, l, q^b) \in \mu^b\}$.
• $\rho^\bullet(p^a, q^a) = (\rho^a(p^a, q^a), nil)$, for $(p^a, \varepsilon, q^a) \in \mu^a$, $\rho^\bullet(p^a, s^b_0) = (nil, nil)$, for $p^a \in F^a$, and $\rho^\bullet(p^b, q^b) = (nil, \rho^b(p^b, q^b))$, for $(p^b, \varepsilon, q^b) \in \mu^b$.
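The automaton $A^\bullet$ simulates $A^a$ for a while, and then non-deterministically moves from a final state of $A^a$ to the initial state of $A^b$, from which it simulates $A^b$. Thus, $L(A^a) \cdot L(A^b) = L(A^\bullet)$.

Closure under the Kleene star. By Lemma 1, we may assume that $A = (Q, s_0, u, \mu, \rho, F)$ is an $r$-register LFMA, where $u = \#^{r_l} \cdot u_c$ and $r_l + r_c = r$. Note that, if $\boldsymbol{\sigma} \in L(A)$, then for any $u_l \in (\Sigma \setminus ([u_c] \cup [\boldsymbol{\sigma}]))^{r_l \neq}$, $A$ has an accepting run from the instantaneous description $(s_0, u_l u_c, \boldsymbol{\sigma})$ as well. Let $\tilde{Q} = \{\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_{r_l}\}$ be a set of $r_l$ states disjoint from $Q$ and let $A^* = (Q \cup \tilde{Q}, \tilde{q}_1, u, \mu^*, \rho^*, F \cup \{\tilde{q}_1\})$ be an $r$-register LFMA such that

• $\mu^* = \mu \cup \mu_1 \cup \mu_2$, where
  − $\mu_1 = \{(f, \varepsilon, \tilde{q}_1) : f \in F\}$ and
  − $\mu_2 = \{(\tilde{q}_i, \varepsilon, \tilde{q}_{i+1}) : i = 1, 2, \ldots, r_l - 1\} \cup \{(\tilde{q}_{r_l}, \varepsilon, s_0)\}$.
• $\rho^*(p, q) = \rho(p, q)$, for $(p, \varepsilon, q) \in \mu$, $\rho^*(f, \tilde{q}_1) = nil$, for $f \in F$, $\rho^*(\tilde{q}_i, \tilde{q}_{i+1}) = i$, for $i = 1, 2, \ldots, r_l - 1$, and $\rho^*(\tilde{q}_{r_l}, s_0) = r_l$.

A straightforward inspection of $A^*$ shows that, if $\boldsymbol{\sigma} \in L(A^*)$, then either $\boldsymbol{\sigma} = \varepsilon$ or $\boldsymbol{\sigma} = \boldsymbol{\sigma}_1 \boldsymbol{\sigma}_2 \cdots \boldsymbol{\sigma}_k$ for some $k \geq 1$, where for $i = 1, 2, \ldots, k$, there are an $f_i \in F$ and $\mathbf{w}_i, \mathbf{w}_{i+1} \in \Sigma^{r_l \neq}$ such that $(s_0, \mathbf{w}_i u_c, \boldsymbol{\sigma}_i) \vdash^*_A (f_i, \mathbf{w}_{i+1} u_c, \varepsilon)$. Hence $\boldsymbol{\sigma} \in L(A)^*$.

On the other hand, if $\boldsymbol{\sigma} \in L(A)^*$, then either $\boldsymbol{\sigma} = \varepsilon$ or $\boldsymbol{\sigma} = \boldsymbol{\sigma}_1 \boldsymbol{\sigma}_2 \cdots \boldsymbol{\sigma}_k$ for some $\boldsymbol{\sigma}_1, \boldsymbol{\sigma}_2, \ldots, \boldsymbol{\sigma}_k \in L(A)$. In the former case, $\boldsymbol{\sigma} \in L(A^*)$ since $\tilde{q}_1$ is a final state. In the latter case, for $i = 1, 2, \ldots, k$, there are $\mathbf{w}_i \in \Sigma^{r \neq}$ and $f_i \in F$ such that $(s_0, u, \boldsymbol{\sigma}_i) \vdash^*_A (f_i, \mathbf{w}_i, \varepsilon)$. Then,
$$\begin{array}{l}
(\tilde{q}_1, u, \boldsymbol{\sigma}_1 \boldsymbol{\sigma}_2 \cdots \boldsymbol{\sigma}_k) \vdash^{r_l} (s_0, u_1 u_c, \boldsymbol{\sigma}_1 \boldsymbol{\sigma}_2 \cdots \boldsymbol{\sigma}_k) \vdash^* (f_1, \mathbf{w}_1, \boldsymbol{\sigma}_2 \boldsymbol{\sigma}_3 \cdots \boldsymbol{\sigma}_k) \\
\vdash^{\varepsilon} (\tilde{q}_1, \mathbf{w}_1, \boldsymbol{\sigma}_2 \boldsymbol{\sigma}_3 \cdots \boldsymbol{\sigma}_k) \vdash^{r_l} (s_0, u_2 u_c, \boldsymbol{\sigma}_2 \boldsymbol{\sigma}_3 \cdots \boldsymbol{\sigma}_k) \vdash^* (f_2, \mathbf{w}_2, \boldsymbol{\sigma}_3 \boldsymbol{\sigma}_4 \cdots \boldsymbol{\sigma}_k) \\
\vdash^{\varepsilon} (\tilde{q}_1, \mathbf{w}_2, \boldsymbol{\sigma}_3 \boldsymbol{\sigma}_4 \cdots \boldsymbol{\sigma}_k) \vdash^{r_l} \cdots \vdash^* (f_k, \mathbf{w}_k, \varepsilon)
\end{array}$$
with $u_i \in (\Sigma \setminus ([u_c] \cup [\boldsymbol{\sigma}_i]))^{r_l \neq}$, $i = 1, 2, \ldots, k$, is an accepting run of $A^*$ on $\boldsymbol{\sigma}$. Thus, $L(A)^* = L(A^*)$, which completes the proof of the theorem.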
□

Next we show that look-ahead quasi-regular languages are closed under reversal.
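The product construction used for closure under intersection can be sketched as follows. This is a minimal illustration under assumed encodings (transitions as tuples, `None` for both $\varepsilon$ and $nil$, product states as pairs); the function name `intersect` is hypothetical. As in the definition of $\mu_1$ in the proof, the labels $k$ and $l$ are not restricted to be different from $\varepsilon$, and an undefined $\rho^x$ is read as $nil$, following footnote 7.

```python
def intersect(states_a, mu_a, rho_a, states_b, mu_b, rho_b):
    """Sketch of mu^cap and rho^cap for two LFMA given as
    transition sets {(p, k, q)} and reassignment dicts {(p, q): k}."""
    mu = set()
    # mu_1: both components move together on labels k and l
    for (pa, k, qa) in mu_a:
        for (pb, l, qb) in mu_b:
            mu.add(((pa, pb), k, l, (qa, qb)))
    # mu_2: only the first component makes an epsilon-move
    for (pa, k, qa) in mu_a:
        if k is None:
            for pb in states_b:
                mu.add(((pa, pb), None, None, (qa, pb)))
    # mu_3: only the second component makes an epsilon-move
    for (pb, l, qb) in mu_b:
        if l is None:
            for pa in states_a:
                mu.add(((pa, pb), None, None, (pa, qb)))
    # rho^cap is defined on the (epsilon, epsilon)-moves; dict.get returns
    # None, i.e. nil, where rho^a or rho^b is undefined (footnote 7)
    rho = {}
    for (P, k, l, Q) in mu:
        if k is None and l is None:
            rho[(P, Q)] = (rho_a.get((P[0], Q[0])), rho_b.get((P[1], Q[1])))
    return mu, rho
```

The initial state of the product is the pair of initial states and the final states are the pairs of final states, exactly as in the tuple defining $A^\cap$; they are omitted from the sketch because they involve no computation.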
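Definition 6 Let $\boldsymbol{\sigma} = \sigma_1 \sigma_2 \cdots \sigma_n$ be a word over $\Sigma$. The reversal $\boldsymbol{\sigma}^R$ of $\boldsymbol{\sigma}$ is $\boldsymbol{\sigma}^R = \sigma_n \sigma_{n-1} \cdots \sigma_1$, and for a language $L \subseteq \Sigma^*$, the reversal $L^R$ of $L$ is $L^R = \{\boldsymbol{\sigma}^R : \boldsymbol{\sigma} \in L\}$.

Theorem 4 The look-ahead quasi-regular languages are closed under reversal.

Proof: By Lemma 1, we may assume that $A = (Q, s_0, u, \mu, \rho, F)$ is an $r$-register LFMA, where $u = \#^{r_l} \cdot u_c$ and $r_l + r_c = r$. In addition, we may assume that on any run of $A$ a comparison with a “look-ahead” register is done only after a look-ahead reassignment has been made. Indeed, analyzing all paths in the graph representation of $A$ that lead from the initial state $s_0$ to a final state $f \in F$, we can “dead-end” (by introducing a new state) every comparison that violates this condition.

Let $\tilde{Q} = \{\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_{r_l}\}$ be a set of $r_l$ states disjoint from $Q$ and let $A^R = (Q \cup \tilde{Q}, \tilde{q}_1, u, \mu^R, \rho^R, \{s_0\})$ be an $r$-register LFMA such that

• $\mu^R = \mu_1 \cup \mu_2 \cup \mu_3$, where
  − $\mu_1 = \{(\tilde{q}_i, \varepsilon, \tilde{q}_{i+1}) : i = 1, 2, \ldots, r_l - 1\}$,
  − $\mu_2 = \{(\tilde{q}_{r_l}, \varepsilon, f) : f \in F\}$, and
  − $\mu_3 = \{(p, k, q) : (q, k, p) \in \mu\}$.
• $\rho^R(\tilde{q}_i, \tilde{q}_{i+1}) = i$, for $i = 1, 2, \ldots, r_l - 1$, $\rho^R(\tilde{q}_{r_l}, f) = r_l$, for $f \in F$, and $\rho^R(p, q) = \rho(q, p)$, for $(q, \varepsilon, p) \in \mu$.

We intend to prove that $L(A^R) = (L(A))^R$. Let $\boldsymbol{\sigma} = \sigma_1 \sigma_2 \cdots \sigma_n \in L(A)$ and let $R = (q_0, \mathbf{w}_0, \boldsymbol{\sigma}_0), (q_1, \mathbf{w}_1, \boldsymbol{\sigma}_1), \ldots, (q_k, \mathbf{w}_k, \varepsilon)$ be an accepting run of $A$ on $\boldsymbol{\sigma}$. That is, $q_0 = s_0$, $\mathbf{w}_0 = u$, $\boldsymbol{\sigma}_0 = \boldsymbol{\sigma}$, and $q_k \in F$. We transform it into an accepting run
$$R^R = (\tilde{q}_1, \tilde{\mathbf{w}}_1, \boldsymbol{\sigma}^R), (\tilde{q}_2, \tilde{\mathbf{w}}_2, \boldsymbol{\sigma}^R), \ldots, (\tilde{q}_{r_l}, \tilde{\mathbf{w}}_{r_l}, \boldsymbol{\sigma}^R), (q'_0, \mathbf{w}'_0, \boldsymbol{\sigma}'_0), (q'_1, \mathbf{w}'_1, \boldsymbol{\sigma}'_1), \ldots, (q'_k, \mathbf{w}'_k, \varepsilon)$$
of $A^R$ on $\boldsymbol{\sigma}^R$ by induction as follows. Let $(q'_0, \mathbf{w}'_0, \boldsymbol{\sigma}'_0) = (q_k, \mathbf{w}_k, \boldsymbol{\sigma}^R)$ and assume that $q'_i$, $\mathbf{w}'_i$ and $\boldsymbol{\sigma}'_i$ have been defined such that conditions (1) – (3) below are satisfied.

$$q'_i = q_{k-i}. \qquad (1)$$
$$\mathbf{w}'_i = \mathbf{w}_{k-i}. \qquad (2)$$
$$\boldsymbol{\sigma}'^R_i\, \boldsymbol{\sigma}_{k-i} = \boldsymbol{\sigma}. \qquad (3)$$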
Note that, by definition, $A^R$ can reach the instantaneous description $(q_k, \mathbf{w}_k, \boldsymbol{\sigma}^R)$ after the first $r_l$ moves, and by the definition of $q'_0$, $\mathbf{w}'_0$ and $\boldsymbol{\sigma}'_0$, these conditions are
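satisfied for $i = 0$. Let $(q_{k-(i+1)}, \mathbf{w}_{k-(i+1)}, \sigma\boldsymbol{\sigma}_{k-(i+1)}) \vdash_A (q_{k-i}, \mathbf{w}_{k-i}, \boldsymbol{\sigma}_{k-i})$. We distinguish between the cases $\sigma = \varepsilon$ and $\sigma \neq \varepsilon$.

Let $\sigma = \varepsilon$. If $\rho(q_{k-(i+1)}, q_{k-i}) = j$, it follows that $(q_{k-(i+1)}, \varepsilon, q_{k-i}) \in \mu$, $w_{k-i,j} \in \Sigma \setminus \{w_{k-(i+1),1}, \ldots, w_{k-(i+1),j-1}, w_{k-(i+1),j+1}, \ldots, w_{k-(i+1),r}\}$, for each $l \neq j$, $w_{k-i,l} = w_{k-(i+1),l}$, and $\boldsymbol{\sigma}_{k-i} = \boldsymbol{\sigma}_{k-(i+1)}$. By the induction hypothesis, $q'_i = q_{k-i}$, $\mathbf{w}'_i = \mathbf{w}_{k-i}$ and $\boldsymbol{\sigma}'^R_i \boldsymbol{\sigma}_{k-i} = \boldsymbol{\sigma}$, and, by the definition of $\mu^R$ and $\rho^R$, $(q'_i, \varepsilon, q'_{i+1}) \in \mu^R$ and $\rho^R(q'_i, q'_{i+1}) = j$. Thus,
$$(q'_i, \mathbf{w}'_i, \boldsymbol{\sigma}'_i) \vdash^{\varepsilon} (q'_{i+1}, \mathbf{w}'_{i+1}, \boldsymbol{\sigma}'_{i+1})$$
with $q'_{i+1} = q_{k-(i+1)}$,
$$\begin{array}{rcl}
w'_{i+1,j} & \in & \Sigma \setminus \{w'_{i,1}, \ldots, w'_{i,j-1}, w'_{i,j+1}, \ldots, w'_{i,r}\} \\
& = & \Sigma \setminus \{w_{k-i,1}, \ldots, w_{k-i,j-1}, w_{k-i,j+1}, \ldots, w_{k-i,r}\} \\
& = & \Sigma \setminus \{w_{k-(i+1),1}, \ldots, w_{k-(i+1),j-1}, w_{k-(i+1),j+1}, \ldots, w_{k-(i+1),r}\} \\
& \ni & w_{k-(i+1),j},
\end{array}$$
for all $l \neq j$, $w'_{i+1,l} = w_{k-i,l} = w_{k-(i+1),l}$, and
$$\boldsymbol{\sigma}'^R_{i+1}\, \boldsymbol{\sigma}_{k-(i+1)} = \boldsymbol{\sigma}'^R_i\, \boldsymbol{\sigma}_{k-i} = \boldsymbol{\sigma},$$
i.e., the induction hypothesis holds for $i + 1$.

Let $\sigma \neq \varepsilon$. Then, for some $j$, $\sigma = w_{k-(i+1),j}$, $(q_{k-(i+1)}, j, q_{k-i}) \in \mu$, $\mathbf{w}_{k-i} = \mathbf{w}_{k-(i+1)}$, and $\boldsymbol{\sigma}_{k-i} = \sigma\boldsymbol{\sigma}_{k-(i+1)}$. By the induction hypothesis, $q'_i = q_{k-i}$, $\mathbf{w}'_i = \mathbf{w}_{k-i}$ and $\boldsymbol{\sigma}'^R_i \boldsymbol{\sigma}_{k-i} = \boldsymbol{\sigma}$, and, by the definition of $\mu^R$, $(q'_i, j, q'_{i+1}) \in \mu^R$. Thus,
$$(q'_i, \mathbf{w}'_i, \sigma\boldsymbol{\sigma}'_i) \vdash^{\sigma} (q'_{i+1}, \mathbf{w}'_{i+1}, \boldsymbol{\sigma}'_{i+1})$$
with $q'_{i+1} = q_{k-(i+1)}$, $\mathbf{w}'_{i+1} = \mathbf{w}_{k-(i+1)}$, and
$$\boldsymbol{\sigma}'^R_{i+1}\, \boldsymbol{\sigma}_{k-(i+1)} = (\sigma\boldsymbol{\sigma}'_i)^R\, \boldsymbol{\sigma}_{k-(i+1)} = \boldsymbol{\sigma}'^R_i\, \sigma\boldsymbol{\sigma}_{k-(i+1)} = \boldsymbol{\sigma}'^R_i\, \boldsymbol{\sigma}_{k-i} = \boldsymbol{\sigma},$$
i.e., the induction hypothesis holds for $i + 1$. Thus, for $i = k$, $q'_k = q_0 = s_0$ (the final state of $A^R$), $\mathbf{w}'_k = \mathbf{w}_0$, and
$$\boldsymbol{\sigma}'^R_k\, \boldsymbol{\sigma}_0 = \boldsymbol{\sigma}'^R_k\, \boldsymbol{\sigma} = \boldsymbol{\sigma} \;\Rightarrow\; \boldsymbol{\sigma}'_k = \varepsilon,$$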
i.e., $\boldsymbol{\sigma}^R \in L(A^R)$, which proves the inclusion $(L(A))^R \subseteq L(A^R)$. Conversely, since $(A^R)^R = A$ “modulo $\tilde{Q}$”, we have $L(A^R) \subseteq (L(A))^R$. This completes the proof of the equality $L(A^R) = (L(A))^R$. □
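The construction of $A^R$ in the proof of Theorem 4 can be sketched as follows. This is an illustration only, under assumed encodings: transitions are triples with `None` for $\varepsilon$, the fresh states $\tilde{q}_i$ are represented by the hypothetical tuples `("guess", i)`, and the sketch assumes $r_l \geq 1$.

```python
def reverse_lfma(mu, rho, finals, r_l):
    """Sketch of mu^R and rho^R; returns (initial state of A^R, mu_R, rho_R).

    mu     : set of LFMA transitions (q, k, p), with k = None for epsilon
    rho    : dict (q, p) -> register index, defined on the epsilon-moves of mu
    finals : set of final states F of A
    r_l    : number of look-ahead registers (assumed >= 1 in this sketch)
    """
    guess = [("guess", i) for i in range(1, r_l + 1)]  # hypothetical names for q~_i
    mu_R, rho_R = set(), {}
    # mu_1: the chain q~_1 -> q~_2 -> ... guesses registers 1, ..., r_l - 1
    for i in range(r_l - 1):
        mu_R.add((guess[i], None, guess[i + 1]))
        rho_R[(guess[i], guess[i + 1])] = i + 1
    # mu_2: the last guess (register r_l) enters a final state of A
    for f in finals:
        mu_R.add((guess[-1], None, f))
        rho_R[(guess[-1], f)] = r_l
    # mu_3: every transition of A is reversed, keeping its label
    for (q, k, p) in mu:
        mu_R.add((p, k, q))
        if k is None:
            rho_R[(p, q)] = rho[(q, p)]
    return guess[0], mu_R, rho_R
```

The only final state of $A^R$ is the initial state $s_0$ of $A$, so it is not returned by the sketch.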
Example 5 Let $L = \{\sigma_1 \sigma_2 \cdots \sigma_n : \sigma_i \neq \sigma_1, 1 < i \leq n\}$. That is, $L$ consists of all words whose first symbol differs from all the others. The reversal of $L$ is $L^R = \{\sigma_1 \sigma_2 \cdots \sigma_n : \sigma_i \neq \sigma_n, 1 \leq i < n\}$. That is, $L^R$ consists of all words whose last symbol differs from all the others. We have seen that $L$ is quasi-regular, while $L^R$ is not, see [7, Example 4]. By Theorem 4, $L^R$ is look-ahead quasi-regular; indeed, in Example 3 we showed that $L^R$ is accepted by a look-ahead finite-memory automaton.

Corollary 3 $L_{QR} \subsetneq L_{LQR}$.

Proof: The proof follows from Theorem 1 and Example 5 above. □
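The two languages of Example 5 are easy to state as membership predicates, which makes the reversal relation between them concrete. The sketch below is only an illustration; words are Python sequences of symbols, and treating the empty word as a non-member is an assumption of the sketch.

```python
def in_L(word):
    """Words whose first symbol differs from all the others."""
    # Assumption: the empty word is not a member.
    return len(word) >= 1 and all(s != word[0] for s in word[1:])


def in_L_reversed(word):
    """Words whose last symbol differs from all the others (the language L^R)."""
    return len(word) >= 1 and all(s != word[-1] for s in word[:-1])
```

By construction, a word belongs to the second language exactly when its reversal belongs to the first, e.g. `in_L_reversed(w) == in_L(w[::-1])` for any string `w`.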
We conclude this section with one more example, which naturally leads to the difficult question of whether $L_{LQR} \subseteq L_{2WFMA}$.

Example 6 Consider the language $L_s$ that consists of all words containing a symbol (a separator) that appears in neither the first nor the last position and differs from all the other symbols of the word. That is, $L_s = \{u\sigma v : |u|, |v| \geq 1, \sigma \notin [u] \cup [v]\}$. It is readily seen that $L_s$ is accepted by the 2-register LFMA $A_s$ shown in Figure 2.4. Therefore, $L_s$ is look-ahead quasi-regular.

We contend that $L_s$ is not quasi-regular. To prove our contention, assume to the contrary that $L_s$ is accepted by an $r$-register FMA $A$. Let $x = u_1 u_1 u_2 u_2 \cdots u_r u_r u_{r+1} u_{r+1}$ and $y = \sigma v_1 v_1$, where all the symbols $u_1, u_2, \ldots, u_{r+1}$, $\sigma$ and $v_1$ are pairwise different. Then $xy \in L_s\ (= L(A))$ with the symbol $\sigma$ being a separator. Let $\Sigma'$ be a subset of $[x]$ provided by the indistinguishability property of FMA, see Proposition 1. Since the cardinality of $\Sigma'$ does not exceed $r$, there exists an $i \in \{1, 2, \ldots, r + 1\}$ such that $u_i \notin \Sigma'$. Since $[x] \cap [y] = \emptyset$, it follows that $u_i \notin [y] \cup \Sigma'$. By Proposition 1, $x(y(\sigma|u_i)) \in L(A)$. However, the last word contains no valid separator, since each of its symbols appears at least twice. This contradicts the assumption $L(A) = L_s$.
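The words used in the argument of Example 6 can be checked mechanically. The sketch below is only an illustration: symbols are modelled as Python strings with hypothetical names such as `"u1"` and `"sigma"`, and the helper names are not from the thesis. It does not model Proposition 1 itself; it only confirms that $xy \in L_s$ while $x(y(\sigma|u_i)) \notin L_s$ for every $i$.

```python
from collections import Counter


def in_Ls(word):
    """True if some symbol occurs exactly once in the word and its position
    is neither the first nor the last one (a valid separator)."""
    counts = Counter(word)
    return any(counts[word[i]] == 1 for i in range(1, len(word) - 1))


def example_words(r):
    """Build x = u1 u1 u2 u2 ... u_{r+1} u_{r+1} and y = sigma v1 v1."""
    us = [f"u{i}" for i in range(1, r + 2)]  # hypothetical symbol names
    x = [s for u in us for s in (u, u)]
    y = ["sigma", "v1", "v1"]
    return us, x, y


def substitute(word, old, new):
    """The word obtained by replacing every occurrence of old by new."""
    return [new if s == old else s for s in word]
```

For example, with `us, x, y = example_words(2)`, the word `x + y` is in $L_s$, whereas `x + substitute(y, "sigma", u)` is not, for each `u` in `us`.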
[Figure 2.4: An LFMA that accepts L_s = {uσv : |u|, |v| ≥ 1, σ ∉ [u] ∪ [v]}. The diagram has the states s, q_0, q_1, q_2 and f and the initial assignment u = ##; register 2 contains σ and register 1 is temporary.]
Let L = L_s^*. By Theorem 3, L is look-ahead quasi-regular, too. We shall prove that L is accepted by the two-way non-deterministic 5-register automaton A depicted below. This automaton works as follows. First, A moves to the right, non-deterministically picks up a symbol from the input word, stores the chosen symbol (the separator) in the first register, returns to the left end marker ¤ and moves one step to the right. Then A enters the following cycle (in which it makes right moves only).
1. It checks the input symbol.
• If A “sees” the right end marker ¢, then it accepts.
• If A “sees” the symbol (separator) stored in the first register, then it “rejects”.
Otherwise, A proceeds as follows.
2. It moves right until the first appearance of the separator stored in the first register, passes it, verifies that the next symbol is different from the separator, and moves right on symbols different from the separator.
3. On its way to the right, A non-deterministically picks up another symbol⁸ from the input word and stores it in the second or third register.
4. After stage 3, A may (non-deterministically) “switch” between the first, second and third registers, i.e., choose as the next separator the symbol stored in the second or third register.⁹
Alternatively, A can be described by the diagram in Figure 2.5, where each A^t_{x,y,z} component of A, t ∈ {a, b, c}, appears in Figure 2.6.
[Figure 2.5: A 2WFMA that accepts L_s^*. The diagram consists of an initialization part (states s and q_0) followed by the components A^a_{1,2,3}, A^b_{2,1,3} and A^c_{3,1,2}.]
[Figure 2.6: The diagram of the A^t_{x,y,z} component of A, where {x, y, z} = {1, 2, 3}; x is the current separator and one of y or z is the next separator. The component has the entry point in, the states q^t_1, q^t_2, q^t_3 and q^t_4, and the exit point out.]
We show first that L(A) ⊆ L. Let σ ∈ L(A). Then
σ = σ^1 σ^2 ··· σ^n, (4)
where σ^i, i = 1, 2, ..., n, is the subword of the input covered in the ith iteration, given by stage 2 of the above description of A. Then σ^i ∈ L_s, i = 1, 2, ..., n, and the desired inclusion follows from (4).
For the proof of the inclusion L ⊆ L(A) it suffices to show that each σ ∈ L can be partitioned as σ = (u^1 σ_1 v^1)(u^2 σ_2 v^2) ··· (u^n σ_n v^n), where
• u^i σ_i v^i ∈ L_s, i = 1, 2, ..., n, and
• σ_{i+1} ∈ [u^i σ_i v^i], i = 1, 2, ..., n − 1.
Then A can pick up σ_1 before entering the loop, and pick up σ_{i+1} in the ith iteration. So, let n be the minimal integer for which there exist u^i σ_i v^i ∈ L_s, i = 1, 2, ..., n, such that σ = (u^1 σ_1 v^1)(u^2 σ_2 v^2) ··· (u^n σ_n v^n), and assume to the contrary that for some i = 1, 2, ..., n − 1, σ_{i+1} ∉ [u^i σ_i v^i]. Then (u^i σ_i v^i u^{i+1}) σ_{i+1} v^{i+1} ∈ L_s, implying that (u^1 σ_1 v^1)(u^2 σ_2 v^2) ··· (u^i σ_i v^i u^{i+1} σ_{i+1} v^{i+1}) ··· (u^n σ_n v^n) is a “shorter” partition of σ, which contradicts the definition of n.
⁸ This symbol may appear in the first or the last position of the pattern that A “covers” by this sequence of moves, i.e., in the positions in which A starts and ends the iteration.
⁹ This is done by entering the state that remembers which of the registers is the first, which is the second, and which is the third.
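Membership in L = L_s^* can also be tested directly, independently of the automaton A, by a dynamic program over prefixes. The sketch below is an illustration only and is not the two-way automaton described above; the names `in_Ls` and `in_Ls_star` are hypothetical.

```python
from collections import Counter

def in_Ls(word):
    """True iff some interior symbol of word occurs exactly once in word."""
    counts = Counter(word)
    return any(counts[word[i]] == 1 for i in range(1, len(word) - 1))

def in_Ls_star(word):
    """True iff word is a concatenation of zero or more words of L_s."""
    n = len(word)
    ok = [False] * (n + 1)   # ok[e]: the prefix word[:e] belongs to L_s^*
    ok[0] = True             # the empty word is in L_s^*
    for end in range(1, n + 1):
        ok[end] = any(ok[start] and in_Ls(word[start:end])
                      for start in range(end))
    return ok[n]
```

The dynamic program tries every partition implicitly, whereas the argument above shows that A only needs partitions in which each next separator already occurs in the current block.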
3 Regular expressions for LQR languages (LQRE)
In this section we introduce an alternative description of LQR languages by the so-called look-ahead quasi-regular expressions (LQRE), which are the infinite alphabet counterpart of the ordinary regular expressions.
3.1 The definition of LQRE
Definition 7 Let X = {x_1, x_2, ..., x_r} be a set of variables such that X ∩ Σ = ∅ and let Θ be a finite subset of Σ. Look-ahead quasi-regular expressions over (X, Θ), or shortly LQRE, if (X, Θ) is understood from the context, are defined as follows.
• ∅, ε, and each element of X ∪ Θ are LQRE.¹⁰
• If α_1 and α_2 are LQRE, then so are (α_1 + α_2) and (α_1 · α_2).
• If x ∈ X, α_1 and α_2 are LQRE, then so is (α_1 ·_x α_2).
• If α is an LQRE, then so is (α)*.
• If x ∈ X and α is an LQRE, then so is (α)*_x.
Remark 2 An LQRE α over (X, Θ) can be thought of as an ordinary regular expression over the finite alphabet Σ̂ = X ∪ Θ ∪ {·_x : x ∈ X}, which we shall denote by α̂. That is, the ordinary regular expression α̂ is obtained from the corresponding LQRE α by replacing each of its subexpressions of the form (α_1 ·_x α_2) with (α_1 · ·_x · α_2) and each subexpression of the form (α)*_x with (α · ·_x)*, and vice versa. See the formal definition below.
3.2 Languages defined by instances of LQRE
In this section, we define the language I(α) generated by an LQRE α. The language we define is based on the notion of an instance of a regular language defined by a regular expression associated with an LQRE. The method we use is similar to the one used for FSUBA languages, see [8, 9].
¹⁰ Of course, ε is redundant (but still very useful), because L(∅*) = {ε}.
Definition 8 With an LQRE α over (X, Θ), we associate a regular expression α̂ over Σ̂ that is defined recursively, as follows.
• If α ∈ {∅, ε} ∪ Θ ∪ X, then α̂ = α.
• If α = α_1 ∘ α_2 for some ∘ ∈ {+, ·}, then α̂ = α̂_1 ∘ α̂_2.
• If α = α_1 ·_x α_2 for some x ∈ X, then α̂ = α̂_1 · ·_x · α̂_2.
• If α = (α_1)*, then α̂ = (α̂_1)*.
• If α = (α_1)*_x, then α̂ = (α̂_1 · ·_x)*.
And vice versa, with a regular expression α̂ over Σ̂ we associate an LQRE α over (X, Θ) that is defined recursively, as follows.
• If α̂ ∈ {∅, ε} ∪ Θ ∪ X, then α = α̂.
• If α̂ = ·_x for some x ∈ X, then α = ε ·_x ε.
• If α̂ = α̂_1 ∘ α̂_2 for some ∘ ∈ {+, ·}, then α = α_1 ∘ α_2.
• If α̂ = (α̂_1)*, then α = (α_1)*.
Definition 9 Let w = w_1 w_2 ··· w_n ∈ L_Σ̂(α̂),¹¹ i ∈ {1, 2, ..., n}, and let w_i = x ∈ X. Let j be the maximal integer less than i such that w_j = ·_x, if such an integer exists, and j = 0 otherwise. Let k be the minimal integer greater than i such that w_k = ·_x, if such an integer exists, and k = n + 1 otherwise. The pair (j, k) is called the scope of i and is denoted by S_i. That is,
w_1 w_2 ··· w_j ··· w_i ··· w_k ··· w_n, where w_j = ·_x, w_i = x, w_k = ·_x, and S_i = (j, k).
Figure 3.1: The scope of the ith symbol in w
Now, with the ith symbol w_i of w, i = 1, 2, ..., n, we associate a symbol w̄_i ∈ Σ ∪ {ε} in the following manner.
• If w_i is of the form ·_x, where x ∈ X, then w̄_i = ε.
• If w_i ∈ Θ, then w̄_i = w_i.
• If w_i = x ∈ X, then w̄_i can be any element of Σ \ Θ such that the following condition is satisfied.
– Let S_i be the scope of i. For any scope S_j that overlaps S_i, w̄_i = w̄_j if and only if w_i = w_j.
The word w̄ = w̄_1 w̄_2 ··· w̄_n, where w̄_i is as defined above, i = 1, 2, ..., n, is called an instance of w. We denote by I(w) the set of all instances of w. Next we define the language I(α) generated by an LQRE α.
Definition 10 Let α be an LQRE over (X, Θ). The language I(α) generated by α is defined by
I(α) = I(L_Σ̂(α̂)) = ⋃_{w ∈ L_Σ̂(α̂)} I(w).
That is, the language I(α) is the set of all instances of all elements of L_Σ̂(α̂). The set of all languages generated by look-ahead quasi-regular expressions is denoted by L_LQRE.
Example 7 Consider the LQRE α = ε ·_x (ε ·_y y)* · x over ({x, y}, ∅). It can be easily seen that I(α) consists of all words over Σ whose last symbol is different from all other symbols, see Example 3.
Example 8 Consider the LQRE α = ε ·_x (ε ·_y y)* · x · (ε ·_y y)* over ({x, y}, ∅). It can be easily seen that I(α) consists of all words over Σ that contain a symbol that appears exactly once.
¹¹ Here and hereafter, L_Σ̂(α̂) denotes the language over the alphabet Σ̂ defined by α̂, see Remark 2 on p. 36.
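The scope condition of Definition 9 can be made concrete by a small checker that decides whether a given assignment of symbols is an instance of a given word over Σ̂. This is a minimal sketch under stated assumptions: the token encoding (`("var", x)`, `("dot", x)`, `("const", θ)`) and the names `scopes` and `is_instance` are hypothetical, and two scopes (j, k) and (j′, k′) are assumed to overlap exactly when max(j, j′) < min(k, k′), which is an interpretation chosen to agree with Example 7 rather than a definition taken from the text.

```python
def scopes(pattern):
    """Map each variable position i (1-indexed) to its scope (j, k)."""
    n = len(pattern)
    result = {}
    for i, (kind, name) in enumerate(pattern, start=1):
        if kind != "var":
            continue
        dots = [p for p, tok in enumerate(pattern, start=1)
                if tok == ("dot", name)]
        j = max((p for p in dots if p < i), default=0)
        k = min((p for p in dots if p > i), default=n + 1)
        result[i] = (j, k)
    return result

def is_instance(pattern, symbols, theta=frozenset()):
    """symbols[m] is the symbol chosen for pattern[m]; None stands for ε."""
    if len(symbols) != len(pattern):
        return False
    for (kind, name), s in zip(pattern, symbols):
        if kind == "dot" and s is not None:
            return False              # ·_x contributes ε
        if kind == "const" and s != name:
            return False              # constants stand for themselves
        if kind == "var" and (s is None or s in theta):
            return False              # variables range over Σ \ Θ
    sc = scopes(pattern)
    for i, (ji, ki) in sc.items():
        for m, (jm, km) in sc.items():
            overlap = max(ji, jm) < min(ki, km)   # assumed overlap test
            same_variable = pattern[i - 1] == pattern[m - 1]
            same_symbol = symbols[i - 1] == symbols[m - 1]
            if overlap and same_variable != same_symbol:
                return False
    return True
```

For the word ·_x ·_y y ·_y y x, which belongs to the language of α̂ in Example 7, the two occurrences of y may receive equal or different symbols, but the final x must differ from both.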
3.3 Equivalence of the languages defined by LQRE and LFMA
For the proof of the equivalence we shall use the following model of an ordinary finite automaton, see [5, Definition 2.2.1].
Definition 11 A (non-deterministic) finite state automaton (FA) over a finite alphabet Σ̂ is a system M = (Q, s_0, F, ∆), where
• Q is a finite set of states.
• s_0 ∈ Q is the initial state.
• F ⊆ Q is the set of final states.
• ∆ is a subset of Q × (Σ̂ ∪ {ε}) × Q called the transition relation.
A word w = w_1 w_2 ··· w_n, w_i ∈ Σ̂, i = 1, 2, ..., n, over the alphabet Σ̂ is accepted by M, if there is a sequence of states s_0, s_1, ..., s_k ∈ Q such that s_k ∈ F and (s_0, w) ⊢^k (s_k, ε), where (s_i, σv) ⊢ (s_{i+1}, v), for σ ∈ Σ̂ ∪ {ε} and a word v over Σ̂, if and only if (s_i, σ, s_{i+1}) ∈ ∆.
Theorem 5 A language is defined by an LQRE if and only if it is accepted by an LFMA. In other words, L_LQR = L_LQRE.
Proof: The proof of the “only if” part of the theorem is based on the equivalence of ordinary regular expressions to FA and the translation scheme LQRE α → RE α̂ → FA M → LFMA A. Let α be an LQRE over (X, Θ), where X = {x_1, x_2, ..., x_{r_l}} and Θ = {θ_1, θ_2, ..., θ_{r_c}}. By Remark 2, α can be thought of as an ordinary regular expression over the (finite) alphabet Σ̂ = X ∪ Θ ∪ {·_x : x ∈ X}, which we shall denote by α̂. Let M = (Q, s_0, F, ∆) be a finite state automaton over Σ̂ such that L_Σ̂(α̂) = L(M). Consider an r-register LFMA A = (Q, s_0, u, µ, ρ, F), such that
• r = r_l + r_c.
• u = u_l u_c, where u_l = #^{r_l} and u_c = θ_1 θ_2 ··· θ_{r_c}.
• µ = µ_1 ∪ µ_2 ∪ µ_3, where
− µ_1 = {(p, k, q) : (p, x_k, q) ∈ ∆},
− µ_2 = {(p, r_l + k, q) : (p, θ_k, q) ∈ ∆}, and
− µ_3 = {(p, ε, q) : (p, ·_{x_k}, q) ∈ ∆ or (p, ε, q) ∈ ∆}.
• ρ(p, q) = k if and only if (p, ·_{x_k}, q) ∈ ∆, and ρ(p, q) = nil if and only if (p, ε, q) ∈ ∆.
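The relabelling of the transitions of M into those of A can be sketched as follows. This is an illustrative sketch only: the function name `fa_to_lfma` and the label encoding (`("var", x)`, `("const", θ)`, `("dot", x)`, and `None` for ε and for nil) are hypothetical, and, as in the construction above, it is assumed that no pair of states (p, q) carries both a ·_x-labelled and an ε-labelled transition, so that ρ is well defined.

```python
def fa_to_lfma(delta, variables, constants):
    """Translate FA transitions over the hatted alphabet into (u, mu, rho).

    delta: set of (p, label, q) with label None (ε), ("var", x),
    ("const", theta) or ("dot", x).
    """
    r_l = len(variables)
    var_index = {x: k for k, x in enumerate(variables, start=1)}
    const_index = {t: r_l + k for k, t in enumerate(constants, start=1)}
    mu, rho = set(), {}
    for p, label, q in delta:
        if label is None:                 # plain ε-transition, rho = nil
            mu.add((p, None, q))
            rho[(p, q)] = None
        elif label[0] == "var":           # mu_1: read register k
            mu.add((p, var_index[label[1]], q))
        elif label[0] == "const":         # mu_2: read register r_l + k
            mu.add((p, const_index[label[1]], q))
        else:                             # mu_3: look-ahead reassignment
            mu.add((p, None, q))
            rho[(p, q)] = var_index[label[1]]
    u = ["#"] * r_l + list(constants)     # u = #^{r_l} θ_1 ··· θ_{r_c}
    return u, mu, rho
```

The states, the initial state and the final states are carried over unchanged, which is why only the labels appear in the sketch.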
Remark 3 Note that the diagrams of M and A differ only in the transition labels, which can be recovered from each other in a straightforward manner.
The proof of the “only if” part of Theorem 5 is based on the fact that the set of the sequences of labels of the accepting paths of M is regular. Namely, the “only if” part of Theorem 5 immediately follows from Theorem 6 below.
The proof of the “if” part of the theorem is based on a tight relationship between LFMA and ordinary FA, and the translation scheme LFMA A → FA M → RE α̂ → LQRE α. Let A = (Q, s_0, u, µ, ρ, F) be an r-register LFMA. By Lemma 1, we may assume that u = u_l u_c, where u_l = #^{r_l}, u_c = θ_1 θ_2 ··· θ_{r_c}, and r_l + r_c = r. Let X = {x_1, x_2, ..., x_{r_l}} be a set of variables such that X ∩ Σ = ∅ and let Θ = {θ_1, θ_2, ..., θ_{r_c}}. Consider a finite state automaton M = (Q, s_0, F, ∆) over Σ̂ = X ∪ Θ ∪ {·_x : x ∈ X}, such that
• ∆ = ∆_1 ∪ ∆_2 ∪ ∆_3 ∪ ∆_4, where
− ∆_1 = {(p, x_k, q) : (p, k, q) ∈ µ and 1 ≤ k ≤ r_l},
− ∆_2 = {(p, θ_{k−r_l}, q) : (p, k, q) ∈ µ and r_l < k ≤ r},
− ∆_3 = {(p, ·_{x_k}, q) : (p, ε, q) ∈ µ and ρ(p, q) = k}, and
− ∆_4 = {(p, ε, q) : (p, ε, q) ∈ µ and ρ(p, q) = nil}.
Remark 4 Note that the diagrams of A and M differ only in the transition labels, which can be recovered from each other in a straightforward manner.
Let α̂ be an ordinary regular expression over Σ̂ such that L_Σ̂(α̂) = L(M). By Remark 2, α̂ can be thought of as an LQRE over (X, Θ) that we shall denote by α. The proof of the “if” part of Theorem 5 is based on the fact that the set of the sequences of labels of the accepting paths of M is regular. Namely, the “if” part of Theorem 5 immediately follows from Theorem 6 below. □
Theorem 6 Let A and M be as above¹² and let α̂ be an ordinary regular expression that defines L(M). Then I(α) = L(A).
¹² In the proof of the “only if” part of Theorem 5, A is constructed from M, while in the “if” part M is constructed from A.
Proof: Since L_Σ̂(α̂) = L(M), it suffices to show that L(A) consists of all instances of the elements of L(M). Let p = e_1 e_2 ... e_n be a path of edges in the graph representation of A. One can think of p as the diagram of an LFMA, also denoted by p. Then L(p) consists of all words of length less than or equal to n over Σ which “drive A through p from its first to the last vertex (state)”. Let P denote the set of all paths p starting from the initial state s_0 and ending at a final state. Then
L(A) = ⋃_{p ∈ P} L(p).
On the other hand, by Remarks 3 and 4,¹³ p has the corresponding path in M, also denoted by p, that differs from it only in the transition labels. The labels of p in the graph representation of M form an LQRE that we shall denote by w_p. Therefore, L(M) = {w_p : p ∈ P}. Consequently, the equality of L(A) to the set of all instances of the elements of L(M) will follow if we prove that for each path p (not necessarily in P), I(w_p) = L(p), i.e., L(p) consists of all instances of w_p.
The proof is by induction on the length n of p. The case of n = 0 is immediate, because the only instance of ε is ε. For the induction step, we assume that the equality I(w_p) = L(p) holds for all paths p of length n and shall show that it holds for all paths p′ of length n + 1. Let p be a path of length n, let p′ = p, (p, k, q) be a path of length n + 1, and let w_{p′} = w_p · w_{n+1}. We shall distinguish between the cases of k = ε and k ≠ ε.
Let k = ε. Then, by the definition of M, w_{n+1} = ·_{x_m}, where m = ρ(p, q), or w_{n+1} = ε, if ρ(p, q) = nil. Therefore, L(p′) = L(p){ε} and, by the definition of the language defined by an LQRE, I(w_{p′}) = I(w_p){ε}. Since, by the induction hypothesis, I(w_p) = L(p), I(w_{p′}) = L(p′) follows.
¹³ The proof of the “only if” part is by Remark 3, whereas the proof of the “if” part is by Remark 4.
Let k ≠ ε. If r_l < k ≤ r, then, by the definition of u, w_{n+1} = θ_{k−r_l}. Therefore, L(p′) = L(p){θ_{k−r_l}} and, by the definition of the language defined by an LQRE, I(w_{p′}) = I(w_p){θ_{k−r_l}}. Since, by the induction hypothesis, I(w_p) = L(p), I(w_{p′}) = L(p′) follows.
If 1 ≤ k ≤ r_l, then, by the definition of M, w_{n+1} = x_k. Let i be the greatest integer less than n + 1 such that w_i = x_k, if such an integer exists, and 0 otherwise. That is, i is the last time register k appears in p. Also, let j be the greatest integer less than n + 1 such that w_j = ·_{x_k}, if such an integer exists, and 0 otherwise. That is, j is the last time register k is look-ahead reassigned in p. We shall distinguish between the cases of j ≥ i and j < i.
Assume first that j ≥ i. By the definition of an instance of a word defined by an LQRE, the symbol assigned to x_k in the (n + 1)th position within the scope S_{n+1} = (j, n + 2) must be different from the symbol assigned to y (≠ x_k) in the i′th position, for any i′ ∈ {1, 2, ..., n} such that the scope S_{i′} = (j′, k′) overlaps the scope S_{n+1}. Let Σ_1 denote the set of all assignments to y within all the scopes S_{i′} = (j′, k′) with j′ < j. Let Σ_2 denote the set of all assignments to y within all the scopes S_{i′} = (j′, k′) with j′ > j. Then the set of all instances of the word defined by w_{p′} is I(w_p)(Σ \ (Σ_1 ∪ Σ_2)). Similarly, the kth register of the LFMA p′ is look-ahead reassigned at the jth move (its content is different from all other registers, i.e., from Σ_1) and is not updated until the last move. In addition, all other registers reassigned after the jth move (their content is denoted by Σ_2) must be different from the kth register. Therefore, L(p′) = L(p)(Σ \ (Σ_1 ∪ Σ_2)). Since, by the induction hypothesis, I(w_p) = L(p), the desired equality I(w_{p′}) = L(p′) follows.
Now assume that j < i. By the definition of an instance of a word defined by an LQRE, the symbol assigned to x_k in the ith position within the scope S_i = (j, n + 2) must be assigned to it again in the (n + 1)th position, because the scopes S_i and S_{n+1} overlap. That is, writing w̄_m for the symbol (or ε) associated with w_m, w̄_1 w̄_2 ··· w̄_n w̄_{n+1} ∈ I(w_{p′}) if and only if w̄_1 w̄_2 ··· w̄_n ∈ I(w_p) and w̄_{n+1} = w̄_i. Similarly, the kth register of the LFMA p′ is used in the ith transition and is not look-ahead reassigned until the end of the computation. Thus, a word is accepted by the LFMA p′ if and only if it is accepted by p and the symbol that appears in the ith position of the word also appears in its (n + 1)th position. Since, by the induction hypothesis, I(w_p) = L(p), the equality I(w_{p′}) = L(p′) follows in the latter case as well. This completes the proof of the induction step and the theorem. □
We conclude this section with an alternative proof of Theorem 4 that is an immediate corollary to Theorem 5.
Corollary 4 LQR languages are closed under reversing.
Proof: It can be easily verified that, for an LQRE α, (L(α))^R = L(α^R), where α^R is defined by the following induction.
• If α ∈ {∅, ε} ∪ Θ ∪ X, then α^R = α.
• (α_1 + α_2)^R is (α_1^R + α_2^R).
• (α_1 · α_2)^R is (α_2^R · α_1^R).
• (α_1 ·_x α_2)^R is (α_2^R ·_x α_1^R).
• ((α_1)*)^R is (α_1^R)*.
• ((α_1)*_x)^R is (α_1^R)*_x. □
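The inductive definition of α^R translates directly into a recursive function on expression trees. The sketch below assumes a hypothetical tuple encoding of LQRE, namely `("empty",)`, `("eps",)`, `("const", θ)`, `("var", x)`, `("alt", α1, α2)`, `("cat", α1, α2)`, `("catx", x, α1, α2)`, `("star", α1)` and `("starx", x, α1)`; neither the encoding nor the name `reverse` comes from the thesis.

```python
def reverse(alpha):
    """Return the reversed expression, following the induction of Corollary 4."""
    tag = alpha[0]
    if tag in ("empty", "eps", "const", "var"):
        return alpha                                   # atoms are unchanged
    if tag == "alt":
        return ("alt", reverse(alpha[1]), reverse(alpha[2]))
    if tag == "cat":                                   # operands are swapped
        return ("cat", reverse(alpha[2]), reverse(alpha[1]))
    if tag == "catx":                                  # same variable, swapped operands
        return ("catx", alpha[1], reverse(alpha[3]), reverse(alpha[2]))
    if tag == "star":
        return ("star", reverse(alpha[1]))
    if tag == "starx":
        return ("starx", alpha[1], reverse(alpha[2]))
    raise ValueError(f"unknown LQRE node: {tag!r}")
```

Applying `reverse` twice returns the original tree, which mirrors the identity (α^R)^R = α at the level of syntax.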
3.4 Languages recursively defined by LQRE
In this section, to an LQRE α over (X, Θ) we recursively associate yet another language R(α) ⊆ Σ*, see the following definition.
Definition 12 Let 𝒜 denote the set of all one-to-one functions a : X → Σ. For an LQRE α we recursively define the language R(α) generated by α and a function P_α : R(α) → 2^{𝒜×𝒜} as described below. The intuitive meaning of P_α is as follows.¹⁴ For each w ∈ R(α), P_α(w) is the set of pairs of functions ⟨ℱ, ℒ⟩ ∈ 𝒜 × 𝒜, where for all x ∈ X, ℱ(x) and ℒ(x) are the assignments to the first and last appearance of variable x in α that generates w, respectively.
• If α = ∅, then R(α) = ∅ and P_α = ∅.
• If α = ε, then R(α) = {ε} and P_α(ε) = {⟨ℱ, ℒ⟩ | ℱ = ℒ}.
• If α = θ ∈ Θ, then R(α) = {θ} and P_α(θ) = {⟨ℱ, ℒ⟩ | ℱ = ℒ}.
¹⁴ See also Lemma 2 below for the intuition lying behind the above definition.
• If α = x ∈ X, then R(α) = Σ \ Θ and, for σ ∈ R(α), P_α(σ) = {⟨ℱ, ℒ⟩ | ℱ = ℒ, ℱ(x) = ℒ(x) = σ}.
• If α = α_1 + α_2, then R(α) = R(α_1) ∪ R(α_2) and P_α(w) = P_{α_1}(w) ∪ P_{α_2}(w).¹⁵
• If α = α_1 · α_2, then
R(α) = {w_1 · w_2 | w_1 ∈ R(α_1), w_2 ∈ R(α_2), ∃𝒳 ∈ 𝒜 : ⟨ℱ, 𝒳⟩ ∈ P_{α_1}(w_1), ⟨𝒳, ℒ⟩ ∈ P_{α_2}(w_2)}
and
P_α(w_1 · w_2) = {⟨ℱ, ℒ⟩ | ∃𝒳 ∈ 𝒜 : ⟨ℱ, 𝒳⟩ ∈ P_{α_1}(w_1), ⟨𝒳, ℒ⟩ ∈ P_{α_2}(w_2)}.
• If α = α_1 ·_x α_2, then
R(α) = {w_1 · w_2 | w_1 ∈ R(α_1), w_2 ∈ R(α_2), ∃⟨ℱ, ℒ_1⟩ ∈ P_{α_1}(w_1), ∃⟨ℱ_2, ℒ⟩ ∈ P_{α_2}(w_2), ∀y ∈ X \ {x} : ℒ_1(y) = ℱ_2(y)}
and
P_α(w_1 · w_2) = {⟨ℱ, ℒ⟩ | ∃⟨ℱ, ℒ_1⟩ ∈ P_{α_1}(w_1), ∃⟨ℱ_2, ℒ⟩ ∈ P_{α_2}(w_2), ∀y ∈ X \ {x} : ℒ_1(y) = ℱ_2(y)}.
• If α = (α_1)*, then
R(α) = ⋃_{k=0}^{∞} R(α_1^k),
where α_1^k is defined as the “k times concatenation” α_1 · α_1 ··· α_1 (k factors), and
P_α(w) = ⋃_{k=0}^{∞} P_{α_1^k}(w).
¹⁵ For i = 1, 2, if P_{α_i}(w) is undefined, then we put P_{α_i}(w) = ∅.
• If α = (α_1)*_x, then
R(α) = ⋃_{k=0}^{∞} R(α_1^{·_x k}),
where α_1^{·_x k} is defined as the “k times ·_x concatenation” α_1 ·_x α_1 ·_x ··· ·_x α_1 (k factors), and
P_α(w) = ⋃_{k=0}^{∞} P_{α_1^{·_x k}}(w).
Example 9 Consider the following LQRE α over (X = {x, y}, Θ = {a}),
α = (ε ·_x x) · ((a ·_y y) ·_y y),
whose subexpressions are named α_0 = ε, α_1 = x, α_2 = (α_0 ·_x α_1), α_3 = a, α_4 = y (the first occurrence), α_5 = (α_3 ·_y α_4), α_6 = y (the second occurrence) and α_7 = (α_5 ·_y α_6), so that α = α_2 · α_7.
Then, dacb ∈ R(α), because
⟨ℱ = (x : b, y : b), ℒ = (x : b, y : b)⟩ ∈ P_{α_0}(ε),
⟨ℱ = (x : d, y : b), ℒ = (x : d, y : b)⟩ ∈ P_{α_1}(d),
⟨ℱ = (x : b, y : b), ℒ = (x : d, y : b)⟩ ∈ P_{α_2}(d),
⟨ℱ = (x : d, y : b), ℒ = (x : d, y : b)⟩ ∈ P_{α_3}(a),
⟨ℱ = (x : d, y : c), ℒ = (x : d, y : c)⟩ ∈ P_{α_4}(c),
⟨ℱ = (x : d, y : b), ℒ = (x : d, y : c)⟩ ∈ P_{α_5}(ac),
⟨ℱ = (x : d, y : b), ℒ = (x : d, y : b)⟩ ∈ P_{α_6}(b),
⟨ℱ = (x : d, y : b), ℒ = (x : d, y : b)⟩ ∈ P_{α_7}(acb),
⟨ℱ = (x : b, y : c), ℒ = (x : d, y : b)⟩ ∈ P_α(dacb).
3.5 Equivalence of the definitions
In this section we show that R(α) = I(α), for all LQRE α. The proof is based on the following lemma.
Lemma 2 For any LQRE α over (X, Θ), for any w ∈ Σ*, and for any two functions ℱ, ℒ ∈ 𝒜, w ∈ R(α) and ⟨ℱ, ℒ⟩ ∈ P_α(w) if and only if for some v ∈ L_Σ̂(α̂), w ∈ I(v) and the following conditions are satisfied.
If σ is the assignment to the first appearance of x in v, then ℱ(x) = σ. (1)
If σ is the assignment to the last appearance of x in v, then ℒ(x) = σ. (2)
If variable x does not appear in v, then ℱ(x) = ℒ(x). (3)
Proof: The proof is by induction on the complexity of α.
• If α = ∅, then α̂ = ∅. By definition, R(α) = ∅ and L_Σ̂(α̂) = ∅.
• If α = ε, then α̂ = ε. By definition, R(α) = {ε} and L_Σ̂(α̂) = {ε}. Then, for any w ∈ Σ* and any ℱ, ℒ ∈ 𝒜,
w ∈ R(α) and ⟨ℱ, ℒ⟩ ∈ P_α(w)
if and only if
w = ε and ℱ = ℒ
if and only if
v = ε ∈ L_Σ̂(α̂), w = ε ∈ I(v) and (since no variables of X appear in v) for all x ∈ X : ℱ(x) = ℒ(x).
• If α = θ ∈ Θ, then α̂ = θ. By definition, R(α) = {θ} and L_Σ̂(α̂) = {θ}, as well. Then, for any w ∈ Σ* and any ℱ, ℒ ∈ 𝒜,
w ∈ R(α) and ⟨ℱ, ℒ⟩ ∈ P_α(w)
if and only if
w = θ and ℱ = ℒ
if and only if
v = θ ∈ L_Σ̂(α̂), w = θ ∈ I(v) and (since no variables of X appear in v) for all x ∈ X : ℱ(x) = ℒ(x).
• If α = x ∈ X, then α̂ = x. By definition, R(α) = Σ \ Θ and L_Σ̂(α̂) = {x}. Then, for any w ∈ Σ* and any ℱ, ℒ ∈ 𝒜,
w ∈ R(α) and ⟨ℱ, ℒ⟩ ∈ P_α(w)
if and only if
w = σ ∈ Σ \ Θ, ℱ = ℒ and ℱ(x) = ℒ(x) = σ
if and only if
v = x ∈ L_Σ̂(α̂), w = σ ∈ I(v) and, since only the variable x appears in v, σ is the assignment to the first and last appearance of x in v, ℱ(x) = ℒ(x) = σ and for all y ∈ X \ {x}, ℱ(y) = ℒ(y).
• If α = α_1 + α_2, then α̂ = α̂_1 + α̂_2. By definition, R(α) = R(α_1) ∪ R(α_2) and L_Σ̂(α̂) = L_Σ̂(α̂_1) ∪ L_Σ̂(α̂_2). Then, for any w ∈ Σ* and any ℱ, ℒ ∈ 𝒜,
w ∈ R(α) and ⟨ℱ, ℒ⟩ ∈ P_α(w)
if and only if
for some i ∈ {1, 2}, w ∈ R(α_i) and ⟨ℱ, ℒ⟩ ∈ P_{α_i}(w)
if and only if (by the induction hypothesis)
for some i ∈ {1, 2} and some v^i ∈ L_Σ̂(α̂_i), w ∈ I(v^i) and the conditions (1) – (3) are satisfied
if and only if
for some v ∈ L_Σ̂(α̂), w ∈ I(v) and the conditions (1) – (3) are satisfied.
• If α = α_1 · α_2, then α̂ = α̂_1 · α̂_2. By definition,
R(α) = {w_1 · w_2 | w_1 ∈ R(α_1), w_2 ∈ R(α_2), ∃𝒳 ∈ 𝒜 : ⟨ℱ, 𝒳⟩ ∈ P_{α_1}(w_1), ⟨𝒳, ℒ⟩ ∈ P_{α_2}(w_2)}
and L_Σ̂(α̂) = L_Σ̂(α̂_1 · α̂_2).
For the inclusion R(α) ⊆ I(α), we shall prove that, for any word w = w_1 · w_2 ∈ R(α), where w_1 ∈ R(α_1), w_2 ∈ R(α_2), and any ⟨ℱ, ℒ⟩ ∈ P_α(w), there exists a word v ∈ L_Σ̂(α̂) such that w ∈ I(v). That is, w is an instance of v, and ℱ, ℒ satisfy conditions (1) – (3) of the lemma. Since, by the induction hypothesis, α_1 and α_2 satisfy the lemma, there exist v^1 ∈ L_Σ̂(α̂_1) and v^2 ∈ L_Σ̂(α̂_2) such that w_1 ∈ I(v^1) and w_2 ∈ I(v^2). We denote by n_1 the length of v^1 and by n_2 the length of v^2. Let v = v^1 · v^2 ∈ L_Σ̂(α̂) and assume to the contrary that w ∉ I(v). That is, there exist variables x, y ∈ X with overlapping scopes S_i^x = (j, k) and S_{i′}^y = (j′, k′), respectively, which do not satisfy the condition of Definition 9 for I(v), see Figure 3.2 below. Without loss of generality, we may assume that i ≤ n_1 and i′ > n_1, and we shall distinguish between the case of x = y and v̄_i ≠ v̄_{i′}, and the case of x ≠ y and v̄_i = v̄_{i′}, where v̄_m denotes the symbol of w assigned to the mth symbol v_m of v.
[Figure 3.2: Scopes S_i^x and S_{i′}^y in v = v^1 · v^2. Position i (variable x) lies in v^1, position i′ (variable y) lies in v^2, and the scopes S_i^x = (j, k) and S_{i′}^y = (j′, k′) overlap.]
Assume first x = y and v̄_i ≠ v̄_{i′}, where v̄_m denotes the symbol of w assigned to the mth symbol v_m of v. Then j = j′ and k = k′, i.e., the scopes S_i^x and S_{i′}^y coincide. Since ·_x does not appear in v_i ··· v_{i′}, by the induction hypothesis, ℒ_1(x) = v̄_i and ℱ_2(x) = v̄_{i′}, implying ℒ_1(x) ≠ ℱ_2(x). Therefore, w = w_1 · w_2 ∉ R(α), which contradicts our assumption.
Now assume x ≠ y and v̄_i = v̄_{i′}. If k > n_1 and j′ ≤ n_1, then, since ·_x does not appear in v_i ··· v_{n_1}, by the induction hypothesis, ℒ_1(x) = v̄_i. Similarly, ℱ_2(y) = v̄_{i′}. Then, since w = w_1 · w_2 ∈ R(α), ℒ_1 = ℱ_2, implying ℒ_1(x) = ℒ_1(y), which contradicts ℒ_1 ∈ 𝒜. The proof of the case in which k ≤ n_1 or j′ > n_1 is similar to the above and is omitted.
For the inclusion R(α) ⊇ I(α), we shall prove that, for any word v ∈ L_Σ̂(α̂), and any w ∈ I(v) with any appropriate functions ℱ, ℒ ∈ 𝒜,¹⁶ w ∈ R(α) and ⟨ℱ, ℒ⟩ ∈ P_α(w). Since, by Definition 9 for I(v), I(v) ⊆ I(v^1) · I(v^2), there exist w_1 ∈ I(v^1) and w_2 ∈ I(v^2) such that w = w_1 · w_2. Note that, if ℱ, ℒ ∈ 𝒜 are appropriate functions of w, then ℱ and ℒ are appropriate functions of w_1 and w_2, respectively. By the induction hypothesis, α_1 and α_2 satisfy the lemma. Thus, w_1 ∈ I(v^1) with any appropriate functions ℱ_1 and ℒ_1 if and only if w_1 ∈ R(α_1) and ⟨ℱ_1, ℒ_1⟩ ∈ P_{α_1}(w_1). Similarly, w_2 ∈ I(v^2) with any appropriate functions ℱ_2 and ℒ_2 if and only if w_2 ∈ R(α_2) and ⟨ℱ_2, ℒ_2⟩ ∈ P_{α_2}(w_2). Assume to the contrary that the condition of Definition 12 for R(α_1 · α_2) is not satisfied. That is, for any appropriate functions ℒ_1 and ℱ_2 there exists x ∈ X such that ℒ_1(x) ≠ ℱ_2(x). Without loss of generality we may assume that x appears in v^1 in place i and in v^2 in place i′. Then, since ·_x does not appear in v_i ··· v_{n_1+i′}, the scopes S_i^x and S_{n_1+i′}^x coincide. However, v̄_i ≠ v̄_{n_1+i′}, which contradicts w ∈ I(v).
¹⁶ ℱ, ℒ ∈ 𝒜 are appropriate functions if they satisfy conditions (1) – (3) of the lemma.
• If α = α_1 ·_z α_2, then α̂ = α̂_1 · ·_z · α̂_2. By definition,
R(α) = {w_1 · w_2 | w_1 ∈ R(α_1), w_2 ∈ R(α_2), ∃⟨ℱ, ℒ_1⟩ ∈ P_{α_1}(w_1), ∃⟨ℱ_2, ℒ⟩ ∈ P_{α_2}(w_2), ∀x ∈ X \ {z} : ℒ_1(x) = ℱ_2(x)}
and L_Σ̂(α̂) = L_Σ̂(α̂_1 · ·_z · α̂_2).
For the inclusion R(α) ⊆ I(α), we shall prove that, for any word w = w_1 · w_2 ∈ R(α), where w_1 ∈ R(α_1), w_2 ∈ R(α_2), and any ⟨ℱ, ℒ⟩ ∈ P_α(w), there exists a word v ∈ L_Σ̂(α̂) such that w ∈ I(v). That is, w is an instance of v, and ℱ, ℒ satisfy conditions (1) – (3) of the lemma. Since, by the induction hypothesis, α_1 and α_2 satisfy the lemma, there exist v^1 ∈ L_Σ̂(α̂_1) and v^2 ∈ L_Σ̂(α̂_2) such that w_1 ∈ I(v^1) and w_2 ∈ I(v^2). We denote by n_1 the length of v^1 and by n_2 the length of v^2. Let v = v^1 · ·_z · v^2 ∈ L_Σ̂(α̂) and assume to the contrary that w ∉ I(v). That is, there exist variables x, y ∈ X with overlapping scopes S_i^x = (j, k) and S_{i′}^y = (j′, k′), respectively, which do not satisfy the condition of Definition 9 for I(v), see Figure 3.3 below. Without loss of generality, we may assume that i ≤ n_1 and i′ > n_1, and we shall distinguish between the case of x = y and v̄_i ≠ v̄_{i′}, and the case of x ≠ y and v̄_i = v̄_{i′}, where v̄_m denotes the symbol of w assigned to the mth symbol v_m of v.
[Figure 3.3: Scopes S_i^x and S_{i′}^y in v = v^1 · ·_z · v^2. Position i (variable x) lies in v^1, position i′ (variable y) lies in v^2, and the scopes S_i^x = (j, k) and S_{i′}^y = (j′, k′) overlap.]
Assume first x = y and v̄_i ≠ v̄_{i′}. Then, since the scopes S_i^x and S_{i′}^y overlap, x, y ≠ z, j = j′ and k = k′. Therefore, the scopes coincide. Since ·_x does not appear in v_i ··· v_{i′}, by the induction hypothesis, ℒ_1(x) = v̄_i and ℱ_2(x) = v̄_{i′}, implying ℒ_1(x) ≠ ℱ_2(x). Therefore, w = w_1 · w_2 ∉ R(α), which contradicts our assumption.
Now assume x ≠ y and v̄_i = v̄_{i′}. Also assume k > n_1 and j′ ≤ n_1. If y ≠ z, then, since ·_x does not appear in v_i ··· v_{n_1}, by the induction hypothesis, ℒ_1(x) = v̄_i. Similarly, ℱ_2(y) = v̄_{i′}. Then, since w = w_1 · w_2 ∈ R(α), for any variable x′ ∈ X \ {z}, ℒ_1(x′) = ℱ_2(x′), implying ℒ_1(x) = ℒ_1(y), which contradicts ℒ_1 ∈ 𝒜. Similarly, if y = z, then ℱ_2 is not a one-to-one function, because ℱ_2(x) = ℱ_2(y), which contradicts ℱ_2 ∈ 𝒜. The proof of the case in which k ≤ n_1 or j′ > n_1 is similar to the above and is omitted.
For the inclusion R(α) ⊇ I(α), we shall prove that, for any word v ∈ L_Σ̂(α̂), and any w ∈ I(v) with any appropriate functions ℱ, ℒ ∈ 𝒜, w ∈ R(α) and ⟨ℱ, ℒ⟩ ∈ P_α(w). Since, by Definition 9 for I(v), I(v) ⊆ I(v^1) · I(v^2), there exist w_1 ∈ I(v^1) and w_2 ∈ I(v^2) such that w = w_1 · w_2. Note that, if ℱ, ℒ ∈ 𝒜 are appropriate functions of w, then ℱ and ℒ are appropriate functions of w_1 and w_2, respectively. By the induction hypothesis, α_1 and α_2 satisfy the lemma. Thus, w_1 ∈ I(v^1) with any appropriate functions ℱ_1 and ℒ_1 if and only if w_1 ∈ R(α_1) and ⟨ℱ_1, ℒ_1⟩ ∈ P_{α_1}(w_1). Similarly, w_2 ∈ I(v^2) with any appropriate functions ℱ_2 and ℒ_2 if and only if w_2 ∈ R(α_2) and ⟨ℱ_2, ℒ_2⟩ ∈ P_{α_2}(w_2). Assume to the contrary that the condition of Definition 12 for R(α_1 ·_z α_2) is not satisfied. That is, for any appropriate functions ℒ_1 and ℱ_2 there exists x ∈ X, x ≠ z, such that ℒ_1(x) ≠ ℱ_2(x). Without loss of generality we may assume that x appears in v^1 in place i and in v^2 in place i′. Then, since ·_x does not appear in v_i ··· v_{n_1+1+i′}, the scopes S_i^x and S_{n_1+1+i′}^x coincide. However, v̄_i ≠ v̄_{n_1+1+i′}, which contradicts w ∈ I(v).
• If α = (α_1)*, then α̂ = (α̂_1)*. By definition,
R(α) = ⋃_{k=0}^{∞} R(α_1^k)
and L_Σ̂(α̂) = L_Σ̂((α̂_1)*). Since
L_Σ̂((α̂_1)*) = ⋃_{k=0}^{∞} L_Σ̂((α̂_1)^k)
and, by the induction hypothesis, for any k = 0, 1, ..., α_1^k satisfies the lemma, α satisfies the lemma, too.
• If α = (α_1)*_x, then α̂ = (α̂_1 · ·_x)*. By definition,
R(α) = ⋃_{k=0}^{∞} R(α_1^{·_x k})
and L_Σ̂(α̂) = L_Σ̂((α̂_1 · ·_x)*). Since
L_Σ̂((α̂_1 · ·_x)*) = ⋃_{k=0}^{∞} L_Σ̂((α̂_1 · ·_x)^k)
and, by the induction hypothesis, for any k = 0, 1, ..., α_1^{·_x k} satisfies the lemma, α satisfies the lemma, too.
This completes the proof of the induction step and the lemma. □
Theorem 7 For any LQRE α over (X, Θ), R(α) = I(α), i.e., R(α) and I(α) define the same language.
Proof: The result immediately follows from Lemma 2. □
4 Regular grammars for LQR languages (QRG)
In this section we introduce an alternative description of LQR languages by the so-called quasi-regular grammars (QRG), which are the infinite alphabet counterpart of the ordinary regular grammars, see [4, Definition 9.1].
4.1 The definition of QRG
Definition 13 A context-free grammar G = (V, u, R, S) over an infinite alphabet is called a right-linear grammar over an infinite alphabet if R ⊆ V × {1, 2, ..., r} × ({ε} ∪ V ∪ ({1, 2, ..., r} · V)). That is, all productions of G are of the form (A, i) → ε, (A, i) → B or (A, i) → k · B. Symmetrically, if all productions of G are of the form (A, i) → ε, (A, i) → B or (A, i) → B · k, we call G a left-linear grammar over an infinite alphabet.
Definition 14 A right- or left-linear grammar over an infinite alphabet is called a quasi-regular grammar, or shortly QRG. The set of all languages generated by quasi-regular grammars is denoted by L_QRG.
Example 10 Let G = ({S, T, U, V, W}, u, R, S) be a right-linear grammar over an infinite alphabet, where u = ### ∈ (Σ ∪ {#})^3 (that is, u is a word of length three), and R consists of
(S, 1) → 1S | T, (1)
(T, 2) → 2U, (2)
(U, 1) → 1U | V, (3)
(V, 3) → 2W, and (4)
(W, 3) → 1W | 2W | 3W | ε. (5)
One can easily verify that L(G) = L(A) from Example 4. That is, L(G) consists exactly of those words where some element of Σ appears twice or more. For example, the word σ_1 σ_2 σ_3 σ_4 σ_2 σ_3, for any σ_1, σ_2, σ_3, σ_4 ∈ Σ, is derived as follows, where each step is labelled by the number of the production it uses.
(S, u) ⇒_1 σ_1 (S, σ_1##) ⇒_1 σ_1 (T, σ_1##) ⇒_2 σ_1 σ_2 (U, σ_1 σ_2 #)
⇒_3 σ_1 σ_2 σ_3 (U, σ_3 σ_2 #) ⇒_3 σ_1 σ_2 σ_3 σ_4 (U, σ_4 σ_2 #)
⇒_3 σ_1 σ_2 σ_3 σ_4 (V, σ_4 σ_2 #) ⇒_4 σ_1 σ_2 σ_3 σ_4 σ_2 (W, σ_4 σ_2 #)
⇒_5 σ_1 σ_2 σ_3 σ_4 σ_2 σ_3 (W, σ_4 σ_2 σ_3) ⇒_5 σ_1 σ_2 σ_3 σ_4 σ_2 σ_3.
Example 11 In this example we show that ordinary regular grammars are quasi-regular grammars. Let Σ′ = {u_1, u_2, ..., u_r} be an r-element subset of Σ and let G′ = (V, Σ′, R′, S) be a regular grammar over Σ′. Consider a QRG G = (V, u_1 u_2 ... u_r σ, R, S), where σ ∉ Σ′ and R consists of all productions of the form (A, r + 1) → a_1 a_2 ··· a_n for which there exists A → X_1 X_2 ··· X_n ∈ R′ such that a_i = X_i, if X_i ∈ V, and a_i = k, if X_i = u_k ∈ Σ′, i = 1, 2, ..., n. It immediately follows from the definition of G that L(G′) = L(G).
4.2 Equivalence of the languages defined by QRG and LFMA
Theorem 8 A language is defined by a QRG if and only if it is accepted by an LFMA. In other words, L_LQR = L_QRG.

Proof: First, let G = (V, u, R, S) be a right-linear grammar over Σ. Consider an r-register LFMA A = (Q, s0 , u, µ, ρ, f0 ) over Σ such that

• Q = Q1 ∪ Q2 ∪ Q3 , where
  − Q1 = {[ε]},
  − Q2 = {[A] : A ∈ V }, and
  − Q3 = {[Ai ] : (A, i) ∈ V × {1, 2, . . . , r}}.
• s0 = [S].
• µ = µ1 ∪ µ2 ∪ µ3 ∪ µ4 , where
  − µ1 = {([A], ε, [Ai ]) : (A, i) → a ∈ R},
  − µ2 = {([Ai ], ε, [ε]) : (A, i) → ε ∈ R},
  − µ3 = {([Ai ], ε, [B]) : (A, i) → B ∈ R}, and
  − µ4 = {([Ai ], k, [B]) : (A, i) → k · B ∈ R}.
• ρ([A], [Ai ]) = i if and only if for some a, (A, i) → a ∈ R.
• f0 = [ε].

In other words, we use a translation scheme from the right-linear grammar G to the automaton A as described by Figure 4.1 below.

  (A, i) → ε        ⇒   [A] --ε:i--> [Ai ] --ε--> [ε]
  (A, i) → B        ⇒   [A] --ε:i--> [Ai ] --ε--> [B]
  (A, i) → k · B    ⇒   [A] --ε:i--> [Ai ] --k--> [B]

Figure 4.1: The G to A translation scheme

We intend to prove that L(G) = L(A). The proof is by induction on the length n of a derivation by G (respectively, an accepting run of A). That is, we shall show that for any A ∈ V , w = w1 w2 · · · wr , wn ∈ Σ^{r≠}, and σ = σ1 σ2 · · · σl ∈ Σ*,

\[
(A, w) \Rightarrow^{n} \sigma \quad \text{if and only if} \quad ([A], w, \sigma) \vdash^{2n} (f_0, w_n, \varepsilon). \tag{1}
\]
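The translation scheme of Figure 4.1 can be sketched in a few lines of Python. This is an illustration under assumptions, not the construction as implemented anywhere: the name `grammar_to_lfma`, the tuple encoding of states, and the use of `None` for ε are hypothetical, and only those intermediate states [Ai ] that are actually used by some production are created (the definition of Q3 above contains all pairs).

```python
# Example 10 in a hypothetical encoding: (variable, i) -> right-hand sides,
# where () is epsilon, ("B",) is a variable, and (k, "B") stands for k · B.
EXAMPLE_10 = {
    ("S", 1): [(1, "S"), ("T",)],
    ("T", 2): [(2, "U")],
    ("U", 1): [(1, "U"), ("V",)],
    ("V", 3): [(2, "W")],
    ("W", 3): [(1, "W"), (2, "W"), (3, "W"), ()],
}


def grammar_to_lfma(productions, start):
    """Build (s0, f0, mu, rho) from a right-linear grammar, per Figure 4.1."""
    final = ("eps",)                      # the state [ε]
    mu, rho = set(), {}
    for (a, i), bodies in productions.items():
        if not bodies:
            continue
        src, mid = ("var", a), ("var", a, i)   # the states [A] and [Ai]
        mu.add((src, None, mid))               # µ1: ε-move that resets register i
        rho[(src, mid)] = i
        for body in bodies:
            if body == ():
                mu.add((mid, None, final))               # µ2
            elif len(body) == 1:
                mu.add((mid, None, ("var", body[0])))    # µ3
            else:
                k, b = body
                mu.add((mid, k, ("var", b)))             # µ4
    return ("var", start), final, mu, rho


s0, f0, mu, rho = grammar_to_lfma(EXAMPLE_10, "S")
```

Each production contributes one transition out of [Ai ], and each pair (A, i) contributes the single reassigning ε-transition into [Ai ], which is why every derivation step corresponds to two automaton steps in claim (1).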
For the basis, let n = 1. Then (A, w) ⇒ ε if and only if for some i ∈ {1, 2, . . . , r} there exists a production (A, i) → ε ∈ R, if and only if A can perform the following two steps on ε from state [A]:

\[
([A], w, \varepsilon) \vdash^{\varepsilon:i} ([A_i], w_1, \varepsilon) \vdash^{\varepsilon} (f_0, w_1, \varepsilon),
\]

where w1 is obtained from w by replacing wi with a symbol σ ∉ [w] \ {wi }.

For the induction step, we assume that (1) is true for n and show that it holds for n + 1. We distinguish between the following cases of the first step of the derivation from variable A, respectively, of the first two steps of the automaton computation from state [A]:

• (A, w) ⇒ (B, w1 ), respectively, ([A], w, σ) ⊢^2 ([B], w1 , σ), and
• (A, w) ⇒ σ1 (B, w1 ), respectively, ([A], w, σ) ⊢^2 ([B], w1 , σ2 σ3 · · · σl ),

for some variable B ∈ V and assignment w1 ∈ Σ^{r≠}.

For the first case, (A, w) ⇒ (B, w1 ) if and only if for some i ∈ {1, 2, . . . , r} there exists a production (A, i) → B ∈ R such that w1 is obtained from w by replacing wi with a symbol σ ∉ [w] \ {wi }, if and only if A can perform the following two steps on σ from state [A]:

\[
([A], w, \sigma) \vdash^{\varepsilon:i} ([A_i], w_1, \sigma) \vdash^{\varepsilon} ([B], w_1, \sigma),
\]

where w1 is obtained from w by replacing wi with a symbol σ ∉ [w] \ {wi }. Since, by the induction hypothesis, (1) is true for n (for B, w1 , and σ), claim (1) holds for n + 1.

For the second case, (A, w) ⇒ σ1 (B, w1 ) if and only if for some i ∈ {1, 2, . . . , r} there exists a production (A, i) → k · B ∈ R such that w1 is obtained from w by replacing wi with a symbol σ ∉ [w] \ {wi } and σ1 = w1,k , if and only if A can perform the following two steps on σ from state [A]:

\[
([A], w, \sigma) \vdash^{\varepsilon:i} ([A_i], w_1, \sigma) \vdash^{k} ([B], w_1, \sigma_2\sigma_3\cdots\sigma_l),
\]

where w1 is obtained from w by replacing wi with a symbol σ ∉ [w] \ {wi } and σ1 = w1,k . Since, by the induction hypothesis, (1) is true for n (for B, w1 , and σ2 σ3 · · · σl ), claim (1) holds for n + 1.

Thus, for any n ∈ N+ and σ ∈ Σ*,

\[
(S, u) \Rightarrow^{n} \sigma \quad \text{if and only if} \quad (s_0, u, \sigma) \vdash^{2n} (f_0, w_n, \varepsilon),
\]
which proves the desired equality L(G) = L(A).

Now, let G = (V, u, R, S) be a left-linear grammar over Σ. Let G′ = (V, u, R′, S), where R′ consists of the productions of G with right-hand sides reversed, i.e., R′ = {(A, i) → a : (A, i) → a^R ∈ R}. That is, if we reverse the productions of a left-linear grammar we obtain a right-linear grammar, and vice versa. Thus, G′ is a right-linear grammar, and it is easy to see that L(G′) = (L(G))^R. By the preceding proof, L(G′) is a look-ahead quasi-regular language. Since, by Theorem 4, look-ahead quasi-regular languages are closed under reversing, (L(G′))^R = L(G) is also a look-ahead quasi-regular language. Thus, every right- or left-linear grammar over an infinite alphabet generates a look-ahead quasi-regular language.

Conversely, let A = (Q, s0 , u, µ, ρ, f0 ) be an r-register LFMA over Σ. Without loss of generality we may assume that Q consists of the states [ε], [A], [B], . . . , [S], . . ., s0 = [S], and f0 = [ε]. Consider a right-linear grammar G = (V, u′, R, S) over Σ such that

• V = {A : [A] ∈ Q}.
• u′ = u#.
• R = R1 ∪ R2 ∪ R3 , where
  − R1 = {(A, r + 1) → B : ([A], ε, [B]) ∈ µ and ρ([A], [B]) = nil},
  − R2 = {(A, i) → B : ([A], ε, [B]) ∈ µ and ρ([A], [B]) = i}, and
  − R3 = {(A, r + 1) → k · B : ([A], k, [B]) ∈ µ and k ∈ {1, 2, . . . , r}}.
• S is the variable corresponding to the initial state s0 = [S].

In other words, we use a translation scheme from the automaton A to the right-linear grammar G as described by Figure 4.2 below.

  [A] --ε-->   [B]    ⇒   (A, r + 1) → B
  [A] --ε:i--> [B]    ⇒   (A, i) → B
  [A] --k-->   [B]    ⇒   (A, r + 1) → k · B

Figure 4.2: The A to G translation scheme

We intend to prove that L(A) = L(G). The proof is by induction on the length n of an accepting run of A (respectively, a derivation by G). That is, we shall show that for any [A] ∈ Q, w = w1 w2 · · · wr , wn ∈ Σ^{r≠}, σ = σ1 σ2 · · · σl ∈ Σ*, and some τ ∈ (Σ ∪ {#}) \ [w],

\[
([A], w, \sigma) \vdash^{n} (f_0, w_n, \varepsilon) \quad \text{if and only if} \quad (A, w\tau) \Rightarrow^{n} \sigma. \tag{2}
\]
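The translation scheme of Figure 4.2 can likewise be sketched in Python. The name `lfma_to_grammar`, the representation of transitions as triples `(p, label, q)` with `None` for ε, and the convention that a missing entry in `rho` means ρ = nil are assumptions made for this illustration.

```python
def lfma_to_grammar(mu, rho, r):
    """Build the productions of a right-linear grammar, per Figure 4.2.

    mu:  set of transitions (p, label, q); label is None (ε) or k in 1..r.
    rho: dict (p, q) -> register index; a missing pair means rho = nil.
    The extra register r + 1 absorbs the reassignment that every
    production performs when the automaton itself reassigns nothing.
    """
    productions = {}
    for p, label, q in sorted(mu, key=repr):      # sorted for determinism
        if label is None:
            i = rho.get((p, q))
            head = (p, r + 1) if i is None else (p, i)   # R1 or R2
            body = (q,)
        else:
            head, body = (p, r + 1), (label, q)          # R3
        productions.setdefault(head, []).append(body)
    return productions


# A toy 2-register automaton fragment (hypothetical, for illustration only).
mu = {("S", None, "T"), ("S", None, "U"), ("T", 1, "S")}
rho = {("S", "T"): 2}
grammar = lfma_to_grammar(mu, rho, r=2)
```

Unlike the G to A direction, each transition yields exactly one production, which matches the fact that claim (2) relates runs and derivations of the same length n.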
For the basis, let n = 1. Then ([A], w, σ) ⊢ (f0 , w1 , ε) if and only if one of the following holds.

• ([A], ε, f0 ) ∈ µ, ρ([A], f0 ) = nil, and σ = ε.
• ([A], ε, f0 ) ∈ µ, ρ([A], f0 ) = i, and σ = ε.
• ([A], k, f0 ) ∈ µ, for some k ∈ {1, 2, . . . , r}, and σ = wk .
By the translation scheme, the above holds if and only if one of the following holds.

• (A, r + 1) → ε and σ = ε.
• (A, i) → ε and σ = ε.
• (A, r + 1) → k · ε, for some k ∈ {1, 2, . . . , r}, and σ = wk .

For the induction step, we assume that (2) is true for n and show that it holds for n + 1. We distinguish between the following cases of the first step of the automaton computation from state [A], respectively, of the first step of the derivation from variable A:

• ([A], w, σ) ⊢ ([B], w1 , σ), respectively, (A, wτ ) ⇒ (B, w1 τ1 ), and
• ([A], w, σ) ⊢ ([B], w, σ2 σ3 · · · σl ), respectively, (A, wτ ) ⇒ σ1 (B, wτ1 ),

for some state [B] ∈ Q, assignment w1 ∈ Σ^{r≠}, τ ∈ (Σ ∪ {#}) \ [w], and τ1 ∈ (Σ ∪ {#}) \ [w1 ].

For the first case, ([A], w, σ) ⊢ ([B], w1 , σ) if and only if there exists a transition ([A], ε, [B]) ∈ µ with either ρ([A], [B]) = nil (then w1 = w) or ρ([A], [B]) = i for some i ∈ {1, 2, . . . , r} (then w1 is obtained from w by replacing wi with a symbol σ ∉ [w] \ {wi }), if and only if there exists a production (A, r + 1) → B ∈ R (then w1 = w and τ1 ∈ (Σ ∪ {#}) \ [w]) or, for some i ∈ {1, 2, . . . , r}, there exists a production (A, i) → B ∈ R (then w1 is obtained from w by replacing wi with a symbol σ ∉ [w] \ {wi } and τ1 = τ ), if and only if (A, wτ ) ⇒ (B, w1 τ1 ). Since, by the induction hypothesis, (2) is true for n (for [B], w1 , τ1 , and σ), claim (2) holds for n + 1.

For the second case, ([A], w, σ) ⊢ ([B], w, σ2 σ3 · · · σl ) if and only if for some k ∈ {1, 2, . . . , r} there exists a transition ([A], k, [B]) ∈ µ and σ1 = wk , if and only if there exists a production (A, r + 1) → k · B ∈ R, τ1 ∈ (Σ ∪ {#}) \ [w], and σ1 = wk , if and only if (A, wτ ) ⇒ σ1 (B, wτ1 ). Since, by the induction hypothesis, (2) is true for n (for [B], w, τ1 , and σ2 σ3 · · · σl ), claim (2) holds for n + 1.

Thus, for any n ∈ N+ and σ ∈ Σ*,

\[
(s_0, u, \sigma) \vdash^{n} (f_0, w_n, \varepsilon) \quad \text{if and only if} \quad (S, u\#) \Rightarrow^{n} \sigma,
\]

which proves the desired equality L(A) = L(G). To produce a left-linear grammar for L(A), we start with an LFMA for (L(A))^R and then reverse the right-hand sides of all productions of the resulting right-linear grammar. This completes the proof of the theorem. □
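The passage between right- and left-linear grammars used twice in the proof amounts to reversing every right-hand side. A minimal Python sketch follows; the name `reverse_productions` and the tuple encoding of right-hand sides are hypothetical.

```python
def reverse_productions(productions):
    """Reverse each right-hand side: k · B becomes B · k, and vice versa.

    productions: dict (variable, i) -> list of bodies, where a body is
    () for epsilon, ("B",) for a variable, or a pair for k · B / B · k.
    """
    return {
        head: [tuple(reversed(body)) for body in bodies]
        for head, bodies in productions.items()
    }


right_linear = {("S", 1): [(1, "S"), ("T",), ()]}
left_linear = reverse_productions(right_linear)
```

Applying the function twice returns the original grammar, mirroring the remark that reversing a left-linear grammar gives a right-linear one and vice versa.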
5 Summary
In this thesis we presented an extension of FMA, called the look-ahead finite-memory automaton (LFMA). We have shown that the look-ahead quasi-regular (LQR) languages possess many of the closure properties of ordinary regular languages, including closure under reversing. We have also proved that FMA can be simulated by LFMA. In addition, we introduced regular expressions (LQRE) and regular grammars (QRG) over infinite alphabets. We summarize the closure properties of various finite state machines over infinite alphabets in the following table.

Finite State Machine   L1 ∪ L2   L1 ∩ L2   L1 · L2   L̄     L*    L^R
1D-RA                  Yes       Yes       No        Yes   No    No
2D-RA                  ?         ?         ?         ?     ?     ?
1N-RA (FMA)            Yes       Yes       Yes       No    Yes   No
1N-LA (LFMA)           Yes       Yes       Yes       No    Yes   Yes

Table 1: The closure properties of various finite state machines, where ∪ is union, ∩ is intersection, · is concatenation, L̄ is complement, * is the Kleene star, and R is reversing.

We conclude this work with two open problems which, on the one hand, are of interest in their own right and, on the other hand, might give better insight into simple languages over infinite alphabets.

• Does each look-ahead quasi-regular language belong to the closure of quasi-regular languages under union, intersection, concatenation, the Kleene star, and reversing?
• Can an LFMA be simulated by a two-way finite-memory automaton?
Abstract

A new model of finite automata, called finite-state datalog automata (FSDA), was introduced by Shemesh and Francez in 1994. This model was intended for the abstract study of relational languages. Words in such a language are produced by composing base relations and are of the form r_i1(x_j1, x_k1) · · · r_in(x_jn, x_kn) for some n. The nature of relational languages requires the use of infinite alphabets. Therefore, in addition to a finite number of states, FSDA are also equipped with a finite number of registers, each capable of storing the name of a variable from an infinite set of variable names. The equality test performed in finite automata is replaced by unification, which is an important component of relational languages.

Another computation model dealing with infinite alphabets was introduced later by Kaminski and Francez. This model is called finite-memory automata (FMA), and its purpose is to recognize languages analogous to the regular languages over finite alphabets. Similarly to FSDA, FMA are equipped with a finite set of registers capable of storing a symbol from the infinite alphabet. Unlike in FSDA, the registers of an FMA cannot store values that are held in other registers. Since the power of the automaton is restricted to comparison and copying only, without the ability to apply any functions, at each stage the automaton can "remember" only a finite number of the symbols read previously. Consequently, the languages accepted by FMA usually have the properties of regular languages, except that they are not closed under reversal and complement.

In 1998 Cheng and Kaminski extended FMA to pushdown automata over infinite alphabets. Their computation model is called infinite-alphabet infinite-stack-alphabet pushdown automata (IIPDA), and it allows infinite alphabets both for the input and for the stack. The purpose of IIPDA is to recognize a "natural" generalization of context-free languages over infinite alphabets. Similarly to FMA, IIPDA are equipped with a finite set of registers, but in contrast to FMA, IIPDA can nondeterministically replace the symbols stored in the registers.

Recently Neven, Schwentick, and Vianu introduced pebble automata (PA) over infinite alphabets. The model allows storing a finite number of positions by means of pebbles (markers) that obey the stack discipline, and allows comparing the symbols at these positions. They showed that FMA are incomparable with first-order logic and with monadic second-order logic, whereas PA lie between first-order logic and monadic second-order logic.

The study of finite-memory automata over infinite alphabets began as a purely theoretical one; however, with the appearance of the models of Kaminski and Francez in 1994, the research took on a more practical flavor. The central idea is to find problems that give a practical interpretation to an infinite alphabet and to languages over it. With the development of the Internet era and, as a result, of the XML language, more and more problems give an interpretation to an infinite alphabet. In one of these problems the alphabet Σ consists of the URL addresses of web sites, and a word in a language L is interpreted as a navigation path on the Internet resulting from a finite number of mouse clicks in a browser. In other problems the values at the nodes of an XML tree are regarded as values from an infinite alphabet.

In this work we present a model called look-ahead finite-memory automaton (LFMA), which is an extension of the FMA model. The model is very similar to FMA, but allows a look-ahead assignment by "guessing" the content of some register. We show that FMA can be simulated by LFMA and that LFMA languages are closed under reversal (whereas FMA languages are not closed under this operation). We also define a kind of regular expression over an infinite alphabet and show that a language is definable by a regular expression over an infinite alphabet if and only if it is accepted by an LFMA. Finally, we identify a class of context-free grammars over infinite alphabets that generate the LFMA languages.

The work is divided into the following chapters. Chapter 1 contains the background for this work, together with the definition of FMA and of other computation models over infinite alphabets. In Chapter 2 we define the LFMA model, the subject of this work, investigate its basic properties, and prove the closure properties of LFMA languages, which include closure under intersection, union, concatenation, the Kleene star, and reversal. We conclude the chapter with an example that leads to a difficult question: whether every LFMA language is accepted by a two-way FMA. In Chapter 3 we present regular expressions over infinite alphabets for LFMA languages and prove that a language is definable by a regular expression over an infinite alphabet if and only if it is accepted by an LFMA. In addition, we give another (recursive) definition of the language defined by a regular expression and prove the equivalence of the two definitions. In Chapter 4 we identify a class of context-free grammars over infinite alphabets that generate the LFMA languages. We conclude the work with a short summary of the results and pose two open questions related to the simulation of LFMA by other computation models over infinite alphabets.