Partial Derivative Automaton for Regular Expressions with Shuffle

Sabine Broda, António Machiavelo, Nelma Moreira, Rogério Reis

CMUP & DM-DCC, Faculdade de Ciências da Universidade do Porto, Portugal

DCFS 2015 Waterloo, Ontario, Canada June 25–27, 2015

Shuffle Operation

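For words $x, y \in \Sigma^*$ and letters $a, b \in \Sigma$:

\[ x \shuffle \varepsilon = \varepsilon \shuffle x = \{x\} \]
\[ ax \shuffle by = \{\, az \mid z \in x \shuffle by \,\} \cup \{\, bz \mid z \in ax \shuffle y \,\} \]

For languages $L_1, L_2 \subseteq \Sigma^*$:

\[ L_1 \shuffle L_2 = \bigcup_{x \in L_1,\, y \in L_2} x \shuffle y \]

If two languages $L_1, L_2 \subseteq \Sigma^*$ are regular, then $L_1 \shuffle L_2$ is regular.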



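Regular Expressions with Shuffle (RE($\shuffle$))

\[ \tau \to \emptyset \mid \alpha \]
\[ \alpha \to \varepsilon \mid a \mid (\alpha + \alpha) \mid (\alpha \cdot \alpha) \mid (\alpha \shuffle \alpha) \mid \alpha^* \qquad (a \in \Sigma) \]

\[ L(\emptyset) = \emptyset, \quad L(\varepsilon) = \{\varepsilon\}, \quad L(a) = \{a\}, \quad L(\alpha^*) = L(\alpha)^* \]
\[ L(\alpha + \beta) = L(\alpha) \cup L(\beta), \quad L(\alpha\beta) = L(\alpha)L(\beta), \quad L(\alpha \shuffle \beta) = L(\alpha) \shuffle L(\beta) \]

\[ \varepsilon(L) = \begin{cases} \varepsilon & \text{if } \varepsilon \in L \\ \emptyset & \text{otherwise} \end{cases} \qquad \varepsilon(\tau) = \varepsilon(L(\tau)) \]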



Example

\[ P_3 = a_1 \shuffle a_2 \shuffle a_3 \]

Equivalent to $a_1 a_2 a_3 + a_1 a_3 a_2 + a_2 a_1 a_3 + a_2 a_3 a_1 + a_3 a_1 a_2 + a_3 a_2 a_1$.

In general, $P_n = a_1 \shuffle \cdots \shuffle a_n$, where $n \ge 1$ and $a_i \ne a_j$ for $1 \le i \ne j \le n$, so that

\[ L(P_n) = \{\, a_{i_1} \cdots a_{i_n} \mid i_1, \ldots, i_n \text{ is a permutation of } 1, \ldots, n \,\}. \]
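The recursive definition of the shuffle of two words translates almost literally into code. The following is a minimal Python sketch, not taken from the slides: the function names are illustrative, and the single characters `"1"`, `"2"`, `"3"` stand in for the letters $a_1, a_2, a_3$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shuffle_words(x: str, y: str) -> frozenset:
    """All interleavings of the words x and y (recursive definition)."""
    if not x:                      # epsilon shuffled with y is {y}
        return frozenset({y})
    if not y:                      # x shuffled with epsilon is {x}
        return frozenset({x})
    a, b = x[0], y[0]
    first_from_x = {a + z for z in shuffle_words(x[1:], y)}
    first_from_y = {b + z for z in shuffle_words(x, y[1:])}
    return frozenset(first_from_x | first_from_y)


def shuffle_languages(l1, l2):
    """Shuffle of two finite languages: union over all pairs of words."""
    result = set()
    for x in l1:
        for y in l2:
            result |= shuffle_words(x, y)
    return result


# P3 with '1', '2', '3' standing in for a1, a2, a3 (hypothetical encoding).
p3_language = shuffle_languages(shuffle_languages({"1"}, {"2"}), {"3"})
```

Under this encoding, `p3_language` contains exactly the six permutations of `"123"`, matching the sum of six concatenations given for $P_3$.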



Complexity of RE($\shuffle$)

- Membership for RE($\shuffle$) is NP-complete [MS94] (NL-complete for RE).
- Inequivalence for RE($\shuffle$) is EXP-complete [MS94] (PSPACE-complete for RE).
- RE($\shuffle$) $\Rightarrow$ NFA: exponential trade-off [MS94] (at most quadratic for RE).
- RE($\shuffle$) $\Rightarrow$ DFA: double exponential trade-off [Gel10] (exponential for RE).
- RE($\shuffle$) $\Rightarrow$ RE: double exponential trade-off [Gel10, GH09].

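Automata and System of Equations

Consider an $n$-state NFA $A = \langle [1,n], \{a_1, \ldots, a_k\}, S_0, \delta, F \rangle$.

The right language of state $i$ is $L_i = \{\, w \mid \delta(i, w) \cap F \neq \emptyset \,\}$.

The languages $L_1, \ldots, L_n$ satisfy the system of equations

\[ L_i = a_1 L_{1i} \cup \cdots \cup a_k L_{ki} \cup \varepsilon(L_i), \qquad i \in [1,n], \]

where each $L_{ji}$, $j \in [1,k]$, is a (possibly empty) union of languages $L_m$, $m \in [1,n]$, and

\[ L(A) = \bigcup_{i \in S_0} L_i. \]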

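NFA and System of Equations

[Figure: NFA with states 0, 1, 2, 3 and transitions labelled a and b.]

\[ L_0 = a(L_1 \cup L_2) \cup b(L_1 \cup L_3) \]
\[ L_1 = a L_3 \]
\[ L_2 = b L_3 \]
\[ L_3 = \{\varepsilon\} \]

Hence $L_0 = \{b, aa, ab, ba\}$.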
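The correspondence between the equations and the right languages can be checked by brute force. The Python sketch below assumes the transition relation that the four equations imply (the original figure is not reproduced here), with state 3 as the only final state; `DELTA`, `accepts_from` and `right_language` are illustrative names.

```python
from itertools import product

# Transitions inferred from the equations (an assumption about the figure):
# L0 = a(L1 ∪ L2) ∪ b(L1 ∪ L3), L1 = a L3, L2 = b L3, L3 = {ε}.
DELTA = {
    (0, "a"): {1, 2},
    (0, "b"): {1, 3},
    (1, "a"): {3},
    (2, "b"): {3},
}
FINAL = {3}


def accepts_from(state, word):
    """True iff reading `word` from `state` can reach a final state."""
    current = {state}
    for letter in word:
        current = set().union(*(DELTA.get((q, letter), set()) for q in current))
    return bool(current & FINAL)


def right_language(state, max_len=3):
    """Right language of `state`, restricted to words of length <= max_len.

    The automaton is acyclic, so max_len=3 already covers every word.
    """
    words = ("".join(w) for n in range(max_len + 1)
             for w in product("ab", repeat=n))
    return {w for w in words if accepts_from(state, w)}
```

With these transitions, `right_language(0)` returns the set `{"b", "aa", "ab", "ba"}`, which agrees with the value of $L_0$ obtained by solving the system.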



Regular Expressions RE($\shuffle$) and System of Equations

Given $\alpha_0 \in$ RE($\shuffle$), a support of $\alpha_0$ is a set $\{\alpha_1, \ldots, \alpha_n\}$ that satisfies a system of equations

\[ \alpha_i = a_1 \alpha_{1i} + \cdots + a_k \alpha_{ki} + \varepsilon(\alpha_i), \qquad i \in [0,n], \]

where each $\alpha_{li}$, $l \in [1,k]$, is a (possibly empty) sum of elements in $\{\alpha_1, \ldots, \alpha_n\}$.

The existence of a support of $\alpha$ implies the existence of an NFA that accepts $L(\alpha)$.

Support π

Proposition 1. Given $\tau \in$ RE($\shuffle$), the set $\pi(\tau)$ is a support:

\[ \pi(\emptyset) = \pi(\varepsilon) = \emptyset \]
\[ \pi(a) = \{\varepsilon\} \]
\[ \pi(\alpha^*) = \pi(\alpha)\alpha^* \]
\[ \pi(\alpha + \beta) = \pi(\alpha) \cup \pi(\beta) \]
\[ \pi(\alpha\beta) = \pi(\alpha)\beta \cup \pi(\beta) \]
\[ \pi(\alpha \shuffle \beta) = \pi(\alpha) \shuffle \pi(\beta) \,\cup\, \pi(\alpha) \shuffle \{\beta\} \,\cup\, \{\alpha\} \shuffle \pi(\beta) \]

where, for $S, T \subseteq$ RE($\shuffle$) and $\beta \in$ RE($\shuffle$) $\setminus \{\emptyset, \varepsilon\}$,

\[ S\beta = \{\, \alpha\beta \mid \alpha \in S \,\}, \qquad S \shuffle T = \{\, \alpha \shuffle \beta \mid \alpha \in S,\ \beta \in T \,\}, \]
\[ S\varepsilon = \{\varepsilon\} \shuffle S = S \shuffle \{\varepsilon\} = S, \qquad S\emptyset = \emptyset S = \emptyset. \]

Proposition 2. $|\pi(\tau)| \le 2^{|\tau|_\Sigma} - 1$, where $|\tau|_\Sigma$ denotes the number of alphabet symbols in $\tau$ (its alphabetic length).
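The inductive definition of $\pi$ can be sketched directly in Python. Everything below is illustrative rather than the authors' implementation: expressions are encoded as nested tuples, the constructor and function names are hypothetical, and the simplification of $\varepsilon \cdot \beta$ to $\beta$ is an assumption (the slides state the $\varepsilon$ identities only for sets).

```python
EMPTY, EPS = ("empty",), ("eps",)

def sym(a): return ("sym", a)
def alt(x, y): return ("+", x, y)
def cat(x, y): return (".", x, y)
def shuf(x, y): return ("sh", x, y)
def star(x): return ("*", x)


def cat_set(S, beta):
    """S·beta, with S·∅ = ∅ and S·ε = S; ε·beta is simplified to beta (assumption)."""
    if beta == EMPTY:
        return frozenset()
    if beta == EPS:
        return frozenset(S)
    return frozenset(beta if x == EPS else cat(x, beta) for x in S)


def shuf_set(S, T):
    """Shuffle of two sets of expressions, with ε acting as the identity."""
    def one(x, y):
        if x == EPS:
            return y
        if y == EPS:
            return x
        return shuf(x, y)
    return frozenset(one(x, y) for x in S for y in T)


def pi(t):
    """Support of t, following the inductive rules of Proposition 1."""
    tag = t[0]
    if tag in ("empty", "eps"):
        return frozenset()
    if tag == "sym":
        return frozenset({EPS})
    if tag == "+":
        return pi(t[1]) | pi(t[2])
    if tag == ".":
        return cat_set(pi(t[1]), t[2]) | pi(t[2])
    if tag == "*":
        return cat_set(pi(t[1]), t)
    a, b = t[1], t[2]  # remaining case: shuffle
    return shuf_set(pi(a), pi(b)) | shuf_set(pi(a), {b}) | shuf_set({a}, pi(b))


def letters(t):
    """Alphabetic length: number of letter occurrences in t."""
    if t[0] == "sym":
        return 1
    return sum(letters(c) for c in t[1:])


a1, a2, a3 = sym("a1"), sym("a2"), sym("a3")
P3 = shuf(shuf(a1, a2), a3)
support = pi(P3)
```

For `P3` the computed `support` has $7 = 2^3 - 1$ elements, so this expression attains the bound of Proposition 2.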

Example

\[ P_3 = a_1 \shuffle a_2 \shuffle a_3 \]
\[ \pi(P_3) = \{\varepsilon,\ a_1,\ a_2,\ a_3,\ a_1 \shuffle a_2,\ a_1 \shuffle a_3,\ a_2 \shuffle a_3\} \]

[Figure: automaton for $P_3$ with states $a_1 \shuffle a_2 \shuffle a_3$, $a_1 \shuffle a_2$, $a_1 \shuffle a_3$, $a_2 \shuffle a_3$, $a_1$, $a_2$, $a_3$, $\varepsilon$ and transitions labelled $a_1$, $a_2$, $a_3$.]

Proposition 3. $|\pi(P_n)| = 2^n - 1$ and $|P_n|_\Sigma = n$.

Partial Derivatives

For $\tau \in$ RE($\shuffle$) and $a \in \Sigma$:

\[ \partial_a(\emptyset) = \partial_a(\varepsilon) = \emptyset \]
\[ \partial_a(b) = \begin{cases} \{\varepsilon\} & \text{if } b = a \\ \emptyset & \text{otherwise} \end{cases} \]
\[ \partial_a(\alpha^*) = \partial_a(\alpha)\alpha^* \]
\[ \partial_a(\alpha + \beta) = \partial_a(\alpha) \cup \partial_a(\beta) \]
\[ \partial_a(\alpha\beta) = \partial_a(\alpha)\beta \cup \varepsilon(\alpha)\partial_a(\beta) \]
\[ \partial_a(\alpha \shuffle \beta) = \partial_a(\alpha) \shuffle \{\beta\} \,\cup\, \{\alpha\} \shuffle \partial_a(\beta) \]

\[ L(\partial_a(\tau)) = \{\, w \mid aw \in L(\tau) \,\} = a^{-1}L(\tau) \]

For $x \in \Sigma^*$ and $S \subseteq$ RE($\shuffle$), with $\partial_a(S) = \bigcup_{\gamma \in S} \partial_a(\gamma)$:

\[ \partial_\varepsilon(\tau) = \{\tau\}, \qquad \partial_{xa}(\tau) = \partial_a(\partial_x(\tau)) \]

\[ \partial(\tau) = \bigcup_{x \in \Sigma^*} \partial_x(\tau), \qquad \partial^+(\tau) = \bigcup_{x \in \Sigma^+} \partial_x(\tau) \]

Proposition 4. $\forall \tau \in$ RE($\shuffle$), $\partial^+(\tau) = \pi(\tau)$.

Partial Derivative Automaton

For $\tau \in$ RE($\shuffle$),

\[ A_{pd}(\tau) = \langle \partial(\tau), \Sigma, \{\tau\}, \delta_\tau, F_\tau \rangle \]

where

\[ \delta_\tau(\gamma, a) = \partial_a(\gamma), \qquad F_\tau = \{\, \gamma \in \partial(\tau) \mid \varepsilon(\gamma) = \varepsilon \,\}. \]

Proposition 5. $L(A_{pd}(\tau)) = L(\tau)$.
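The construction of $A_{pd}(\tau)$ amounts to closing $\{\tau\}$ under partial derivatives. The Python sketch below illustrates this under stated assumptions: expressions are nested tuples, all names are hypothetical, $\varepsilon \cdot \beta$ and $\varepsilon \shuffle \beta$ are simplified to $\beta$, and $\emptyset$ is assumed to occur only at top level, as in the grammar.

```python
from itertools import permutations, product

EPS = ("eps",)

def sym(a): return ("sym", a)
def shuf(x, y): return ("sh", x, y)

def nullable(t):
    """True iff ε ∈ L(t), i.e. ε(t) = ε."""
    tag = t[0]
    if tag in ("eps", "*"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "+":
        return nullable(t[1]) or nullable(t[2])
    return nullable(t[1]) and nullable(t[2])  # concatenation and shuffle

def cat_set(S, beta):
    """S·beta with S·ε = S; ε·beta is simplified to beta (assumption)."""
    if beta == EPS:
        return frozenset(S)
    return frozenset(beta if x == EPS else (".", x, beta) for x in S)

def shuf_set(S, T):
    """Shuffle of two sets of expressions, ε acting as the identity."""
    def one(x, y):
        return y if x == EPS else x if y == EPS else ("sh", x, y)
    return frozenset(one(x, y) for x in S for y in T)

def pd(a, t):
    """Set of partial derivatives of t with respect to the letter a."""
    tag = t[0]
    if tag in ("empty", "eps"):
        return frozenset()
    if tag == "sym":
        return frozenset({EPS}) if t[1] == a else frozenset()
    if tag == "*":
        return cat_set(pd(a, t[1]), t)
    if tag == "+":
        return pd(a, t[1]) | pd(a, t[2])
    if tag == ".":
        out = cat_set(pd(a, t[1]), t[2])
        return out | pd(a, t[2]) if nullable(t[1]) else out
    return shuf_set(pd(a, t[1]), {t[2]}) | shuf_set({t[1]}, pd(a, t[2]))

def build_apd(t, alphabet):
    """States, transition map and final states of the automaton for t."""
    states, todo, delta = {t}, [t], {}
    while todo:
        q = todo.pop()
        for a in alphabet:
            delta[q, a] = pd(a, q)
            for r in delta[q, a] - states:
                states.add(r)
                todo.append(r)
    return states, delta, {q for q in states if nullable(q)}

def accepts(t, delta, finals, word):
    """Run the NFA on `word` (a sequence of letters from the alphabet)."""
    current = {t}
    for a in word:
        current = set().union(*(delta[q, a] for q in current))
    return bool(current & finals)

alphabet = ("a1", "a2", "a3")
P3 = shuf(shuf(sym("a1"), sym("a2")), sym("a3"))
states, delta, finals = build_apd(P3, alphabet)
```

For `P3` the construction yields eight states (the expression itself plus the seven elements of its support), and the words accepted are exactly the permutations of $a_1 a_2 a_3$, in line with Proposition 5.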

Example

\[ P_3 = a_1 \shuffle a_2 \shuffle a_3 \]

\[ \partial_{a_1}(P_3) = \{a_2 \shuffle a_3\}, \qquad \partial_{a_2}(P_3) = \{a_1 \shuffle a_3\}, \qquad \partial_{a_3}(P_3) = \{a_1 \shuffle a_2\} \]
\[ \partial_{a_1}(a_1 \shuffle a_2) = \{a_2\}, \qquad \partial_{a_2}(a_1 \shuffle a_2) = \{a_1\} \]
\[ \partial_{a_1}(a_1 \shuffle a_3) = \{a_3\}, \qquad \partial_{a_3}(a_1 \shuffle a_3) = \{a_1\} \]
\[ \partial_{a_2}(a_2 \shuffle a_3) = \{a_3\}, \qquad \partial_{a_3}(a_2 \shuffle a_3) = \{a_2\} \]
\[ \partial_{a_i}(a_i) = \{\varepsilon\} \]

[Figure: $A_{pd}(P_3)$, with states $a_1 \shuffle a_2 \shuffle a_3$, $a_1 \shuffle a_2$, $a_1 \shuffle a_3$, $a_2 \shuffle a_3$, $a_1$, $a_2$, $a_3$, $\varepsilon$ and transitions labelled $a_1$, $a_2$, $a_3$.]

Average Case Complexity

RE($\shuffle$) $\Rightarrow$ NFA, via $\tau \Rightarrow A_{pd}(\tau)$

RE $\Rightarrow$ NFA

\[ \alpha \to \varepsilon \mid a \mid \alpha + \alpha \mid \alpha \cdot \alpha \mid \alpha^* \qquad (a \in \Sigma) \]

$|\alpha| = n$, $|\alpha|_\Sigma = m$

- $A_{pos}(\alpha)$: Position Automaton [Glu61]
- $|A_{pos}(\alpha)|_Q = m + 1$ and $|A_{pos}(\alpha)|_\delta = \Theta(n^2)$
- $A_{pd}(\alpha)$: Partial Derivative Automaton [Mir66, Ant96]
- $A_{pd}$ is a quotient of $A_{pos}$ [CZ02]
- $|A_{pd}(\alpha)|_Q \le m + 1$ and $|A_{pd}(\alpha)|_\delta = \Theta(n^2)$

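Average Complexity RE $\Rightarrow$ NFA

For the uniform distribution of $\alpha \in$ RE and asymptotically [Nic09, BMMR11, BMMR12]:

- $|\alpha|_\Sigma \sim \frac{|\alpha|}{2}$
- $|A_{pos}(\alpha)|_\delta = \Theta(|\alpha|)$
- $|A_{pd}(\alpha)|_Q \sim \frac{|A_{pos}(\alpha)|_Q}{2}$ and $|A_{pd}(\alpha)|_\delta \sim \frac{|A_{pos}(\alpha)|_\delta}{2}$

How to obtain an estimate of the average complexity of $\tau \Rightarrow A_{pd}(\tau)$?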

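The Analytic Combinatorics Way

Generating Functions

For a combinatorial class $\mathcal{C}$,

\[ C(z) = \sum_{c \in \mathcal{C}} z^{|c|} = \sum_n c_n z^n, \]

where $c_n$ is the number of objects of size $n$.

Symbolic Method

\[ \{\bullet\} \Longrightarrow U(z) = z \]
\[ \mathcal{A} \cup \mathcal{B} \Longrightarrow A(z) + B(z) \]
\[ \mathcal{A} \times \mathcal{B} \Longrightarrow A(z)B(z) \]
\[ \mathcal{A}^* \Longrightarrow \frac{1}{1 - A(z)} \]

Asymptotic Analysis

\[ f(z) = \sum_n f_n z^n \]

[Figure: disc of convergence of $f$, centred at 0 with radius $\rho$.]

\[ f_n \sim \theta_n\, \rho^{-n}, \]

where $\theta_n$ is a sub-exponential factor.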

Generating Function for RE($\shuffle$) (without $\emptyset$)

\[ \alpha \to \varepsilon \mid a \mid (\alpha + \alpha) \mid (\alpha \cdot \alpha) \mid (\alpha \shuffle \alpha) \mid \alpha^* \]

\begin{align*}
R_k(z) &= (k+1)U(z) + G(R_k \times \{+\} \times R_k) + G(R_k \times \{\cdot\} \times R_k) + G(R_k \times \{\shuffle\} \times R_k) + G(R_k \times \{{}^*\}) \\
       &= (k+1)U(z) + 3U(z)R_k(z)^2 + U(z)R_k(z) \\
       &= (k+1)z + 3zR_k(z)^2 + zR_k(z).
\end{align*}

\[ R_k(z) = \frac{(1 - z) - \sqrt{\Delta_k(z)}}{6z}, \qquad \text{where } \Delta_k(z) = 1 - 2z - (11 + 12k)z^2. \]

The radius of convergence of $R_k(z)$ is

\[ \rho_k = \frac{-1 + 2\sqrt{3 + 3k}}{11 + 12k}. \]

\[ [z^n]R_k(z) \sim \frac{(3 + 3k)^{1/4}}{6\sqrt{\pi}}\, \rho_k^{-n-\frac{1}{2}}\, (n+1)^{-\frac{3}{2}} \]
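The functional equation $R_k(z) = (k+1)z + 3zR_k(z)^2 + zR_k(z)$ can be unfolded into a recurrence on coefficients and compared numerically with the asymptotic estimate. The Python sketch below assumes that the size of an expression is its number of symbols, parentheses excluded (which is what the equation encodes); the function names are illustrative.

```python
import math


def count_expressions(k, n_max):
    """r[n] = number of expressions of size n over k letters (no ∅)."""
    r = [0] * (n_max + 1)
    if n_max >= 1:
        r[1] = k + 1                       # ε and the k letters
    for n in range(2, n_max + 1):
        # A binary operator uses one symbol; its operands share the other n-1.
        binary = sum(r[i] * r[n - 1 - i] for i in range(1, n - 1))
        r[n] = r[n - 1] + 3 * binary       # star, then +, · and shuffle
    return r


def rho(k):
    """Radius of convergence: the positive root of 1 - 2z - (11 + 12k) z^2."""
    return (-1 + 2 * math.sqrt(3 + 3 * k)) / (11 + 12 * k)


def asymptotic_count(k, n):
    """Asymptotic estimate of the n-th coefficient of R_k(z)."""
    return ((3 + 3 * k) ** 0.25 / (6 * math.sqrt(math.pi))
            * rho(k) ** (-n - 0.5) * (n + 1) ** -1.5)


counts = count_expressions(2, 200)
ratio = counts[200] / asymptotic_count(2, 200)
```

For $k = 2$ and $n = 200$ the exact count and the estimate agree to within a few percent, as expected from an approximation whose relative error decays like $1/n$.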

Cumulative Generating Function for Number of Letters

\[ \alpha \to \varepsilon \mid a \mid (\alpha + \alpha) \mid (\alpha \cdot \alpha) \mid (\alpha \shuffle \alpha) \mid \alpha^* \]

\[ l(\varepsilon) = 0, \qquad l(a) = 1, \qquad l(\alpha^*) = l(\alpha) \]
\[ l(\alpha + \beta) = l(\alpha \cdot \beta) = l(\alpha \shuffle \beta) = l(\alpha) + l(\beta) \]

\[ L_k(z) = kz + 6zL_k(z)R_k(z) + zL_k(z) \]

\[ L_k(z) = \frac{kz}{\sqrt{\Delta_k(z)}} \]

\[ [z^n]L_k(z) \sim \frac{k}{2\sqrt{\pi}\,(3 + 3k)^{1/4}}\, \rho_k^{-n+\frac{1}{2}}\, n^{-\frac{1}{2}} \]
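The closed form of $L_k(z)$ follows from the closed form of $R_k(z)$ in a few steps. The derivation below is a reconstruction, not taken from the slides; it uses the fact that each of the three binary operators contributes $l(\alpha) + l(\beta)$, hence $2zL_k(z)R_k(z)$, to the cumulative generating function, which gives the coefficient 6.

```latex
\begin{align*}
L_k(z) &= kz + 6z\,L_k(z)R_k(z) + z\,L_k(z) \\
L_k(z)\bigl(1 - z - 6zR_k(z)\bigr) &= kz \\
1 - z - 6zR_k(z) &= (1 - z) - \Bigl((1 - z) - \sqrt{\Delta_k(z)}\Bigr) = \sqrt{\Delta_k(z)} \\
L_k(z) &= \frac{kz}{\sqrt{\Delta_k(z)}}
\end{align*}
```

The third line substitutes $6zR_k(z) = (1 - z) - \sqrt{\Delta_k(z)}$, which is the closed form of $R_k(z)$ multiplied by $6z$.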

Cumulative Generating Function for an Upper Bound of the Size of π

The recursive definition of $\pi$ yields a function $p$ with $|\pi(\alpha)| \le p(\alpha)$:

\[ p(\varepsilon) = 0, \qquad p(a) = 1, \qquad p(\alpha^*) = p(\alpha) \]
\[ p(\alpha + \beta) = p(\alpha) + p(\beta), \qquad p(\alpha\beta) = p(\alpha) + p(\beta) \]
\[ p(\alpha \shuffle \beta) = p(\alpha)p(\beta) + p(\alpha) + p(\beta) \]

\[ P_k(z) = kz + 6zP_k(z)R_k(z) + zP_k(z) + zP_k(z)^2 \]

\[ P_k(z) = \frac{\sqrt{\Delta_k(z)} - \sqrt{\Delta'_k(z)}}{2z}, \]

where $\Delta'_k(z) = 1 - 2z - (11 + 16k)z^2$, with zero

\[ \rho'_k = \frac{-1 + 2\sqrt{3 + 4k}}{11 + 16k}. \]

\[ [z^n]P_k(z) \sim \frac{-(3 + 3k)^{1/4}\, \rho_k^{-n-\frac{1}{2}} + (3 + 4k)^{1/4}\, (\rho'_k)^{-n-\frac{1}{2}}}{2\sqrt{\pi}}\, (n+1)^{-\frac{3}{2}} \]
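The closed form of $P_k(z)$ can be recovered by treating its defining equation as a quadratic in $P_k(z)$. The steps below are a reconstruction under the definitions of $R_k(z)$, $\Delta_k(z)$ and $\Delta'_k(z)$ given on the slides; the branch of the square root is fixed by the condition $P_k(0) = 0$.

```latex
\begin{align*}
0 &= zP_k(z)^2 - \bigl(1 - z - 6zR_k(z)\bigr)P_k(z) + kz \\
  &= zP_k(z)^2 - \sqrt{\Delta_k(z)}\,P_k(z) + kz
     && \text{since } 6zR_k(z) = (1 - z) - \sqrt{\Delta_k(z)} \\
P_k(z) &= \frac{\sqrt{\Delta_k(z)} - \sqrt{\Delta_k(z) - 4kz^2}}{2z}
     && \text{(the root that vanishes at } z = 0\text{)} \\
\Delta_k(z) - 4kz^2 &= 1 - 2z - (11 + 16k)z^2 = \Delta'_k(z)
\end{align*}
```

Since $\rho'_k < \rho_k$, the singularity of $\sqrt{\Delta'_k(z)}$ is the dominant one, which is why the term in $(\rho'_k)^{-n-\frac{1}{2}}$ governs the growth of $[z^n]P_k(z)$.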

Average Size

\[ avL = \frac{[z^n]L_k(z)}{[z^n]R_k(z)} \sim \frac{3k\rho_k}{\sqrt{3 + 3k}} \cdot \frac{(n+1)^{\frac{3}{2}}}{n^{\frac{1}{2}}} \]

\[ avP = \frac{[z^n]P_k(z)}{[z^n]R_k(z)} \]

\[ \lim_{n,k \to \infty} \frac{\log_2 avP}{avL} = \log_2 \frac{4}{3} \approx 0.415, \qquad \lim_{n,k \to \infty} avP^{1/avL} = \frac{4}{3} \]

Proposition 6. For large values of $k$ and $n$, an upper bound for the average number of states of $A_{pd}$ is $\left(\frac{4}{3} + o(1)\right)^{|\alpha|_\Sigma}$.
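The limit $\frac{4}{3}$ can be checked numerically from the two radii of convergence. The Python sketch below rests on a simplification that is not spelled out on the slides: for fixed $k$ and growing $n$, the asymptotic estimates give $avP$ growing like $(\rho_k / \rho'_k)^n$ and $avL$ like $\frac{3k\rho_k}{\sqrt{3+3k}}\, n$, so $avP^{1/avL}$ tends to the value computed by the hypothetical helper `growth_base`.

```python
import math


def rho(k):
    """Zero of 1 - 2z - (11 + 12k) z^2, the radius of convergence of R_k."""
    return (-1 + 2 * math.sqrt(3 + 3 * k)) / (11 + 12 * k)


def rho_prime(k):
    """Zero of 1 - 2z - (11 + 16k) z^2, the dominant singularity of P_k."""
    return (-1 + 2 * math.sqrt(3 + 4 * k)) / (11 + 16 * k)


def growth_base(k):
    """Limit in n (k fixed) of avP ** (1 / avL), from the asymptotic estimates."""
    letters_per_symbol = 3 * k * rho(k) / math.sqrt(3 + 3 * k)  # avL / n
    return (rho(k) / rho_prime(k)) ** (1 / letters_per_symbol)


base_small_k = growth_base(50)
base_large_k = growth_base(10 ** 6)
```

As $k$ grows, `growth_base(k)` approaches $\frac{4}{3}$ and its base-2 logarithm approaches $0.415$, consistent with the limits stated for $avP^{1/avL}$.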

Final Remarks

- Experimental results suggest that the upper bound obtained may be lowered.
- Other shuffle operations should be analysed from an average case point of view.
- A similar analysis could be carried out for regular expressions with intersection or complement.

Thank you for your attention!

[Ant96] V. M. Antimirov. Partial derivatives of regular expressions and finite automaton constructions. Theoret. Comput. Sci., 155(2):291–319, 1996.

[BMMR11] Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average state complexity of partial derivative automata. International Journal of Foundations of Computer Science, 22(7):1593–1606, 2011.

[BMMR12] Sabine Broda, António Machiavelo, Nelma Moreira, and Rogério Reis. On the average size of Glushkov and partial derivative automata. International Journal of Foundations of Computer Science, 23(5):969–984, 2012.

[CZ02] J. M. Champarnaud and D. Ziadi. Canonical derivatives, partial derivatives and finite automaton constructions. Theoret. Comput. Sci., 289:137–163, 2002.

[Gel10] Wouter Gelade. Succinctness of regular expressions with interleaving, intersection and counting. Theoret. Comput. Sci., 411(31-33):2987–2998, 2010.

[GH09] Hermann Gruber and Markus Holzer. Tight bounds on the descriptional complexity of regular expressions. In Volker Diekert and Dirk Nowotka, editors, 13th Developments in Language Theory (DLT 2009), volume 5583 of LNCS, pages 276–287. Springer, 2009.

[Glu61] V. M. Glushkov. The abstract theory of automata. Russian Math. Surveys, 16:1–53, 1961.

[Mir66] B. G. Mirkin. An algorithm for constructing a base in a language of regular expressions. Engineering Cybernetics, 5:51–57, 1966.

[MS94] Alain J. Mayer and Larry J. Stockmeyer. Word problems-this time with interleaving. Inf. Comput., 115(2):293–311, 1994.

[Nic09] Cyril Nicaud. On the average size of Glushkov's automata. In Adrian Horia Dediu, Armand-Mihai Ionescu, and Carlos Martín-Vide, editors, Proc. 3rd LATA, volume 5457 of LNCS, pages 626–637. Springer, 2009.