DE BRUIJN SEQUENCES FOR FIXED-WEIGHT BINARY STRINGS

FRANK RUSKEY∗, JOE SAWADA†, AND AARON WILLIAMS‡

Abstract. De Bruijn sequences are circular strings of length $2^n$ whose length n substrings are the binary strings of length n. Our focus is on creating circular strings of length $\binom{n}{w}$ for the binary strings of length n with weight (number of 1s) equal to w. In this case, each fixed-weight string can be encoded by its first n−1 bits since the final bit is redundant. For this reason, we construct circular strings of length $\binom{n-1}{w} + \binom{n-1}{w-1}$ whose length n−1 substrings are the binary strings of length n−1 with weight w or w−1. Our construction is reminiscent of the construction for the lexicographically least de Bruijn sequence, except the underlying algorithm is applied to cool-lex order instead of lexicographic order. The construction can be efficiently implemented so that successive blocks of n bits are generated in constant amortized time (CAT) while using O(n log n)-space. This article's results were also used to create de Bruijn sequences for binary strings of length n with a specified maximum weight.

Key words. de Bruijn sequences, universal cycles, FKM algorithm, necklaces, Lyndon words, cool-lex order, middle-levels, shift Gray code

1. Introduction. All strings in this paper are binary. Let B(n) denote the set of strings with length n. A de Bruijn sequence for B(n) (or simply a de Bruijn sequence) is a circular string of length $2^n$ that contains each string in B(n) exactly once as a substring. De Bruijn sequences are also known as de Bruijn cycles. A de Bruijn sequence for B(3) appears in Figure 1.1, where the substrings are read clockwise from 12 o'clock and allow wrap-around.

[Figure 1.1: the circular string 00010111. Substrings B(3): 000, 001, 010, 101, 011, 111, 110, 100]

Fig. 1.1. A de Bruijn sequence for B(3).
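The definition above can be checked mechanically. The following is a minimal sketch, not part of the original paper; the helper names `circular_substrings` and `is_de_bruijn` are chosen here for illustration. It reads the length n substrings of a circular string with wrap-around and tests whether each string in B(n) occurs exactly once.

```python
def circular_substrings(s, k):
    """Return the len(s) substrings of length k read cyclically from s."""
    doubled = s + s[:k - 1]          # append a prefix to emulate wrap-around
    return [doubled[i:i + k] for i in range(len(s))]


def is_de_bruijn(s, n):
    """True if circular string s contains every string of B(n) exactly once."""
    subs = circular_substrings(s, n)
    # 2^n substrings, all distinct, means every length n string occurs once.
    return len(s) == 2 ** n and len(set(subs)) == 2 ** n


# The substrings listed in Figure 1.1, in order of appearance.
print(circular_substrings("00010111", 3))
print(is_de_bruijn("00010111", 3))
```

Because a circular string of length $2^n$ has exactly $2^n$ substrings of length n, testing for distinctness is enough to establish that none is missing.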

One can prove that de Bruijn sequences exist for B(n) by using an Eulerian cycle in its associated de Bruijn graph (see Section 2). However, this standard proof does not directly lead to an efficient method for constructing an individual de Bruijn sequence due to the exponential size of the associated graph ($2^{n-1}$ nodes and $2^n$ directed arcs). A fundamental question is determining the computational complexity of generating a specific de Bruijn sequence. Perhaps the most famous construction that leads to an efficient algorithm is for the lexicographically least de Bruijn sequence, which Knuth calls the “grand-daddy” [15].

De Bruijn sequences have many applications including dynamic connections in overlay networks (Fraigniaud and Gauron [7]), genomics (Alekseyev and Pevzner [1]), and software calculation of the ruler function in computer words (Knuth [16], Leiserson, Prokop, and Randall [17]). De Bruijn sequences also appear in textbooks on discrete mathematics (Graham, Knuth, and Patashnik [10]). Generalizations and variations have been investigated, most famously under the name universal cycles (Chung, Graham and Diaconis [4]). Interested readers can refer to the Generalizations of de Bruijn Cycles and Gray Codes proceedings [14].

Our paper gives a new variation of de Bruijn sequences that restricts the weight (number of 1s) of each string. Let $B_w(n)$ denote the set of length n strings with fixed-weight w and let $B_\ell^u(n)$ denote the set of length n strings with weight-range ℓ, ℓ + 1, . . . , u having a specified lower-bound ℓ and upper-bound u. In general, if L is a subset of B(n), then a de Bruijn sequence for L is a circular string of length |L| containing each string in L exactly once as a substring.

Strictly speaking, de Bruijn sequences for $B_w(n)$ only exist in trivial cases when w ∈ {0, 1, n − 1, n}. For example, the circular strings of length $\binom{4}{2} = 6$ containing 0011 and 1100 are 001100, 001110, and 001111, but none are de Bruijn sequences for $B_2(4)$. However, we can take advantage of a simple fact: The last bit of each string in $B_w(n)$ is redundant. That is, each α ∈ $B_w(n)$ is completely determined by its first n−1 bits. For this reason, we say that a de Bruijn sequence for $B_{w-1}^w(n-1)$ is a fixed-weight de Bruijn sequence for $B_w(n)$. The circular string in Figure 1.2 is a fixed-weight de Bruijn sequence for $B_3(5)$. Its substrings of length four include each string in $B_2^3(4)$ exactly once; appending the ‘missing’ bit extends each substring to a unique string in $B_3(5)$. In general, the shorthand sequence of a fixed-weight de Bruijn sequence for $B_w(n)$ is its circular sequence of substrings of length n − 1 and the longhand sequence is obtained by appending the missing bit to each string in the shorthand sequence so that each resulting string has weight w.

∗ Department of Computer Science, University of Victoria, PO Box 3010 STN CSC, Victoria BC, V8W 3N4, Canada. Research supported in part by an NSERC discovery grant.
† School of Computer Science, University of Guelph, 217 Reynolds, Guelph ON, N1G 2W1, Canada. Research supported in part by an NSERC discovery grant.
‡ School of Mathematics and Statistics, Carleton University, 1125 Colonel By Drive, Ottawa ON, K1S 5B6, Canada.

[Figure 1.2: the circular string 0011101011.
longhand $B_3(5)$: 00111, 01110, 11100, 11010, 10101, 01011, 10110, 01101, 11001, 10011
shorthand $B_2^3(4)$, with the missing bit after each comma: 0011,1 0111,0 1110,0 1101,0 1010,1 0101,1 1011,0 0110,1 1100,1 1001,1]

Fig. 1.2. A fixed-weight de Bruijn sequence for B3 (5).
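The shorthand and longhand sequences of Figure 1.2 can be recomputed from the circular string itself. The sketch below is illustrative only; the function names are ours, not the paper's. It appends the missing bit to each length n−1 substring and checks the fixed-weight de Bruijn property.

```python
from math import comb


def circular_substrings(s, k):
    """Return the len(s) substrings of length k read cyclically from s."""
    doubled = s + s[:k - 1]
    return [doubled[i:i + k] for i in range(len(s))]


def longhand(shorthand, w):
    """Append the single bit that brings a length n-1 string to weight w."""
    missing = w - shorthand.count("1")
    if missing not in (0, 1):
        raise ValueError("weight must be w or w-1: " + shorthand)
    return shorthand + str(missing)


def is_fixed_weight_de_bruijn(s, n, w):
    """True if the length n-1 substrings of s are exactly the strings of
    length n-1 with weight w or w-1, each occurring once."""
    subs = circular_substrings(s, n - 1)
    return (len(s) == comb(n, w)
            and len(set(subs)) == len(subs)
            and all(x.count("1") in (w - 1, w) for x in subs))


shorthand = circular_substrings("0011101011", 4)
print([longhand(x, 3) for x in shorthand])
print(is_fixed_weight_de_bruijn("0011101011", 5, 3))
```

The length test suffices for completeness because $\binom{n-1}{w} + \binom{n-1}{w-1} = \binom{n}{w}$, so distinct valid substrings must cover the whole set.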

Our main result is a construction of fixed-weight de Bruijn sequences for any $B_w(n)$. A subsequent analysis shows that our “cool-daddy” de Bruijn sequences can be created efficiently, with successive blocks of n bits being generated in amortized O(1)-time while using only O(n log n)-space (Sawada and Williams [25, 26]). This is an improvement over algorithms that construct universal cycles one symbol at a time (compare Ruskey and Williams [22] to Ruskey, Williams, and Holroyd [12, 13] for an example of this improvement in the context of permutations). The space measurement is also important since certain algorithms for generating universal cycles use exponential space. The mathematical foundation for our construction uses a general result involving binary bubble languages and cool-lex order (Ruskey, Sawada, and Williams [20]).

Our paper is organized as follows: Section 2 covers de Bruijn graphs, Section 3 describes the “grand-daddy” de Bruijn sequence, Section 4 modifies the aforementioned construction, Section 5 covers cool-lex order, Section 6 gives the “cool-daddy” construction, and Section 7 discusses open problems.

We mention that articles in this research area often use the term density when referring to the number of 1s in a binary string. This includes work by the authors of this article ([20, 25, 26]) and by other authors (Buck and Wiedemann [2], Wang and Savage [29], and Ueda [28]). However, it is more natural to interpret the ‘density’ of a length n binary string with w copies of 1 as the fraction w/n. For this reason, we use the term ‘weight’ in this article.

2. de Bruijn Graphs. The de Bruijn graph for B(n) is a directed graph whose node set is B(n−1). For each node $\alpha = a_1 \cdots a_{n-1}$ and x ∈ {0, 1} there is an arc labeled x that is directed from α to $\beta = a_2 \cdots a_{n-1} x$. Each arc represents a unique string αx ∈ B(n). The de Bruijn graph for B(4) is illustrated in Figure 2.1.

[Figure 2.1: drawing of the directed graph on the nodes 000, 001, 010, 011, 100, 101, 110, 111 with arcs labeled 0 or 1.]

Fig. 2.1. The de Bruijn graph for B(4).

More generally, the de Bruijn graph for L ⊆ B(n) is a directed graph G(L) whose nodes are the length n−1 prefixes and suffixes of the strings in L. There is an arc labeled x ∈ {0, 1} from $\alpha = a_1 \cdots a_{n-1}$ to $\beta = a_2 \cdots a_{n-1} x$ if αx ∈ L. Again, each arc represents a unique string αx ∈ L. We are interested in de Bruijn graphs for $L = B_{w-1}^w(n)$.

[Figure 2.2: drawings of (a) a directed graph on the nodes 001, 010, 011, 100, 101, 110 and (b) a directed graph on the nodes 001, 010, 011, 100, 101, 110, 111, with arcs labeled 0 or 1.]

Fig. 2.2. Two de Bruijn graphs (a) $G(B_2(4))$, and (b) $G(B_2^3(4))$.

A directed graph is Eulerian if it has a directed cycle that includes each arc exactly once. It is well-known that a directed graph is Eulerian if and only if it is balanced (every node has the same number of incoming and outgoing arcs) and strongly connected (there is a directed path from any node to any other node). Furthermore, Eulerian cycles in G(L) are in one-to-one correspondence with de Bruijn sequences for L. For example, Figure 2.2 (a) shows that $G(B_2(4))$ is not strongly connected, and this provides an alternate proof that there are no de Bruijn sequences for $B_2(4)$. The connection between de Bruijn sequences and de Bruijn graphs can be found in de Bruijn's paper for B(n) [5]; also see his note on the history of these observations [6]. Table 2.1 illustrates the connection between an Eulerian cycle in Figure 2.2 (b) and the fixed-weight de Bruijn sequence in Figure 1.2. Table 2.1 (i) and Figure 2.2 (b) also illustrate that the node set of $G(B_{w-1}^w(n))$ is $B_{w-2}^w(n-1)$.

        Eulerian cycle            Substrings
        nodes      arcs      shorthand   longhand
        011        0         0110        01101
        110        0         1100        11001
        100        1         1001        10011
        001        1         0011        00111
        011        1         0111        01110
        111        0         1110        11100
        110        1         1101        11010
        101        0         1010        10101
        010        1         0101        01011
        101        1         1011        10110
        $B_1^3(3)$ B(1)      $B_2^3(4)$  $B_3(5)$
        (i)        (ii)      (iii)       (iv)

Table 2.1. (i) The nodes along an Eulerian cycle in the de Bruijn graph $G(B_2^3(4))$ from Figure 2.2 (b), (ii) the arc labels on this Eulerian cycle, which spell the fixed-weight de Bruijn sequence for $B_3(5)$ in Figure 1.2, (iii) its shorthand sequence, and (iv) its longhand sequence.

The remainder of this section shows that $G(B_{w-1}^w(n))$ is Eulerian. Since $G(B_{w-1}^w(n))$ is directed, we shorten directed path to path.

Lemma 2.1. $G(B_{w-1}^w(n))$ is balanced.
Proof. The node set of $G(B_{w-1}^w(n))$ is $B_{w-2}^w(n-1)$. Each α ∈ $B_{w-1}(n-1)$ has in- and out-degree 2, and each α ∈ $B_{w-2}(n-1) \cup B_w(n-1)$ has in- and out-degree 1.
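For small parameters, the balance and connectivity claims of this section can be checked by brute force. The sketch below is ours rather than the authors' and all function names are illustrative. It builds G(L) from a set L of strings and tests both properties on the two graphs of Figure 2.2.

```python
from itertools import product


def strings_with_weights(n, weights):
    """All binary strings of length n whose number of 1s lies in weights."""
    return ["".join(b) for b in product("01", repeat=n)
            if b.count("1") in weights]


def de_bruijn_graph(L):
    """Adjacency lists of G(L): one arc per string, from prefix to suffix."""
    arcs = {}
    for s in L:
        arcs.setdefault(s[:-1], []).append(s[1:])
        arcs.setdefault(s[1:], [])   # make sure the head is a known node
    return arcs


def is_balanced(arcs):
    indeg = {v: 0 for v in arcs}
    for outs in arcs.values():
        for v in outs:
            indeg[v] += 1
    return all(indeg[v] == len(arcs[v]) for v in arcs)


def reachable(arcs, start):
    """Set of nodes reachable from start by a depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for v in arcs[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_strongly_connected(arcs):
    return all(len(reachable(arcs, v)) == len(arcs) for v in arcs)


graph_a = de_bruijn_graph(strings_with_weights(4, {2}))      # Figure 2.2 (a)
graph_b = de_bruijn_graph(strings_with_weights(4, {2, 3}))   # Figure 2.2 (b)
print(is_balanced(graph_a), is_strongly_connected(graph_a))
print(is_balanced(graph_b), is_strongly_connected(graph_b))
```

Running a search from every node is quadratic in the graph size, which is acceptable for illustration but not for large n.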

The fact that many nodes in $G(B_{w-1}^w(n))$ have out-degree 1 contributes to the difficulty of proving that it is strongly connected. As a specific example, a maximum length shortest path in $G(B_8^9(18))$ is illustrated in Figure 2.3 (b) and has length 85. In contrast, Figure 2.3 (a) illustrates that trivial paths of length at most n−1 exist between each pair of nodes in the original de Bruijn graph G(B(n)).

[Figure 2.3: two shortest paths drawn as columns of black and white squares, (a) of length 13 and (b) of length 85.]

Fig. 2.3. Shortest path from 00000000111111111 to 11110000011111000 in (a) G(B(18)) and (b) $G(B_8^9(18))$. Nodes are read top-to-bottom with black and white squares respectively for 0 and 1.

To prove that $G(B_{w-1}^w(n))$ is strongly connected we repeatedly apply the following lemma.

Lemma 2.2. Let $\alpha = a_1 \cdots a_{n-1}$ and $\beta = b_1 \cdots b_{n-1}$ be distinct nodes in $G(B_{w-1}^w(n))$. If i is the smallest integer such that $a_i \neq b_i$, then there exists a path from α to some node with prefix $b_1 \cdots b_i$.
Proof. We assume $a_i = 0$ since the $a_i = 1$ case is similar. Note that α has three possible weights since the node set of $G(B_{w-1}^w(n))$ is $B_{w-2}^w(n-1)$. For each weight we provide a valid path (labeled by the arcs) that ends at a node with prefix $b_1 \cdots b_i$.

If α has weight w−2, then this path suffices: $\langle 1, a_1, \ldots, a_{i-1}, 1, a_{i+1}, \ldots, a_{n-1} \rangle$. If α has weight w−1, then this path suffices: $\langle 0, a_1, \ldots, a_{i-1}, 1, a_{i+1}, \ldots, a_{n-1} \rangle$. If α has weight w, then there exists i < j ≤ n such that $a_j = 1$. (Otherwise, $\alpha = a_1 \cdots a_{i-1} 0^{n-i}$ has weight w, so α's prefix $a_1 \cdots a_{i-1} 0$ has weight w. However, this implies β's prefix $a_1 \cdots a_{i-1} 1$ has weight w + 1, which contradicts β ∈ $B_{w-2}^w(n-1)$.) First we find a path from α to $\gamma = a_1 \cdots a_{j-1} 0 a_{j+1} \cdots a_{n-1}$ as follows: $\langle 0, a_1, \ldots, a_{j-1}, 0, a_{j+1}, \ldots, a_{n-1} \rangle$. Since γ has weight w−1 and has the same prefix of length i as α, we can complete our path by applying the path from the w−1 case.

Through repeated application of this lemma we obtain the following corollary.

Corollary 2.3. $G(B_{w-1}^w(n))$ is strongly connected.

We obtain the following theorem from Lemma 2.1 and Corollary 2.3.

Theorem 2.4. $G(B_{w-1}^w(n))$ is Eulerian.

Theorem 2.4 implies the existence of fixed-weight de Bruijn sequences. Section 6 strengthens this result by providing an explicit construction.

3. The FKM Algorithm. While de Bruijn graphs can be used to prove that de Bruijn sequences exist, we are instead interested in efficiently constructing individual de Bruijn sequences. Martin [18] examined this issue in 1934, and proposed a simple backtracking approach that builds a de Bruijn sequence for B(n) one bit at a time. In fact, by slightly modifying his presentation, the de Bruijn sequence he creates is the lexicographically least for each value of n. Unfortunately, Martin's approach is algorithmically infeasible since it requires exponential space. Fredericksen, Kessler and Maiorana [9, 8] discovered a direct method, the “FKM algorithm”, for constructing the lexicographically least de Bruijn sequence for B(n). Describing their method requires the introduction of several basic concepts.
Given distinct strings $\alpha = a_1 \cdots a_n$ and $\beta = b_1 \cdots b_m$, α is less than β in lexicographic order if there exists an i such that $a_1 \cdots a_i = b_1 \cdots b_i$ and either i = n or $a_{i+1} < b_{i+1}$. The sequence of a set of strings L listed in lexicographic order is denoted lex(L). A necklace is a string in its lexicographically least rotation. That is, $\alpha = a_1 a_2 \cdots a_n$ is a necklace if $a_j a_{j+1} \cdots a_n a_1 a_2 \cdots a_{j-1} \geq \alpha$ for all j. The set of necklaces in B(n) and $B_w(n)$ are denoted $\mathbf{N}(n)$ and $\mathbf{N}_w(n)$ respectively. The aperiodic prefix of string α is its shortest prefix whose repeated concatenation yields α. That is, the aperiodic prefix of $\alpha = a_1 a_2 \cdots a_n$ is the shortest $\gamma = a_1 a_2 \cdots a_k$ such that $\gamma^{n/k} = \alpha$, where exponentiation denotes repeated concatenation. The aperiodic prefix of α is denoted by ρ(α). If $\rho(\alpha)^{n/k} = \alpha$, then the number of distinct rotations of α is k; we say that α is aperiodic if k = n and is periodic otherwise. A Lyndon word is an aperiodic necklace. The set of Lyndon words in B(n) and $B_w(n)$ are denoted $\mathbf{L}(n)$ and $\mathbf{L}_w(n)$, respectively.

The FKM algorithm [8] produces a circular string fkm(n) that is the concatenation of the Lyndon words whose length divides n in lexicographic order. That is,

$$\mathrm{fkm}(n) = \ell_1 \cdot \ell_2 \cdots \ell_m \quad \text{where} \quad \mathrm{lex}\Big(\bigcup_{j \mid n} \mathbf{L}(n/j)\Big) = \ell_1, \ell_2, \ldots, \ell_m. \qquad (3.1)$$

Figure 3.1 illustrates fkm(6) with (a) showing the Lyndon words and (b) showing their circular concatenation with · as a visual separator (Figure 3.1 (c) and (d) are explained in Section 4). The surprising connection between lexicographic order, the FKM algorithm, and de Bruijn sequences is given in Theorem 3.1.

Theorem 3.1 (“Grand-daddy” [8]). fkm(n) is a de Bruijn sequence for B(n).

[Figure 3.1:
(a) Lyndon words $\mathbf{L}(1), \mathbf{L}(2), \mathbf{L}(3), \mathbf{L}(6)$ in lexicographic order: 0, 000001, 000011, 000101, 000111, 001, 001011, 001101, 001111, 01, 010111, 011, 011111, 1
(b) the de Bruijn sequence fkm(6), drawn as the circular concatenation of the words in (a)
(c) necklaces $\mathbf{N}(6)$: 000000, 000001, 000011, 000101, 000111, 001001, 001011, 001101, 001111, 010101, 010111, 011011, 011111, 111111
(d) aperiodic prefixes: 0, 000001, 000011, 000101, 000111, 001, 001011, 001101, 001111, 01, 010111, 011, 011111, 1]

Fig. 3.1. Concatenating the Lyndon words of length 1, 2, 3, 6 in lexicographic order in (a) gives the “grand-daddy” de Bruijn sequence fkm(6) in (b). This construction can also be obtained by concatenating the aperiodic prefix in (d) of the necklaces of length 6 in (c).
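Equation (3.1) translates directly into a brute-force program. The following sketch is for illustration only: it enumerates all strings rather than using the efficient generation cited in the text, and the function names are ours. It concatenates the Lyndon words whose length divides n in lexicographic order.

```python
from itertools import product


def is_lyndon(s):
    """A Lyndon word is strictly smaller than each of its other rotations."""
    return all(s < s[i:] + s[:i] for i in range(1, len(s)))


def fkm(n):
    """Concatenate, in lexicographic order, the Lyndon words whose length
    divides n (exponential-time illustration of equation (3.1))."""
    words = []
    for d in range(1, n + 1):
        if n % d == 0:
            for bits in product("01", repeat=d):
                s = "".join(bits)
                if is_lyndon(s):
                    words.append(s)
    # Python's string order treats a proper prefix as smaller, which agrees
    # with the definition of lexicographic order given above.
    return "".join(sorted(words))


def is_de_bruijn(s, n):
    doubled = s + s[:n - 1]
    subs = {doubled[i:i + n] for i in range(len(s))}
    return len(s) == 2 ** n and len(subs) == 2 ** n


print(fkm(3))
print(is_de_bruijn(fkm(6), 6))
```

For n = 3 the Lyndon words are 0, 001, 011, 1, so the result is the circular string of Figure 1.1.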

The de Bruijn sequence in Theorem 3.1 is the lexicographically least de Bruijn sequence for each B(n). A careful analysis by Ruskey, Savage, and Wang [19] proved that each successive bit in fkm(n) can be generated in amortized O(1)-time while using O(n)-space. In fact, their algorithm visits successive blocks of n bits in this time and space complexity. Unfortunately, fixed-weight de Bruijn sequences are not created by restricting the FKM algorithm to the appropriate fixed-weight Lyndon words. To make this observation precise, let

$$\mathrm{fkm}_w(n) = \ell_1 \cdot \ell_2 \cdots \ell_m \quad \text{where} \quad \mathrm{lex}\Big(\bigcup_{j \mid \gcd(w,n)} \mathbf{L}_{w/j}(n/j)\Big) = \ell_1, \ell_2, \ldots, \ell_m. \qquad (3.2)$$

[Figure 3.2:
(a) Lyndon words in lexicographic order: 00001111, 00010111, 00011011, 00011101, 00100111, 00101011, 00101101, 0011, 00110101, 01
(b) $\mathrm{fkm}_4(8)$, drawn as the circular concatenation of the words in (a)
(c) necklaces $\mathbf{N}_4(8)$: 00001111, 00010111, 00011011, 00011101, 00100111, 00101011, 00101101, 00110011, 00110101, 01010101
(d) aperiodic prefixes: 00001111, 00010111, 00011011, 00011101, 00100111, 00101011, 00101101, 0011, 00110101, 01]

Fig. 3.2. Concatenating the Lyndon words of length 2, 4, 8 and weight 1, 2, 4 respectively in lexicographic order in (a) does not give a fixed-weight de Bruijn sequence $\mathrm{fkm}_4(8)$ in (b). The substring 1110001 is repeated, and the substring 0100100 is invalid. This construction can also be obtained by concatenating the aperiodic prefix in (d) of the necklaces of length 8 and weight 4 in (c).

Figure 3.2 illustrates that $\mathrm{fkm}_4(8)$ is not a fixed-weight de Bruijn sequence. In particular, $\mathrm{fkm}_4(8)$ has invalid substrings such as $0100100 \notin B_3^4(7)$, and repeated substrings such as 1110001. Although $\mathrm{fkm}_w(n)$ is not a fixed-weight de Bruijn sequence, it does have the correct length of $\binom{n}{w}$. To understand why this is true, observe that if α ∈ $B_w(n)$ has k distinct rotations, then the rotations of α

will contribute k bits to $\mathrm{fkm}_w(n)$. Since $\mathrm{fkm}_w(n)$ has the correct length, we will consider ‘rearranging’ its constituent Lyndon words in Section 6.

4. Necklace-Prefix Algorithm. In this section we reformulate the FKM algorithm and then provide a simple generalization. Instead of describing fkm(n) as the concatenation of Lyndon words whose length divides n, it can be described as the concatenation of the aperiodic prefixes of the necklaces of length n. That is,

$$\mathrm{fkm}(n) = \rho(\eta_1) \cdot \rho(\eta_2) \cdots \rho(\eta_m) \quad \text{where} \quad \mathrm{lex}(\mathbf{N}(n)) = \eta_1, \eta_2, \ldots, \eta_m. \qquad (4.1)$$

To see why the concatenations in (3.1) and (4.1) are identical, simply observe that $\rho(\eta_i) = \ell_i$. The fixed-weight variant of fkm(n) can be similarly described as follows

$$\mathrm{fkm}_w(n) = \rho(\eta_1) \cdot \rho(\eta_2) \cdots \rho(\eta_m) \quad \text{where} \quad \mathrm{lex}(\mathbf{N}_w(n)) = \eta_1, \eta_2, \ldots, \eta_m. \qquad (4.2)$$
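The necklace-prefix formulation (4.2) is easy to experiment with. The sketch below is a brute-force illustration with helper names of our own choosing. It builds $\mathrm{fkm}_4(8)$ from the lexicographically sorted necklaces and confirms the two defects reported in Section 3: the length is correct, but one substring of length 7 is invalid and another is repeated.

```python
from itertools import combinations


def fixed_weight_strings(n, w):
    """All binary strings of length n with exactly w ones."""
    out = []
    for ones in combinations(range(n), w):
        bits = ["0"] * n
        for i in ones:
            bits[i] = "1"
        out.append("".join(bits))
    return out


def is_necklace(s):
    """True if s is its own lexicographically least rotation."""
    return all(s <= s[i:] + s[:i] for i in range(1, len(s)))


def aperiodic_prefix(s):
    """Shortest prefix whose repeated concatenation yields s."""
    n = len(s)
    for k in range(1, n + 1):
        if n % k == 0 and s[:k] * (n // k) == s:
            return s[:k]


def necklace_prefix(necklaces):
    """Concatenate the aperiodic prefixes of the given ordered necklaces."""
    return "".join(aperiodic_prefix(x) for x in necklaces)


def fkm_w(n, w):
    necklaces = sorted(x for x in fixed_weight_strings(n, w) if is_necklace(x))
    return necklace_prefix(necklaces)


def circular_substrings(s, k):
    doubled = s + s[:k - 1]
    return [doubled[i:i + k] for i in range(len(s))]


seq = fkm_w(8, 4)
subs = circular_substrings(seq, 7)
print(len(seq))                 # the correct length, binom(8, 4)
print("0100100" in subs)        # an invalid substring of weight 2
print(subs.count("1110001"))    # a repeated substring
```

Since `necklace_prefix` accepts any ordering, the same helper can be reused with orders other than the lexicographic one.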

These two restatements of the FKM algorithm are illustrated in Figure 3.1 (c)-(d) and 3.2 (c)-(d). The advantage of (4.2) over (3.2) is that lexicographic order can be replaced by other previously developed orders of fixed-weight necklaces. For the remainder of this article, a necklace-prefix algorithm refers to the concatenation of the aperiodic prefixes of $\mathbf{N}_w(n)$ arranged in some order. The reasoning at the end of Section 3 explains why the necklace-prefix algorithm produces circular strings of the correct length.

There are two previously developed orders for fixed-weight necklaces [29, 28]. However, in both cases the necklace-prefix algorithm does not produce a fixed-weight de Bruijn sequence due to invalid strings. This fact is explained by the following lemma, which provides a necessary condition on the weight of prefixes in consecutive aperiodic necklaces.

Lemma 4.1. Suppose L is an ordering of $\mathbf{N}_w(n)$ that contains consecutive aperiodic necklaces $\alpha = a_1 \cdots a_n$ and $\beta = b_1 \cdots b_n$. If there exists j such that

$$\sum_{i=1}^{j} a_i - \sum_{i=1}^{j-1} b_i \notin \{0, 1\},$$

then applying the necklace-prefix algorithm to L will not result in a fixed-weight de Bruijn sequence for $B_w(n)$ due to invalid substrings.
Proof. Observe that $\gamma = a_{j+1} \cdots a_n b_1 \cdots b_{j-1}$ is an invalid substring since its weight can be computed as follows: $\Sigma\gamma = w - \Sigma(a_1 \cdots a_j) + \Sigma(b_1 \cdots b_{j-1}) \notin \{w, w-1\}$.

The invalid substring 0100100 in Figure 3.2 is explained by Lemma 4.1 using α = 00011101, β = 00100111, and j = 6. The lemma suggests that the necklace-prefix algorithm should be applied to orders that do not significantly change the weight of each prefix. Such an ordering is discussed in the next section.

5. Cool-lex Order. This section discusses the cool-lex order for fixed-weight binary strings and necklaces. The reverse order for fixed-weight necklaces is used in the next section. Cool-lex order is a shift Gray code for $B_w(n)$, meaning that the $\binom{n}{w}$ fixed-weight binary strings are ordered so that successive strings differ by a single shift. If $\alpha = a_1 a_2 \cdots a_n$, then a shift from the jth position to the ith position with i < j causes the substring $a_i a_{i+1} \cdots a_j$ to be replaced by $a_j a_i a_{i+1} \cdots a_{j-1}$. In other words, the symbol $a_j$ is removed and then reinserted somewhere to the left in position i; the intermediate symbols accommodate this shift by moving one position to the right. This operation is denoted by $\mathrm{shift}_\alpha(j, i)$, which we shorten to shift(j, i) when the initial string is clear.

There is a very simple rule for cyclically creating the cool-lex order of $B_w(n)$ one string at a time: If α ∈ $B_w(n)$ and k is the length of its longest prefix of the form $0^* 1^*$, then the next string in cool-lex order is shift(min(k+2, n), 1). This rule was discovered by Ruskey and Williams [21], although in our discussion all bits are complemented with respect to its original presentation. By convention, the last string in the cool-lex order of $B_w(n)$ is $0^{n-w} 1^w$. Table 5.1 (a)-(b) illustrates the cool-lex order of $B_4(8)$ and the shifts according to this rule.

cool(B_4(8))  Gray code     cool(N_4(8))  Gray code     case     condition        reverse
10000111      shift(3, 1)   00100111      shift(4, 1)   (5.1b)   a_{s+t+2} = 0    00001111
01000111      shift(4, 1)   00010111      shift(6, 3)   (5.1c)                    00011101
00100111      shift(5, 1)   00101011      shift(5, 3)   (5.1c)                    00110101
00010111      shift(6, 1)   00110011      shift(5, 1)   (5.1b)   a_{s+t+2} = 0    01010101
10001011      shift(3, 1)   00011011      shift(7, 3)   (5.1c)                    00101101
...           ...           00101101      shift(5, 2)   (5.1c)                    00011011
01111000      shift(7, 1)   01010101      shift(3, 1)   (5.1b)   β ∉ L            00110011
00111100      shift(8, 1)   00110101      shift(5, 1)   (5.1b)   β ∉ L            00101011
00011110      shift(8, 1)   00011101      shift(7, 1)   (5.1b)   β ∉ L            00010111
00001111      shift(8, 1)   00001111      shift(8, 3)   (5.1a)                    00100111
(a)           (b)           (c)           (d)           (e)      (f)              (g)

Table 5.1. (a) Cool-lex order for $B_4(8)$ and (b) the shifts that generate this order. The cool-lex order of $\mathbf{N}_4(8)$ appears in (c), along with the corresponding shifts according to (5.1) in (d)-(f). Column (g) gives the reverse order of (c).
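The successor rule for the cool-lex order of $B_w(n)$ takes only a few lines to implement. The sketch below follows the rule as stated in this section, with positions 1-indexed as in the text. The function names are ours, and the loop is a plain illustration rather than the loopless algorithm of [21].

```python
def shift(s, j, i):
    """Remove the symbol at position j and reinsert it at position i
    (1-indexed, i <= j); the symbols in between move one place right."""
    return s[:i - 1] + s[j - 1] + s[i - 1:j - 1] + s[j:]


def cool_lex_next(s):
    """Successor of s: shift(min(k+2, n), 1), where k is the length of the
    longest prefix of s of the form 0*1*."""
    n = len(s)
    k = 0
    while k < n and s[k] == "0":
        k += 1
    while k < n and s[k] == "1":
        k += 1
    return shift(s, min(k + 2, n), 1)


def cool_lex(n, w):
    """Cool-lex order of the strings of length n and weight w, ending by
    convention with 0^(n-w) 1^w."""
    last = "0" * (n - w) + "1" * w
    order, s = [], cool_lex_next(last)
    while s != last:
        order.append(s)
        s = cool_lex_next(s)
    order.append(last)
    return order


print(cool_lex(4, 2))
print(cool_lex(8, 4)[:5])
```

For n = 4 and w = 2 this visits 1001, 0101, 1010, 1100, 0110, 0011, and for n = 8 and w = 4 it begins with the strings in column (a) of Table 5.1.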

Given L ⊆ $B_w(n)$ let cool(L) represent the order of strings in L according to the cool-lex order of $B_w(n)$. Recently, it was shown that cool($\mathbf{N}_w(n)$) is also a shift Gray code [20]. Furthermore, the following rule cyclically creates the order one string at a time [20] (footnote 1). Table 5.1 (c)-(f) illustrates the cool-lex order of $\mathbf{N}_4(8)$ along with the shifts and cases according to this rule.

Footnote 1: Condition 5.1b is restated in a slightly simplified form since $0^n$ is the only necklace ending in 0.

Cool-lex Gray code for Necklaces [20]. Let $\alpha = 0^s 1^t \gamma \in \mathbf{N}_w(n)$ where s, t > 0 and γ is empty or begins with 0. The necklace following α in cool-lex order is denoted next(α) and is obtained from α by the following shift

$$\mathrm{next}(\alpha) = \begin{cases} \mathrm{shift}(s+t,\ i+1) & \text{if } \gamma = \epsilon & (5.1a) \\ \mathrm{shift}(s+t+1,\ 1) & \text{if } a_{s+t+2} = 0 \text{ or } \beta \notin \mathbf{N}_w(n) & (5.1b) \\ \mathrm{shift}(s+t+2,\ i+1) & \text{otherwise} & (5.1c) \end{cases}$$

where $\beta = \mathrm{shift}_\alpha(s+t+2, s+t+1)$, and i is the minimum value such that $0^i 1 0^{s-i} 1^{t-1} \gamma \in \mathbf{N}_w(n)$.

In [26] it is proven that cool($\mathbf{N}_w(n)$) can be generated in constant amortized time. Reverse cool-lex order is cool-lex order with the relative order of the strings reversed (see Table 5.1 (c) and (g)). The advantage of reverse cool-lex order is that it satisfies Lemma 4.1. We complete this section with two results.

Lemma 5.1. [20] If α is a necklace, then swapping its first 10 (if it exists) to 01 yields another necklace.

Lemma 5.2. If α is a periodic necklace, then next(α) is an aperiodic necklace.
Proof. There are two cases. If $a_{s+t+2} = 0$, then next(α) = shift(s+t+1, 1) by (5.1b). If $a_{s+t+2} = 1$, then $\beta = \mathrm{shift}(s+t+2, s+t+1) \notin \mathbf{N}_w(n)$ and so next(α) = shift(s+t+1, 1) by (5.1b). Therefore, $0^{s+1}$ is a prefix of next(α) so it is aperiodic, since this is its only $0^{s+1}$ substring.

6. Cool-Daddy de Bruijn sequences. Let $C_w(n)$ denote the result of applying the necklace-prefix algorithm to the reverse cool-lex order of the necklaces of length n and weight w. That is,

$$C_w(n) = \rho(\eta_1) \cdot \rho(\eta_2) \cdots \rho(\eta_m) \quad \text{where} \quad \mathrm{cool}(\mathbf{N}_w(n)) = \eta_m, \eta_{m-1}, \ldots, \eta_1. \qquad (6.1)$$

Figure 6.1 illustrates that $C_4(8)$ is a fixed-weight de Bruijn sequence for $B_4(8)$, and this section proves this result in general. To simplify our presentation, we define an additional circular string $D_w(n)$ as the concatenation of the necklaces of length n and weight w without first reducing each necklace to its aperiodic prefix. In other words, $D_w(n)$ concatenates the necklaces in their entirety, regardless of whether they are periodic or aperiodic. That is,

$$D_w(n) = \eta_1 \cdot \eta_2 \cdots \eta_m \quad \text{where} \quad \mathrm{cool}(\mathbf{N}_w(n)) = \eta_m, \eta_{m-1}, \ldots, \eta_1. \qquad (6.2)$$

The length of $D_w(n)$ exceeds $\binom{n}{w}$, so its substrings of length n−1 must contain repeated strings in $B_{w-1}^w(n-1)$. Although it has repeated strings, Theorem 6.1 proves that $D_w(n)$ does not miss any strings. In other words, $D_w(n)$'s substrings include each string in $B_{w-1}^w(n-1)$ at least once. Theorem 6.2 then completes our main result by proving that $C_w(n)$ contains each string in $B_{w-1}^w(n-1)$ exactly once. In these proofs we let prev(α) denote the necklace before α in cool-lex order. That is, next(prev(α)) = α.

Theorem 6.1. The circular string $D_w(n)$ contains each string in $B_{w-1}^w(n-1)$ as a substring.
Proof. Every string in $B_{w-1}^w(n-1)$ can be written as pq such that qxp ∈ $\mathbf{N}_w(n)$ with x ∈ {0, 1}. Our goal is to demonstrate a necklace α ∈ $\mathbf{N}_w(n)$ such that pq is a substring of next(α) · α, and thereby a substring of $D_w(n)$. Specifically, we will provide α with prefix q such that next(α) has suffix p. As a special case, if p or q is empty then clearly we can let α = qxp. If q has prefix $0^s 1^t 0$ where s, t > 0, then α = qxp suffices (next(α) is obtained from (5.1b) or (5.1c)). Otherwise, since qxp is a necklace, we can assume that $q = 0^s 1^t$ where s > 0 and t ≥ 0. For this remaining case we consider the two possible values for x separately and assume that the longest prefix of the form $0^* 1^*$ in qp is $0^i 1^j$ where i, j > 0.

[Figure 6.1:
(a) necklaces $\mathbf{N}_4(8)$ in reverse cool-lex order: 00001111, 00011101, 00110101, 01010101, 00101101, 00011011, 00110011, 00101011, 00010111, 00100111
(b) aperiodic prefixes: 00001111, 00011101, 00110101, 01, 00101101, 00011011, 0011, 00101011, 00010111, 00100111
(c) the fixed-weight de Bruijn sequence $C_4(8)$, drawn as the circular concatenation of the prefixes in (b)]

Fig. 6.1. Concatenating the aperiodic prefix in (b) of the necklaces of length 8 and weight 4 in reverse cool-lex order in (a) creates the “cool-daddy” fixed-weight de Bruijn sequence $C_4(8)$ in (c). The substrings of the cycle are the $\binom{8}{4} = 70$ strings in $B_3^4(7)$.
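Definition (6.1) can be reproduced by brute force for small n. The sketch below is not the constant amortized time algorithm of [26]; it lists the whole cool-lex order of $B_w(n)$, keeps the necklaces, reverses them, and concatenates their aperiodic prefixes. All function names are chosen for illustration.

```python
from math import comb


def cool_lex_next(s):
    """Move the symbol at position min(k+2, n) to the front, where k is the
    length of the longest prefix of s of the form 0*1*."""
    n = len(s)
    k = 0
    while k < n and s[k] == "0":
        k += 1
    while k < n and s[k] == "1":
        k += 1
    j = min(k + 2, n)
    return s[j - 1] + s[:j - 1] + s[j:]


def is_necklace(s):
    return all(s <= s[i:] + s[:i] for i in range(1, len(s)))


def aperiodic_prefix(s):
    n = len(s)
    for k in range(1, n + 1):
        if n % k == 0 and s[:k] * (n // k) == s:
            return s[:k]


def cool_daddy(n, w):
    """C_w(n): aperiodic prefixes of the necklaces in reverse cool-lex order."""
    last = "0" * (n - w) + "1" * w
    order, s = [], cool_lex_next(last)
    while s != last:
        order.append(s)
        s = cool_lex_next(s)
    order.append(last)
    necklaces = [x for x in order if is_necklace(x)]
    return "".join(aperiodic_prefix(x) for x in reversed(necklaces))


def is_fixed_weight_de_bruijn(s, n, w):
    doubled = s + s[:n - 2]
    subs = [doubled[i:i + n - 1] for i in range(len(s))]
    return (len(s) == comb(n, w)
            and len(set(subs)) == len(subs)
            and all(x.count("1") in (w - 1, w) for x in subs))


print(cool_daddy(5, 3))
print(is_fixed_weight_de_bruijn(cool_daddy(8, 4), 8, 4))
```

For n = 5 and w = 3 the only necklaces are 00111 and 01011, and the result is the circular string of Figure 1.2.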

Assume x = 1. If $p = 1^k$, then α = qxp suffices (next(α) is obtained from (5.1a)). Otherwise, consider two cases depending on t.
⊲ t ≥ 1: Transpose the first 10 to 01 in q1p to obtain α, which is a necklace by Lemma 5.1 (next(α) is obtained from (5.1c)). Note that the first 10 must occur after q, and hence α has prefix q.
⊲ t = 0: If $qp = 0^i 1^j$ then α = qp1 suffices (next(α) is obtained from (5.1a)); otherwise obtain α by inserting x = 1 into position i + j + 2 (after the first 10) of qp (next(α) is obtained from (5.1c)).

Assume x = 0. Again we consider two cases depending on t.
⊲ t ≥ 1: Obtain α by inserting x = 0 into qp as far right as possible up to position i + j + 1 so that the resulting string is a necklace (next(α) is obtained from (5.1b)). Note that the 0 will be inserted after q since q0p is a necklace.
⊲ t = 0: If it is possible to insert x = 0 past the first 1 in qp to obtain a necklace, then apply α as described when t ≥ 1. Otherwise, construct α so that next(α) = q0p. Observe that α has prefix q and next(α) is obtained by (5.1b).

Theorem 6.2. The circular string $C_w(n)$ is a fixed-weight de Bruijn sequence for $B_w(n)$.
Proof. Since $C_w(n)$ has the correct length of $\binom{n}{w}$, we need only show that every string in $B_{w-1}^w(n-1)$ appears as a substring in $C_w(n)$. From Theorem 6.1, this means that we need only show that every substring in $D_w(n)$ of length n−1 is also a substring in $C_w(n)$. For this reason, let us consider an arbitrary periodic necklace in $\mathbf{N}_w(n)$ of the form $\gamma^k$ where γ is the aperiodic prefix. Since consecutive necklaces cannot both be periodic by Lemma 5.2, we must show that each length n−1 substring of $\mathrm{next}(\gamma^k) \cdot \gamma^k \cdot \mathrm{prev}(\gamma^k)$ is also a substring of $\mathrm{next}(\gamma^k) \cdot \gamma \cdot \mathrm{prev}(\gamma^k)$. This can be verified by applying the iterative cool-lex rules and considering two cases for γ where s, t > 0 and ω is non-empty:
⊲ $\gamma = 0^s 1^t$: $\mathrm{next}(\gamma^k) \cdot \gamma \cdot \mathrm{prev}(\gamma^k) = \cdots 0^{s-1} 1^t \gamma^{k-2} \cdot \gamma \cdot 0^s 1^{t-1} \cdots$
⊲ $\gamma = 0^s 1^t 0 \omega$: $\mathrm{next}(\gamma^k) \cdot \gamma \cdot \mathrm{prev}(\gamma^k) = \cdots \gamma^{k-1} \cdot \gamma \cdot \mathrm{prev}(\gamma^k)$.
From this illustration, it should be clear in both cases that each length n−1 substring in $\mathrm{next}(\gamma^k) \cdot \gamma^k \cdot \mathrm{prev}(\gamma^k)$ is also a substring of $\mathrm{next}(\gamma^k) \cdot \gamma \cdot \mathrm{prev}(\gamma^k)$.

7. Summary and Open Problems. This paper provides an explicit fixed-weight de Bruijn sequence. It is constructed by concatenating the aperiodic prefixes of fixed-weight necklaces in reverse cool-lex order.

An algorithm in [26] shows that this fixed-weight de Bruijn sequence can be generated efficiently, with successive blocks of n bits being generated in amortized O(1)-time while using only O(n log n)-space. In addition to these results, we also investigated the de Bruijn graph G(Bw w−1 (n)). We conclude with additional observations and natural open problems: 1. Can specific constructions of weight-range de Bruijn sequences for Buℓ (n) with ℓ < u be generated efficiently? Theorem 6.2 proves the answer is ‘yes’ for ℓ = u−1. More specifically, Cw (n+1) provides an explicit construction since its substrings of length n are precisely Buℓ (n) for ℓ = w − 1 and u = w. Sawada, Stevens, and Williams [24] have solved the ℓ = 0 case. Their construction efficiently ‘glues’ together copies of Ci (n + 1) for i = w, w − 2, w − 4, . . . (and inserts a single 0 to create the substring 0n when the maximum-weight u is even). This construction was recently extended to include all cases where u − ℓ is odd by Stevens and Williams [27] by using the necklace-prefix algorithm and a natural generalization of cool-lex order from Bw (n) to Buℓ (n). The remaining open case is when u − ℓ is even, where the weight-range {ℓ, ℓ + 1, . . . , u} contains an odd number of values. 2. The set Bw w−1 (n) with n = 2w − 1 is known as the middle-levels. A well-known open problem is to determine if there is a Hamming distance 1 Gray code for the middle-levels (see Savage and Winkler [23]). Our results have provided a universal cycle for the middle-levels. 3. The shorthand sequence of Bw w−1 (n − 1) appears in a single-track order when obtained from a fixed-weight de Bruijn sequence for Bw (n) (see Hiltgen et al [11] for single-track Gray codes). For example, see Table 2.1 (iii). Which other sets of binary strings have single-track orders? 4. 
The longhand sequence of Bw (n) appears in a special cyclic order when obtained from a fixedweight de Bruijn sequence for Bw (n): Successive strings differ by the prefix-rotation σn or σn−1 . For example, see Table 2.1 (iv). It was proven that σ2 and σn cannot be used to create a Gray code for Bw (n) by Cheng [3]. The sufficiency of σn−1 and σn , and the insufficiency of σ2 and σn , are two special cases of a general question asked in [21]: Which sets of σi are necessary and sufficient for generating a (cyclic) Gray code for Bw (n)? 5. What is the maximum and minimum number of σn that can be used in a (σn ,σn−1 ) Gray code for Bw (n)? How many of each operation are used when Bw (n) is ordered according to Cw (n)? The analogous maximization problem has been solved for permutations by a natural construction [13]. 6. The grand-daddy de Bruijn sequence for B(n) is the first de Bruijn sequence for B(n) in lexicographic order. The cool-daddy de Bruijn sequences for Bw (n) are neither the first nor last in lexicographic order or cool-lex order. For example, 0011101011 from Figure 1.2 is bracketed by the fixed-weight de Bruijn sequences 0011010111 and 1110101100 in both lexicographic and cool-lex order. What is the first fixed-weight de Bruijn sequence for Bw (n) in lexicographic order, and can it be constructed directly without backtracking? 7. The necklace-prefix algorithm creates de Bruijn sequences when using lexicographic order, and creates fixed-weight de Bruijn sequences when using reverse cool-lex order. Are these orders special in this respect or are there many orders with these properties? 8. Removing the last redundant symbol of each string is also used in constructing universal cycles for permutations [22, 12, 13]. Similarly, shorthand representations are natural for any fixed-content language in which the frequency of each symbol is fixed. Which other fixed-content languages have shorthand universal cycles? 
A final open problem is to determine the diameter (length of the longest shortest path) of the de Bruijn graph for $B_{w-1}^{w}(n)$, or more generally for a weight-range set $B_{\ell}^{u}(n)$. For small values of n and w we computed the diameter of $G(B_{w-1}^{w}(n))$ in Table 7.1, as well as pairs of nodes that achieve the maximum diameter for each n. (Table 7.1 assumes w ≥ ⌈n/2⌉ since $G(B_{w-1}^{w}(n))$ and $G(B_{n-w}^{n-w+1}(n))$ are isomorphic.) A conjecture appears below.

 n    w = ⌈n/2⌉, . . . , n−1      (α, β)
 5    7 7                         (0011, 1010)
 6    10 9                        (00111, 10011)
 7    13 14 11                    (000111, 100110)
 8    18 17 13                    (0001111, 1100110)
 9    22 22 20 15                 (00001111, 11000110)
10    27 25 23 17                 (000011111, 110001110)
11    32 33 29 26 19              (0000011111, 1100011100)
12    39 37 34 29 21              (00000111111, 11100011100)
13    45 45 41 39 32 23           (000000111111, 111000011100)
14    52 49 46 43 35 25           (0000001111111, 1110000111100)
15    59 60 55 53 47 38 27        (00000001111111, 11100001111000)
16    68 65 61 58 51 41 29        (000000011111111, 111100001111000)
17    76 76 71 67 62 55 44 31     (0000000011111111, 1111000001111000)
18    85 81 79 76 66 59 47 33     (00000000111111111, 11110000011111000)

Table 7.1 Diameter of $G(B_{w-1}^{w}(n))$ for n ≤ 18. The (α,β) pairs give strings at maximum distance for each n.
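The closed forms in Conjecture 7.1 (stated next) can be compared against the rows of Table 7.1 with a few lines of arithmetic. The Python sketch below is a consistency check, not a proof and not the authors' program. It reads the conjectured maximum as $\lfloor \binom{n+1}{2}/2 \rfloor$ and the conjectured nodes as $0^x 1^y$ and $1^a 0^b 1^c 0^d$ with $x = \lfloor (n-1)/2 \rfloor$, $y = \lceil (n-1)/2 \rceil$, $a = \lfloor n/4 \rfloor$, $b = \lfloor (n+3)/4 \rfloor$, $c = \lfloor (n+2)/4 \rfloor$, $d = \lfloor (n-3)/4 \rfloor$; this reading of the formulas and all names in the code are assumptions.

```python
from math import comb

# Rows of Table 7.1: n -> (diameters for the listed values of w, beta string).
TABLE = {
    5: ([7, 7], "1010"),
    6: ([10, 9], "10011"),
    7: ([13, 14, 11], "100110"),
    8: ([18, 17, 13], "1100110"),
    9: ([22, 22, 20, 15], "11000110"),
    10: ([27, 25, 23, 17], "110001110"),
    11: ([32, 33, 29, 26, 19], "1100011100"),
    12: ([39, 37, 34, 29, 21], "11100011100"),
    13: ([45, 45, 41, 39, 32, 23], "111000011100"),
    14: ([52, 49, 46, 43, 35, 25], "1110000111100"),
    15: ([59, 60, 55, 53, 47, 38, 27], "11100001111000"),
    16: ([68, 65, 61, 58, 51, 41, 29], "111100001111000"),
    17: ([76, 76, 71, 67, 62, 55, 44, 31], "1111000001111000"),
    18: ([85, 81, 79, 76, 66, 59, 47, 33], "11110000011111000"),
}


def conjectured_diameter(n: int) -> int:
    """floor(C(n+1, 2) / 2), the conjectured maximal diameter."""
    return comb(n + 1, 2) // 2


def conjectured_pair(n: int) -> tuple[str, str]:
    """The nodes 0^x 1^y and 1^a 0^b 1^c 0^d under the assumed exponents."""
    x, y = (n - 1) // 2, n // 2  # n // 2 equals ceil((n - 1) / 2)
    a, b, c, d = n // 4, (n + 3) // 4, (n + 2) // 4, (n - 3) // 4
    return "0" * x + "1" * y, "1" * a + "0" * b + "1" * c + "0" * d


def diameter_mismatches() -> list[int]:
    """Values of n whose largest table entry differs from the closed form."""
    return [n for n, (row, _) in TABLE.items() if max(row) != conjectured_diameter(n)]


def beta_mismatches() -> list[int]:
    """Values of n whose tabulated beta differs from the closed-form node."""
    return [n for n, (_, beta) in TABLE.items() if conjectured_pair(n)[1] != beta]


if __name__ == "__main__":
    print("diameter mismatches:", diameter_mismatches())
    print("beta mismatches:", beta_mismatches())
```

With this reading, the largest entry in every row equals the closed form, and the closed-form β agrees with the table for n = 6, . . . , 18. For n = 5 the table lists 1010 while the formula gives 1001, which need not be a conflict if more than one pair attains the maximum.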

Conjecture 7.1. The de Bruijn graph $G(B_{w-1}^{w}(n))$ has maximal diameter $\lfloor \binom{n+1}{2} / 2 \rfloor$ when $w = \lfloor n/2 \rfloor$ for $n \equiv 3 \pmod{4}$ and $w = \lceil n/2 \rceil$ otherwise. Moreover, this maximal diameter is obtained by the nodes $0^x 1^y$ and $1^a 0^b 1^c 0^d$ for $x = \lfloor \frac{n-1}{2} \rfloor$, $y = \lceil \frac{n-1}{2} \rceil$, $a = \lfloor \frac{n}{4} \rfloor$, $b = \lfloor \frac{n+3}{4} \rfloor$, $c = \lfloor \frac{n+2}{4} \rfloor$, and $d = \lfloor \frac{n-3}{4} \rfloor$.

8. Acknowledgments. The authors thank Glenn Hurlbert and Garth Isaak for helpful discussions regarding de Bruijn graphs at CanaDAM 2009.

REFERENCES
[1] Max A. Alekseyev and Pavel A. Pevzner. Colored de Bruijn graphs and the genome halving problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 4(1):98–107, January 2007.
[2] M. Buck and D. Wiedemann. Gray codes with restricted density. Discrete Mathematics, 48(2–3):163–171, 1984.
[3] Yongxi Cheng. Generating combinations by three basic operations. Journal of Computer Science and Technology, 22(6):909–913, 2007.
[4] F. Chung, P. Diaconis, and R.L. Graham. Universal cycles for combinatorial structures. Discrete Mathematics, 110:43–59, 1992.
[5] N.G. de Bruijn. A combinatorial problem. Koninkl. Nederl. Acad. Wetensch. Proc. Ser A, 49:758–764, 1946.
[6] N.G. de Bruijn. Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular arrangements of $2^n$ zeros and ones that show each n-letter word exactly once. T.H. Report 75-WSK-06, Technological University Eindhoven, 1975. 13 pages.
[7] Pierre Fraigniaud and Philippe Gauron. D2B: A de Bruijn based content-addressable network. Theoretical Computer Science, 355(1):65–79, 2006.
[8] H. Fredricksen and I. J. Kessler. An algorithm for generating necklaces of beads in two colors. Discrete Mathematics, 61:181–188, 1986.
[9] H. Fredricksen and J. Maiorana. Necklaces of beads in k colors and k-ary de Bruijn sequences. Discrete Mathematics, 23(3):207–210, 1978.
[10] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, 1994.
[11] A.P. Hiltgen, K.G. Paterson, and M. Brandestini. Single-track Gray codes. IEEE Transactions on Information Theory, 42(5):1555–1561, Sept. 1996.
[12] A. Holroyd, F.
Ruskey, and A. Williams. Faster generation of shorthand universal cycles for permutations. In COCOON 2010: The 16th Annual International Computing and Combinatorics Conference, volume 6196 of Lecture Notes in Computer Science, pages 298–307, Nha Trang, Vietnam, 2010. Springer-Verlag.
[13] A. Holroyd, F. Ruskey, and A. Williams. Shorthand universal cycles for permutations. Algorithmica, 2012, to appear.
[14] G. Hurlbert, B. Jackson, and B. Stevens, editors. Generalizations of de Bruijn Cycles and Gray Codes, volume 309 of Discrete Mathematics. Elsevier, 2009.
[15] D. E. Knuth. The Art of Computer Programming, volume 4 fascicle 2: Generating All Tuples and Permutations. Addison-Wesley, errata (updated 10/02/2008) edition, 2005. ISBN 0-201-85393-0.
[16] D. E. Knuth. The Art of Computer Programming, volume 4 fascicle 1: Bitwise Tricks & Techniques, Binary Decision Diagrams. Addison-Wesley, 2009. ISBN 0-321-58050-8.
[17] C.E. Leiserson, H. Prokop, and K.H. Randall. Using de Bruijn sequences to index a 1 in a computer word, 1998. [Online; accessed June 2009].
[18] M. H. Martin. A problem in arrangements. Bull. Amer. Math. Soc., 40:859–864, 1934.
[19] F. Ruskey, C. Savage, and T.M.Y. Wang. Generating necklaces. J. Algorithms, 13:414–430, 1992.
[20] F. Ruskey, J. Sawada, and A. Williams. Binary bubble languages and cool-lex Gray codes. Journal of Combinatorial Theory, Series A, 119(1):155–169, 2012.
[21] F. Ruskey and A. Williams. The coolest way to generate combinations. Discrete Mathematics, 309(17):5305–5320, September 2009.
[22] F. Ruskey and A. Williams. An explicit universal cycle for the (n − 1)-permutations of an n-set. ACM Transactions on Algorithms, 6(3), June 2010.
[23] C. Savage and P. Winkler. Monotone Gray codes and the middle levels problem. J. Combin. Theory Ser. A, 70(2):230–248, 1995.
[24] J. Sawada, B. Stevens, and A. Williams. De Bruijn sequences for the binary strings with a maximum density. In WALCOM 2011: The 5th International Workshop on Algorithms and Computation, volume 6552 of Lecture Notes in Computer Science, pages 182–190, New Delhi, India, 2011. Springer-Verlag.
[25] J. Sawada and A. Williams. Efficient oracles for generating binary bubble languages. Electronic Journal of Combinatorics, 19:P42, 2012.
[26] J. Sawada and A. Williams. A Gray code for fixed-density necklaces and Lyndon words in constant amortized time. Theoretical Computer Science, 2012, in press (DOI: 10.1016/j.tcs.2012.01.013).
[27] B. Stevens and A. Williams. The coolest order of binary strings. In FUN ’12: Sixth International Conference on Fun with Algorithms, Lecture Notes in Computer Science, San Servolo, Italy, 2012, to appear. Springer-Verlag.
[28] T. Ueda. Gray codes for necklaces. Discrete Mathematics, 219(1–3):235–248, 2000.
[29] T.M.Y. Wang and C. Savage. A Gray code for necklaces of fixed density. SIAM Journal on Discrete Mathematics, 9(4):654–673, 1996.