arXiv:1510.05512v2 [cs.DM] 29 Oct 2015
Arithmetic for Rooted Trees
Fabrizio Luccio
Department of Informatics, University of Pisa
October 2015

Abstract
We propose a new arithmetic for arbitrary rooted unordered trees, simply called trees. The vertices of our trees have any number of children, and if a tree A coincides with B by reshuffling the subtrees rooted at the children of any of its vertices we have A = B. The document is organized as follows. In Section 1 we present the basic notation and the tree representation in the form of a binary sequence. In Section 2 we show an ordering of trees for their enumeration. In Section 3 we define the arithmetic operations on trees of addition, addition-plus, and multiplication, and discuss their properties. In Section 4 we show how all trees can be generated by addition and addition-plus from a starting tree and that, for this purpose, both operations are needed. In Section 5 we show how a given tree can be obtained as the sum, sum-plus, or product of two trees, thus defining prime trees with respect to the three operations. We also show a connection between composite trees and integer factorization, and prove that primality of a tree can be decided in time polynomial in the number of vertices. In Section 6 we suggest in which fields these concepts can be useful. In Section 7 we discuss the two lines of similar studies that have appeared in the literature. This discussion is confined to the last section since previous studies do not influence our approach: to the best of our knowledge our results are completely new.
1
Basic notation
• 0, 1, and 2 respectively denote the trees containing exactly zero, one, and two vertices.
• In a tree T ≠ 0, r(T) denotes the root of T and x ∈ T denotes any of its vertices. A subtree is the tree composed of a vertex x and all its descendants in T. The subtrees rooted at the children of x are called subtrees of x.
• In a tree T, n_T and e_T respectively denote the numbers of vertices and leaves of T, and s_T denotes the number of subtrees of r(T).
Figure 1: Tree representation with binary sequences. S_1, S_2, S_3 represent the subtrees of the root of T.
In a computer memory a tree T can be represented as a binary sequence S_T of 2n_T bits. In our scheme T is traversed in (left to right) preorder, inserting 1 in the sequence for each vertex encountered and inserting 0 for each move backwards; see Figure 1. S_T has the recursive structure 1 S_1 . . . S_k 0, where the S_i are the sequences representing the subtrees of r(T). Since T is unordered, the order in which the S_i appear in S_T is immaterial (i.e., many different sequences represent T). However a canonical form for trees will be established in Section 2 so that their sequences will be uniquely determined. If T = 0, S_T is empty. If T = 1, S_T = 1 0.
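As an illustration only, the encoding just described can be sketched in Python. The nested-tuple representation (a tree is the tuple of the subtrees of its root, with `None` standing for tree 0) and the names `encode` and `decode` are assumptions of this sketch and do not come from the paper.

```python
def encode(tree):
    """Return the binary sequence of a tree given as a nested tuple.

    Assumed representation: a tree is the tuple of the subtrees of its
    root, so () is tree 1; None stands for the empty tree 0.
    """
    if tree is None:
        return ""
    # 1 on entering a vertex, 0 on moving back from it
    return "1" + "".join(encode(child) for child in tree) + "0"


def decode(seq):
    """Inverse of encode: rebuild the nested tuple from a binary sequence."""
    if not seq:
        return None
    stack = [[]]  # stack[-1] collects the subtrees of the current vertex
    for bit in seq:
        if bit == "1":
            stack.append([])
        else:
            finished = tuple(stack.pop())
            stack[-1].append(finished)
    (root,) = stack[0]
    return root


if __name__ == "__main__":
    t = ((), ((),))          # root with a leaf and a two-vertex subtree
    print(encode(t))         # 2 * 4 = 8 bits
    print(decode(encode(t)) == t)
```

The sequence depends on the order in which the subtrees are listed, which is why a canonical form is needed before sequences can be compared.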
2
Tree enumeration
Rooted unordered trees are infinitely many and clearly enumerable. A canonical enumeration of trees is obtained by representing all trees in canonical form (see below) and ordering their sequences by increasing value, as if they were binary numbers. Trees and sequences are then numbered with increasing natural numbers. The trees are in fact ordered by increasing number of vertices and, for trees with an equal number of vertices, the ordering is determined by the canonical form. The trees are grouped into families F_0, F_1, . . . as shown in Figure 2, where F_i contains the trees of i vertices. The canonical enumeration is based on the following points:
• For any tree T ≠ 0 all the prefixes of S_T, except for the whole sequence, have more 1’s than 0’s.
• Since the initial character of each sequence is 1 and its length is twice the number of vertices of the tree, for two trees U, T with n_U < n_T we have S_U < S_T if the sequences are interpreted as binary numbers. So the trees of F_{n_U} precede the trees of F_{n_T} in the ordering.
• The families F_0, F_1, F_2 contain one element each (including tree 0 in F_0) and need no ordering inside. Their trees are numbered 0, 1, 2 (so 0 is tree number 0 and 1 is tree number 1). For the families F_n with n > 2 we have:
Figure 2: The canonical families of trees F0 to F6 and the corresponding tree enumeration.
1. Two trees obtained from one another by changing the order of the subtrees of any vertex are in fact the same tree and appear once in F_n.
2. The ordering of the trees in F_n is based on the ordering of the preceding families. Consider the multisets of positive integers whose sum is n − 1. E.g., for n = 6 these multisets are: 1,1,1,1,1 - 1,1,1,2 - 1,1,3 - 1,2,2 - 1,4 - 2,3 - 5, ordered by increasing value of the digits left to right. Each multiset corresponds to a group of consecutive trees in F_n, where the digits in the multiset indicate the number of vertices of the subtrees of the root. For F_6 in Figure 2, multiset 1,1,1,1,1 refers to tree 18; multiset 1,1,1,2 refers to tree 19; multiset 1,1,3 refers to trees 20 and 21 that have the two trees of F_3 as third subtree, following the ordering in F_3; . . . ; multiset 2,3 refers to trees 27 and 28; the last multiset 5 refers to trees 29 to 37 whose roots have only one child.
3. As a consequence of point 2 the ordering of the trees in F_n starts with the one of height 2 with n − 1 subtrees of the root of one vertex each and sequence 1 1 0 1 0 1 0 . . . 1 0 0; and ends with the “chain” of n vertices and sequence 1 1 . . . 1 0 0 . . . 0.
Due to the ordering chosen, the binary sequences representing the trees in F_n are automatically ordered by increasing value, see Figure 3. Let f_n denote the number of trees in F_n. Actually many of these trees (not necessarily all) can be generated from the ones in F_{n−1} using the following:
Doubling Rule DR. From each tree T in F_{n−1} build two trees T_1, T_2 in F_n by adding a new vertex as the leftmost child of r(T), or adding a new root and appending T to it as a unique subtree.
Then we immediately have:
f_n ≥ 2^{n−2} for n ≥ 2.    (1)
For example the four trees of F_4 in Figure 2 can be built by the DR rule from the two trees of F_3. The nine trees of F_5 can be built by DR from the four trees of F_4, with the exception of tree 13.
The twenty trees of F_6 can be built by DR from the nine trees of F_5, with the exception of trees 27 and 28. So although the inequality (1) certainly holds for f_n, e.g. f_4 = 4 = 2^2, f_5 = 9 > 2^3, f_6 = 20 > 2^4, the number of extra trees that cannot be built with DR increases sharply with n. By direct computation we can see, for example, that f_11 = 2290 > 2^11 and in general f_n > 2^n for n → ∞. A stronger bound can be derived with the following observation. Starting from the value f_11 > 2^11 and applying the DR rule we have f_n > 2^n for 12 ≤ n ≤ 20. For f_21 we have that 2 · f_20 > 2^21 trees of F_21 can be generated by DR, and eight groups of other trees can be built as the ones with exactly two subtrees of the root having i and j vertices respectively, with 2 ≤ i ≤ 9, 18 ≥ j ≥ 11, i + j = 20 (i.e., 2 and 18, 3 and 17, . . . , 9 and 11 vertices; note that these trees cannot be generated by DR). Each of these groups contains f_i f_j > 2^{i−2} 2^j = 2^18 trees obtained by combining the trees in F_i with the ones in F_j, for a total of > 2^21 trees. In total
Family   Trees     Binary sequences
F0       0         (empty)
F1       1         10
F2       2         1100
F3       3-4       110100 111000
F4       5-8       11010100 11011000 11101000 11110000
F5       9-17      1101010100 1101011000 1101101000 1101110000 1110011000 1110101000 1110110000 1111010000 1111100000
F6       18-37     110101010100 110101011000 110101101000 110101110000 110110011000 110110101000 110110110000 110111010000 110111100000 111001101000 111001110000 111010101000 111010110000 111011010000 111011100000 111100110000 111101010000 111101100000 111110100000 111111000000
Figure 3: The binary sequences representing the trees of the first seven canonical families.
we have f_21 > 2 · 2^21 = 2^22. This bound f_n > 2^{n+1} holds up to n = 30 applying DR; then for f_31 the same reasoning applies, that is we take the 2 · f_30 > 2^32 trees of DR and the ones of the eight groups of subtrees with i, j vertices with 2 ≤ i ≤ 9, 28 ≥ j ≥ 21, i + j = 30. Each of these groups contains > 2^29 trees, hence we have f_31 > 2^33. The count is repeated for n increasing by tens and we have:
Proposition 1. f_n > 2^{⌈(11/10) n⌉ − 2} for n ≥ 11.
So for 11 ≤ i ≤ 20 we have f_i > 2^i; for 21 ≤ i ≤ 30 we have f_i > 2^{i+1}; for 31 ≤ i ≤ 40 we have f_i > 2^{i+2}; etc. Note that the exponent of 2 grows faster than the index of f_n, but this growth is still much slower than it would be if all the possible trees of F_n were considered (that is, not only eight significant groups of them).
As for an upper bound for f_n, it can be immediately seen that for n ≥ 2 all the binary sequences representing our trees begin with two 1’s and end with two 0’s (see Figure 3). Then these four digits could be removed since the remaining 2n − 4 bits are sufficient to represent a tree, and we have immediately f_n ≤ 2^{2n−4}. However this bound is obviously too high since a large number of sequences of 2n − 4 bits do not represent a set of subtrees of the root. At least we can consider only sequences that contain an equal number of 1’s and 0’s (the effect of other restrictions is more difficult to evaluate). For m = 2n − 4 the number of such sequences is (m choose m/2) ≤ 2^{m−1} for m ≥ 2, then we have:
Proposition 2. f_n ≤ 2^{2n−5} for n ≥ 3.
It is also interesting to count the total number g_n of trees with up to n vertices, that is the trees in ⋃_{i=1}^{n} F_i. Since the given bounds for f_n grow with n as powers of 2, g_n roughly grows twice as fast accordingly. Finding strong lower and upper bounds for f_n and g_n is a challenging problem. We pose:
Open problem 1. Express f_n and g_n exactly as a function of n.
Proving the optimality of the tree representation in the form of a binary sequence crucially depends on the solution of Open Problem 1. With the rough lower bound for f_n given in Proposition 1 we have that log_2 f_n > ⌈(11/10) n⌉ − 2 bits are necessary to represent all trees of n vertices. Then the sequences of 2n bits built as suggested in this work are slightly less than twice as long as indicated by this bound. On the other hand from Proposition 2 we have log_2 f_n ≤ 2n − 5, that is the gap is still very high.
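For small n the families can be listed mechanically by following the multiset-based ordering of points 2 and 3 above. The following is a minimal sketch under that reading; the helper names `partitions` and `family` are hypothetical, and the output has been compared by hand only with the families F_0 to F_6 of Figure 3.

```python
from functools import lru_cache
from itertools import combinations_with_replacement, groupby, product


def partitions(total, smallest=1):
    """Yield the multisets of positive integers summing to total,
    as non-decreasing lists, in left-to-right increasing order."""
    if total == 0:
        yield []
        return
    for first in range(smallest, total + 1):
        for rest in partitions(total - first, first):
            yield [first] + rest


@lru_cache(maxsize=None)
def family(n):
    """Binary sequences of the trees with n vertices, in the order
    obtained from the multisets of subtree sizes."""
    if n == 0:
        return ("",)  # tree 0 has the empty sequence
    result = []
    for sizes in partitions(n - 1):
        # For each run of equal sizes choose a multiset of trees of
        # that size, keeping the order of the smaller family.
        choices = [
            combinations_with_replacement(family(size), len(list(run)))
            for size, run in groupby(sizes)
        ]
        for pick in product(*choices):
            body = "".join(seq for group in pick for seq in group)
            result.append("1" + body + "0")
    return tuple(result)


if __name__ == "__main__":
    for n in range(7):
        print(n, len(family(n)), " ".join(family(n)))
```

Agreement between this multiset-based order and the order by increasing binary value has been checked here only for n ≤ 6; for larger n it should be verified rather than assumed.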
An arbitrary tree T can be transformed into canonical form with Algorithm CF sketched in Figure 4. The simplified analysis given in the following proposition shows that the algorithm requires polynomial time. Most probably a better algorithm can be devised; however our present aim is just to show that the problem is computationally “easy”. We have:
Proposition 3. The problem of transforming an arbitrary tree into canonical form is in P.
algorithm CF(T, n)
1. for any vertex x ∈ T
     count the numbers of vertices n_1, . . . , n_k of its subtrees;
     reorder these subtrees for non-decreasing values of the n_i;
     let G_1, . . . , G_r be the groups of subtrees with the same number g_1, . . . , g_r of vertices, with all g_i > 2;
   // reordering is necessary but not sufficient for having T in canonical form
   // the trees in all G_i must be arranged in canonical order
2. for any x ∈ T, bottom-up from the vertices closest to the leaves
     for any group G_i = {T_1, . . . , T_s}
       compute the representing sequences S_1, . . . , S_s;
       order S_1, . . . , S_s for increasing binary value;
       permute T_1, . . . , T_s accordingly.
Figure 4: Structure of Algorithm CF for transforming an arbitrary tree T of n vertices into canonical form. CF requires polynomial time in n.
Proof. An easy analysis of Algorithm CF shows that the algorithm is correct and each of its steps 1, 2 can be executed in total time O(n^2).
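The two steps of Algorithm CF can be combined in a short recursive sketch. This is not the author's implementation: it assumes a tree is given as a nested tuple of subtrees, and it orders the subtrees of every vertex first by number of vertices and then by binary value, which is how the two steps are read here. The name `canonical_sequence` is hypothetical.

```python
def canonical_sequence(tree):
    """Return the binary sequence of a tree in canonical form.

    Assumed representation: a tree is the tuple of the subtrees of its
    root, so () is the one-vertex tree.  Subtrees are handled bottom-up:
    each is first put in canonical form, then siblings are ordered by
    number of vertices (sequence length is twice that number) and,
    among equal sizes, by increasing binary value.
    """
    parts = sorted(
        (canonical_sequence(child) for child in tree),
        key=lambda seq: (len(seq), seq),
    )
    return "1" + "".join(parts) + "0"


if __name__ == "__main__":
    # The same unordered tree written with two different subtree orders.
    a = (((), ()), ((),), ())
    b = ((), ((),), ((), ()))
    print(canonical_sequence(a))
    print(canonical_sequence(a) == canonical_sequence(b))
```

Comparing equal-length strings of 0's and 1's lexicographically is the same as comparing them as binary numbers, so no integer conversion is needed.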
3
The three operators
We introduce the operations of addition (symbol +), addition-plus (symbol ⊕), and multiplication (symbol ·, or simple concatenation) defined as follows. Referring to Figure 5, let A, B be two arbitrary trees:
• Addition. T = A + B is built by merging the two roots r(A), r(B) into a new root r(T), for A, B ≠ 0 (addition with 0 is not defined). That is, the subtrees of A and B (if any) become the subtrees of r(T). We have A + 1 = 1 + A = A.
• Addition-plus. T = A ⊕ B is built by creating a new root r(T) connected to r(A) and r(B). That is, A and B become the subtrees of r(T). We have A ⊕ 0 = 0 ⊕ A ≠ A and A ⊕ 1 = 1 ⊕ A ≠ A.
• Multiplication. T = A · B is built by merging a copy of r(B) with every vertex x ∈ A, so that all the subtrees of r(B) become new subtrees of x. We have A · 0 = 0 · A = 0, with some abuse of the definition of multiplication in the second term since 0 has no vertex with which the vertices of A can be merged. We also have A · 1 = 1 · A = A.
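The three definitions above translate almost literally into code. The sketch below is only an illustration under assumed conventions: a tree is a nested tuple of subtrees, `None` stands for tree 0, and `canon` sorts subtrees with Python's tuple ordering, which is a convenient normal form for testing equality but not the canonical form of Section 2. All function names are hypothetical.

```python
ZERO, ONE, TWO = None, (), ((),)   # the trees 0, 1 and 2


def canon(t):
    """Sort subtrees recursively so that equal unordered trees compare equal."""
    return None if t is None else tuple(sorted(canon(c) for c in t))


def size(t):
    """Number of vertices n_T."""
    return 0 if t is None else 1 + sum(size(c) for c in t)


def add(a, b):
    """A + B: merge the two roots."""
    if a is None or b is None:
        raise ValueError("addition with 0 is not defined")
    return canon(a + b)


def add_plus(a, b):
    """A (+) B: new root whose subtrees are A and B (tree 0 adds nothing)."""
    return canon(tuple(x for x in (a, b) if x is not None))


def mul(a, b):
    """A . B: merge a copy of r(B) with every vertex of A."""
    if a is None or b is None:
        return None
    return canon(tuple(mul(c, b) for c in a) + b)


if __name__ == "__main__":
    t3 = ((), ())                  # root with two leaves
    print(size(mul(t3, TWO)), mul(t3, TWO) == mul(TWO, t3))
```

With these conventions `add_plus(ZERO, ZERO)` is tree 1 and `add_plus(ONE, ZERO)` is tree 2, as used later for generating all trees.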
Figure 5: Examples of addition, addition-plus, and multiplication. Note that in all cases it is immaterial in which order the subtrees are attached to the new roots.
These definitions bear the consequences given below. In the notation, multiplication has precedence over the two additions to save parentheses.
Proposition 4. For T = A + B we have n_T = n_A + n_B − 1. For T = A ⊕ B we have n_T = n_A + n_B + 1. For T = A · B we have n_T = n_A n_B.
Proof. Immediate from the definition of the three operations.
Proposition 5. Addition is commutative and associative. For A, B, C ≠ 0 we have: (i) A + B = B + A; (ii) (A + B) + C = A + (B + C).
Proof. Immediate from the definition of addition.
Proposition 6. Addition-plus is commutative and generally not associative: (i) A ⊕ B = B ⊕ A for all A, B; (ii) (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C) if and only if A = C.
Proof. Commutativity (point (i)). Immediate from the definition of addition-plus. Associativity (point (ii)). If part: in the two trees T = (A ⊕ B) ⊕ C and U = A ⊕ (B ⊕ C) the subtrees of the root are T_1 = A ⊕ B, T_2 = C, and U_1 = A, U_2 = B ⊕ C, respectively. If A = C we have T_1 = U_2 (due to commutativity) and T_2 = U_1. Only if part: T = U implies T_1 = U_1 or T_1 = U_2, where both cases are impossible for A ≠ C.
Now for a positive integer k > 1 and a tree A we can define the product kA and the product-plus k^+A (not to be confused with the product of trees) as the sum, or the sum-plus, of k copies of A. In fact, due to point (ii) of Propositions 5 and 6, the k copies of A can be combined in any order. Letting T = kA and U = k^+A we have n_T = k n_A − k + 1 and n_U = k n_A + k − 1. However, independently of k, the trees of n_T or n_U vertices obtained as a product kA or k^+A are only f_{n_A} in number, that is they constitute an exponentially small fraction of all the trees in F_{n_T} or F_{n_U}. For example the “even” trees (k = 2) under addition or addition-plus are a small minority among all the trees with the same number of vertices.
Proposition 7. Multiplication is associative, that is (A · B) · C = A · (B · C) for all A, B, C.
Proof. Follows from the definition of multiplication with simple reasoning.
For a positive integer k > 1 and a tree A we can define the power A^k as the product of k copies of A. Due to Proposition 7 the multiplications can be done in any order. For T = A^k we have n_T = n_A^k. Multiplication is not commutative in general, but we are unable to give a general if and only if condition. For a product A · B we consider two cases, namely n_A = n_B and, w.l.o.g., n_A > n_B. Recalling that, for any tree X, e_X and s_X respectively denote the number of leaves of X and the number of subtrees of r(X), we have:
Proposition 8. For n_A = n_B multiplication is commutative, that is A · B = B · A, if and only if A = B.
Proof. The if part is immediate. For the only if part let T = A · B and U = B · A. From the construction of the two products we immediately have e_T = n_A e_B and e_U = n_B e_A. If T = U we have e_T = e_U, then n_A e_B = n_B e_A, then e_A = e_B since n_A = n_B. Note that T and U contain e_A = e_B subtrees rooted in the former leaves of A and B respectively, each coinciding with B and A respectively.
Each of these subtrees contains n_B = n_A vertices, while all the other subtrees of T, U contain a different number of vertices. Then for having T = U the former two groups of subtrees should be identical, that is each subtree coinciding with B in T must be equal to a subtree coinciding with A in U. That is A = B.
For n_A > n_B we merely present three strict necessary conditions, namely:
Proposition 9. For n_A > n_B multiplication is commutative, that is A · B = B · A, only if the following conditions are all verified: (i) n_A/e_A = n_B/e_B; (ii) B is a proper subtree of A; (iii) if s_A ≥ s_B all the subtrees of r(B) must be equal to some subtrees of r(A).
Figure 6: An example of commutative product A · B = B · A for B subtree of A. The two trees A and B are shown in solid and dashed lines, respectively. Z is A · B in canonical form. In this case A = B^2 hence A · B = B^3.
Proof. Let T = A · B and U = B · A. Condition (i). Immediate from the observation that T = U implies e_T = e_U (see the proof of Proposition 8). Condition (ii). As in the proof of Proposition 8, consider the subtrees of T, U respectively attached to the former leaves of A in T and of B in U. Since n_A e_B = n_B e_A (see the proof above) and n_A > n_B we have e_A > e_B. In T there are e_A such subtrees of n_B vertices and in U there are e_B such subtrees of n_A vertices. For having T = U the above subtrees of T (all coinciding with B) should be present also in U where, by the construction of B · A, they must appear as subtrees of the copies of A in U. Condition (iii). By construction the s_B subtrees of r(B) appear also in T as subtrees of r(T), where they are the ones with fewer vertices because all the others have at least n_B vertices. And the s_A subtrees of r(A) appear also in U as subtrees of r(U), where they are the ones with fewer vertices because all the others have at least n_A vertices. Note that all these other subtrees of r(U) have more vertices than the subtrees of r(B) since n_A > n_B. For having T = U the s_B subtrees of r(B) that appear as subtrees of r(T) must be equal to s_B subtrees of r(U) and, for what has just been seen about these subtrees, they must be equal to s_B subtrees among the ones with fewer vertices, i.e. to subtrees of r(A). This also implies that if s_A = s_B then A = B.
An example with A · B = B · A is shown in Figure 6, where the three conditions of Proposition 9 are verified. Z is the product tree in canonical form. In this particular case we have A = B^2 hence A · B = B^3.
Finally from Proposition 4 we immediately see that multiplication is generally not distributive over addition and addition-plus.
For A, B, C ≠ 0, 1 we have:
(A + B) · C ≠ A · C + B · C,    (A ⊕ B) · C ≠ A · C ⊕ B · C.
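One way to see why Proposition 4 already rules out distributivity is to compare the numbers of vertices on the two sides. The short derivation below is a sketch of that counting argument, using only the vertex counts of Proposition 4:

```latex
\begin{align*}
n_{(A+B)\cdot C} &= (n_A + n_B - 1)\, n_C, &
n_{A\cdot C + B\cdot C} &= n_A n_C + n_B n_C - 1,\\
n_{(A\oplus B)\cdot C} &= (n_A + n_B + 1)\, n_C, &
n_{A\cdot C \oplus B\cdot C} &= n_A n_C + n_B n_C + 1.
\end{align*}
```

In both rows the two counts differ by n_C − 1, which is positive whenever C ≠ 0, 1, so the two trees cannot coincide.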
4
Generating all trees
All the rooted unordered trees can be generated by the single generator 0 using addition and addition-plus (see footnote 1). Namely:
• The empty tree 0 is the generator of itself.
• Tree 1 is generated as 0 ⊕ 0. Tree 2 is generated as 1 ⊕ 0.
• Assuming inductively that each of the trees in F_i with 1 ≤ i ≤ n − 1 can be generated by the trees of the preceding families, then each tree T in F_n can also be generated. In fact if r(T) has one subtree T_1 then T can be generated as 0 ⊕ T_1; if r(T) has k ≥ 2 subtrees T_1, T_2, . . . , T_k then T can be generated as U + V, where U is T deprived of T_k and V is T deprived of T_1, T_2, . . . , T_{k−1}.
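The inductive argument above is constructive and can be followed step by step in code. The sketch below assumes the nested-tuple representation (a tree is the tuple of the subtrees of its root, `None` is tree 0); the names `rebuild` and `expression`, and the textual forms `oplus(...)` and `plus(...)`, are hypothetical conveniences of this example.

```python
def canon(t):
    """Sort subtrees recursively so that equal unordered trees compare equal."""
    return tuple(sorted(canon(c) for c in t))


def add(a, b):
    """A + B for nonempty trees: merge the two roots."""
    return canon(a + b)


def add_plus(a, b):
    """A (+) B: new root whose subtrees are A and B (None is tree 0)."""
    return canon(tuple(x for x in (a, b) if x is not None))


def rebuild(t):
    """Regenerate t from the generator 0 using only add and add_plus."""
    if t is None:
        return None
    if len(t) == 0:
        return add_plus(None, None)            # tree 1 = 0 (+) 0
    if len(t) == 1:
        return add_plus(None, rebuild(t[0]))   # one subtree: 0 (+) T_1
    u, v = t[:-1], t[-1:]                      # T without T_k; T with T_k only
    return add(rebuild(u), rebuild(v))


def expression(t):
    """The same construction written out as a formula over the generator 0."""
    if t is None:
        return "0"
    if len(t) == 0:
        return "oplus(0, 0)"
    if len(t) == 1:
        return "oplus(0, " + expression(t[0]) + ")"
    return "plus(" + expression(t[:-1]) + ", " + expression(t[-1:]) + ")"


if __name__ == "__main__":
    tree = ((), ((),), ((), ()))
    print(rebuild(tree) == canon(tree))
    print(expression(((), ())))
```

Every branch of `rebuild` uses only trees with fewer vertices than the one being built, mirroring the induction on the families F_i.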
5
Prime trees
The first systematic treatment of primality is found in Euclid’s Elements, where the integers greater than 1 that can be constructed by multiplication only if the two factors are 1 and themselves are called πρῶτοι ἀριθμοί, literally “prime numbers”. The basic arithmetic operations with integers are addition and multiplication, with x + 0 = x and x · 1 = x. Then prime numbers have no sense for addition since all x greater than 1 can be constructed as the sum of two smaller terms other than 0 and x. In our arithmetic for trees, instead, primality occurs in relation to all the operations defined. In this whole section we refer to trees T with n_T > 1. We pose:
Definition 1. (i) T is prime under addition (shortly add-prime) if it can be generated by addition only if the two terms are 1 and T (here tree 1 has the role of integer 0 in ℕ). (ii) T is prime under addition-plus (shortly plus-prime) if it cannot be generated by addition-plus of any pair of trees. (iii) T is prime under multiplication (shortly mult-prime) if it can be generated by multiplication only if the two factors are 1 and T.

Footnote 1: Two types of addition have been included in the operation set to allow the construction of all the trees starting from a finite set of generators. The reader may check that addition and multiplication, or addition-plus and multiplication, are not sufficient for this purpose. This is a basic difference from the arithmetic on ℕ.
Mult-primality is the natural counterpart of primality in ℕ. As may be expected its consequences are not easy to study. For add-primality and plus-primality the situation is instead quite simple. We have:
Proposition 10. T is add-prime if and only if r(T) has only one subtree.
Proof. By contradiction. If part: for an arbitrary tree X = A + B with A, B ≠ 1, r(X) has at least two subtrees, then T ≠ X for any pair A, B ≠ 1. Only if part: if r(T) has k > 1 subtrees T_1, . . . , T_k then T = U + V where U is equal to T deprived of T_k and V is equal to T deprived of T_1, . . . , T_{k−1}.
Proposition 11. T is plus-prime if and only if r(T) has more than two subtrees.
Proof. By contradiction. If part: for an arbitrary tree X = A ⊕ B, r(X) has at most two subtrees, then T ≠ X for any pair A, B. Only if part: if r(T) has one subtree T_0 then T can be built as T_0 ⊕ 0; if r(T) has two subtrees T_1, T_2 then T can be built as T_1 ⊕ T_2.
As a consequence of Propositions 10 and 11, deciding if a tree is add-prime or plus-prime is computationally “easy” (in fact if the trees are accessed from the root the decision is taken in constant time). Note that no tree can be simultaneously add-prime and plus-prime, and that all trees whose roots have two subtrees are neither add-prime nor plus-prime. We also have:
Corollary 1. For n ≥ 4 the number of add-prime trees in F_n is f_{n−1}, and the number of plus-prime trees is ≥ f_n − f_{n−1} − K_n, where K_n = Σ_{i=1}^{n/2−1} f_i f_{n−i−1} for n even and equality holds, and K_n = Σ_{i=1}^{(n−1)/2} f_i f_{n−i−1} for n odd and inequality holds.
Proof. The bound for add-prime trees is immediate from Proposition 10 and the construction in the DR rule. The bound for plus-prime trees is found by eliminating from F_n the f_{n−1} trees T with one subtree of the root, plus the trees with two subtrees of the root. For counting the latter trees note that their subtrees are the combination of the f_i trees with i vertices with the f_{n−i−1} trees of n − i − 1 vertices and that, for n odd, the product f_{(n−1)/2} f_{(n−1)/2} includes some repetitions due to the combination of equal trees in F_{(n−1)/2}.
The figures in Corollary 1 can be expressed as functions of n by the rough estimates of Proposition 1, although a precise computation of f_n would give a more significant ratio for the number of add-prime and plus-prime trees over all the trees of their family. Still, our rough estimates are sufficient to see that prime trees remain a sizable share of the total. For mult-primality we start with an easy statement derived immediately from Proposition 4:
Proposition 12. If n is a prime number all the trees with n vertices are mult-prime.
The converse of Proposition 12 does not hold in our arithmetic, that is if n is composite a tree of n vertices may still be mult-prime. In a sense prime trees are more numerous than primes in ℕ. In fact what is rarely found is mult-composite (i.e. non mult-prime) trees. For example with some reasoning one can see that among the twenty trees of F_6 (see Figure 2) only trees 20, 22, 24, and 28 are composite, as they can be built as 2 · 3, 3 · 2, 4 · 2, and 2 · 4, respectively. If n is a composite number, the mult-composite trees with n vertices are exactly the ones obtained as the distinct products A · B and B · A between trees A of a vertices and trees B of b vertices, in all possible combinations between them and for all the possible factorizations of n in two factors a, b. This shows a connection between mult-composite trees and integer factorization in ℕ. Note that tree multiplication is generally not commutative, so both products A · B and B · A must be considered; however the trees thus generated may not all be distinct. Since if n_T is prime T is mult-prime, and the problem of deciding if n_T is prime is polynomial in log n_T (i.e., in the size of the representation of n_T) [1], deciding if T is mult-prime is straightforward for n_T prime. However the problem is difficult for n_T composite because T may be mult-prime or mult-composite. An algorithm for n_T composite may consist of building all the products A · B and B · A and comparing T with them looking for a match. However this method is impracticable unless n_T is very small because all the trees of a and b vertices should be used for all the factorizations a · b of n_T. Before proposing a better algorithm let us discuss some particular conditions that can be easily verified.
Proposition 13. For any tree T we have: (i) if r(T) has only one subtree, T is mult-prime; (ii) if r(T) has two subtrees with n_1 ≤ n_2 vertices and n_1 + 1 does not divide n_2, T is mult-prime.
Proof. Case (i).
By the definition of multiplication the root of a product tree must have at least two subtrees if the two factors have more than one vertex. Case (ii). Let T_1, T_2 be the subtrees of r(T), with n_1, n_2 vertices respectively. Assuming by contradiction that T = A · B with A, B ≠ 1, B should coincide with T deprived of T_2 since n_1 ≤ n_2. Then n_B = n_1 + 1 and B should be merged in T with all the vertices of A, thus implying n_2 = (n_A − 1)(n_1 + 1), against the hypothesis on n_1, n_2 posed in the condition.
If the conditions of Proposition 13 do not hold, T can be mult-prime or mult-composite. Referring to Figure 2, all the trees from 29 to 37 are mult-prime (case (i)). Tree 6 has n_1 + 1 = 2 and n_2 = 2, so it could be mult-composite (case (ii)), as in fact it is since it can be obtained as 2 · 2. Tree 27 has n_1 + 1 = 3 and n_2 = 3, so it could be mult-composite (case (ii)), but it is mult-prime. If r(T) has more than two subtrees the situation is more difficult. First consider a relevant property of product trees based on the observation that, if T = A · B, all the subtrees of r(B) are also subtrees of r(T). Namely:
Proposition 14. Let T = A · B with A, B ≠ 0 and A, B ≠ 1, and let Y be a subtree of r(B) with maximum number n_Y of vertices. Then the subtrees of r(B) are exactly the subtrees of r(T) with at most n_Y vertices.
Proof. Since T = A · B, the subtree Y has been inserted at r(T) as the largest subtree of r(B). Then also the subtrees of r(T) with at most n_Y vertices must have been inserted at r(T) as subtrees of r(B), since they have too few vertices for deriving from former subtrees of r(A) whose vertices are merged with B in T. Furthermore the remaining subtrees of r(T) cannot be subtrees of r(B) since they have too many vertices, by the hypothesis that Y is a largest subtree of r(B).
In the mult-composite tree Z of Figure 6, if the first subtree of r(Z) (containing one vertex) is a subtree of maximal cardinality of one of the factors, B in this case, then B consists of a root plus the first two subtrees of r(Z). Similarly, if the third subtree of r(Z) is a subtree of maximal cardinality of one of the factors, A in this case, then A consists of a root plus the first four subtrees of r(Z). We now pose:
Notation 1. For an arbitrary tree T: G_1, . . . , G_r are the groups of subtrees of r(T) with the same number g_1, . . . , g_r of vertices, g_1 < g_2 < · · · < g_r; H_i = ⋃_{j=1}^{i} G_j, 1 ≤ i ≤ r, i.e. each H_i is the group of subtrees of r(T) with up to g_i vertices.
According to this notation, Proposition 14 says that the subtrees of r(B) are exactly the ones of r(T) contained in H_i for a certain value i, and easily implies i < r because, for i = r, T would coincide with B. We can now prove a primality condition for trees whose root has more than two subtrees.
Proposition 15. For any tree T whose root r(T) has k > 2 subtrees T_1, . . . , T_k, let n_{T_1} ≤ n_{T_2} ≤ · · · ≤ n_{T_k} (if not, the subtrees can be reordered in polynomial time). And let T_1, . . . , T_k be grouped in H_1, . . . , H_r as in Notation 1, with h_i denoting the total number of vertices of the trees in H_i. If, for all j < r, h_j + 1 does not divide at least one of the n_{T_i} of the subtrees in H_r − H_j, then T is mult-prime.
Proof. An immediate consequence of Proposition 14 noting that, for T = A · B, n_B = h_j + 1 for a certain value j must divide the cardinality of all the subtrees of root r(T) that have not been inserted in T as subtrees of r(B).
If the condition of Proposition 15 does not hold, T can be mult-prime or mult-composite. Tree 19 of Figure 2 has n_1 = n_2 = n_3 = 1, n_4 = 2. Since h_1 + 1 = n_1 + n_2 + n_3 + 1 = 4 does not divide n_4 the tree is mult-prime. Both trees 20 and 21 have n_1 = n_2 = 1, n_3 = 3, then h_1 + 1 = 3 divides n_3. Here 20 is mult-composite and 21 is mult-prime. Propositions 13 and 15 pose strong bounds on the number of vertices of the subtrees of the root of a mult-composite tree, a further indication that mult-prime
algorithm MP(T, n)
1. CF(T, n); // transform T into canonical form with Algorithm CF of Figure 4
2. let H_1, . . . , H_r be the groups of subtrees of r(T) as in Notation 1;
3. for 1 ≤ i ≤ r − 1
     copy T into Z;
     traverse Z in preorder
       for any vertex x encountered in the traversal
         if x has all the subtrees of H_i
           erase these subtrees in Z
         else exit from the i-th cycle;
     return MULT-COMPOSITE;
4. return MULT-PRIME.
Figure 7: Structure of Algorithm MP for deciding if a tree T of n vertices is mult-prime. MP requires polynomial time in n.
trees are very numerous. However if n_T is composite and the conditions of the two propositions do not hold we must find a different way to decide mult-primality. This is done with Algorithm MP sketched in Figure 7, which requires polynomial time. Since all trees with a prime number n of vertices are mult-prime, MP is intended for testing trees with n composite. However MP works for all trees and can always be applied to avoid a preliminary test for the primality of n. Formally we have:
Proposition 16. The problem of deciding if an arbitrary tree is mult-prime is in P.
Proof. Refer to Algorithm MP. Correctness. Only step 3 requires an analysis. Z is the changing version of T and is restored at each i-th cycle. If one of the groups H_i of subtrees can be erased from Z at all vertices encountered in the traversal, the cycle is completed and the algorithm terminates declaring that T is mult-composite. In fact tree B, whose root has the subtrees in H_i, is one of the factors of T (see Proposition 14). If none of the i-cycles can be completed, that is no H_i can be found as being the group of subtrees of x in all vertices x of Z, the tree T is mult-prime as declared in step 4. Complexity. A superficial analysis of the algorithm is the following. Step 1 requires O(n^2) time (Proposition 3). Step 2 is executed with a linear time scan because the tree is now in canonical form and the number of vertices in each subtree of the root has been computed by Algorithm CF in step 1. Step 3 requires O(n) copy operations of T into Z in O(n^2) time, and O(n) traversals each composed of O(n) steps, for a total of O(n^2) steps. At each step at vertex x the subtrees in H_i must be compared with the subtrees of x with the same cardinality; this can be done by representing
such subtrees with their binary sequences S and comparing these sequences. In the worst case vertex x has O(n) subtrees of length O(n), so that building and comparing all the sequences takes time O(n^2), and the total time required by step 3 is O(n^4). Note that this analysis is very rough because the number of vertices of T decreases during the traversal, so the stated bound O(n^4) is exceedingly high.
Note that if T is mult-composite Algorithm MP allows finding a pair of factors A, B at no extra cost. In fact, if a cycle i of step 3 is completed, the algorithm is interrupted on the return statement and the group H_i contains exactly the subtrees of r(B), while the tree Z is reduced to A. This also implies that n is factorized in time polynomial in n, in agreement with the factorization in ordinary arithmetic that requires time exponential in log n.
Add-prime and plus-prime trees are neatly characterized in Propositions 10 and 11. Counting their number as a function of f_n is straightforward and a rough estimate has been given in Corollary 1. Even an approximate count for mult-prime trees is much more difficult. We pose:
Open problem 2. For a given n, determine the number of mult-prime trees of n vertices.
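The idea behind Algorithm MP can be illustrated with a compact recursive sketch. It is not the author's implementation and makes several assumptions: a tree is a nested tuple of subtrees, equality of unordered trees is tested after sorting subtrees with Python's tuple ordering (instead of the canonical form of Algorithm CF), and the names `strip_factor` and `find_factors` are hypothetical. No attempt is made to match the complexity analysis above.

```python
from collections import Counter


def canon(t):
    """Sort subtrees recursively so that equal unordered trees compare equal."""
    return tuple(sorted(canon(c) for c in t))


def size(t):
    """Number of vertices of t."""
    return 1 + sum(size(c) for c in t)


def strip_factor(t, h):
    """Erase the multiset h of subtrees at every vertex of t.

    Returns what is left of t (the candidate factor A) or None as soon
    as some vertex does not carry all the subtrees in h.
    """
    remaining = Counter(t)
    for sub, count in h.items():
        if remaining[sub] < count:
            return None
        remaining[sub] -= count
    quotient = []
    for sub in remaining.elements():   # subtrees that must come from A
        q = strip_factor(sub, h)
        if q is None:
            return None
        quotient.append(q)
    return tuple(sorted(quotient))


def find_factors(t):
    """Return (A, B) with t = A . B and A, B different from tree 1,
    or None if t is mult-prime."""
    t = canon(t)
    group_sizes = sorted({size(c) for c in t})
    for g in group_sizes[:-1]:         # H_1, ..., H_{r-1}
        h = Counter(c for c in t if size(c) <= g)
        a = strip_factor(t, h)
        if a is not None:
            return a, tuple(sorted(h.elements()))
    return None


if __name__ == "__main__":
    tree_20 = ((), (), ((), ()))       # built in Section 5 as 2 . 3
    print(find_factors(tree_20))
```

As in step 3 of Algorithm MP, the factors come out as a by-product: the erased group gives the subtrees of r(B) and the remainder of the tree is A.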
6
Possible applications and extensions
Quoting a seminal book of Donald Knuth [3], trees are “the most important nonlinear structures arising in computer algorithms”. So, while the present study was essentially motivated by the curiosity of defining arithmetic concepts outside the realm of numbers, a possible interest in applications should be considered. Essentially all trees used in computer algorithms are rooted, and many special families have been defined among them (e.g. ordered, binary, etc.) to deal with particular problems. In this study we do not impose any such restriction. A rooted unordered tree corresponds to a nested set, that is (quoting [3] again) “a collection of sets in which any pair of sets is either disjoint or one contains the other”. As a more familiar concept, such a tree corresponds to a hierarchical structure like the Linnaeus taxonomy of plants, animals, and minerals, or the Dewey tree classification in a library. So our arithmetic might be useful for working on structures of this sort, for example hierarchical databases in computer applications, or phylogenetic trees in biology.

Let us discuss what the role of our operations may be. Consider an example. A School of Science can be organized in tree form, from the school itself at the root to different academic fields like Biology, Mathematics, Physics, etc., each divided into subfields like Algebra, Calculus, and Geometry for the field of Mathematics, etc. Note that at any vertex the subtrees are essentially unordered (so as not to offend the sensibilities of some colleagues), although in practice they are presented in the standard canonical form of alphabetic order. A School of Engineering may have a similar tree structure. The two schools can be merged
into a unique School of Engineering and Science with our operation of addition, or joined with addition-plus to become subtrees of a higher new root. The operation of multiplication, in this example, may correspond to the decision of adding a dummy substructure to every vertex, ready to be filled with new details (again, the new substructure is the same for all vertices so as not to offend anyone).

The above example suggests the introduction of new operations to face particular needs. Addition and addition-plus executed at a vertex different from the roots of the addends, and multiplication restricted to a subtree (in the limit, to a leaf) of the first factor, have a clear meaning in applications. Extension of our operations to these cases is immediate since they are done on subtrees instead of the whole tree. Propositions 5, 6, and 7 hold for the subtrees where the operations are performed, although their effect on the whole tree should be investigated. This is not done here, as the present work is merely aimed at proposing a new line of studies on trees. We also do not see at the moment a practical interest in the concept of primality, except for the reverse-engineering operation of deciding if a tree has been generated as a product.

Finally note that, even though multiplication may find fewer applications than addition and addition-plus, it may be very useful in data compression since the information contained in a product A · B is fully present in its factors. This allows reducing the storage space of the product from Θ(n_A · n_B) to Θ(n_A + n_B).
7
Other studies on tree arithmetic
Up to now two lines of research have been directed at defining arithmetic on trees. The first, opened by J.L. Loday, A. Frabetti, F. Chapoton, and F. Goichot in connection with dendriform algebras [4], was then developed by J.L. Loday himself, who gave a full description of arithmetic operations on binary trees and their properties, showing an embedding of ℕ in the subsets of all binary trees of n vertices [5]. A. Bruno and D. Yasaki worked on Loday’s theory, introducing primality and counting properties on subsets of trees in [2]. The theory of Loday, Bruno, and Yasaki (LBY) is the first and essentially unique effort to extend arithmetic from natural numbers to trees and is very clearly presented in [2]. However, aside from proceeding with similar purposes, our arithmetic does not share any definition or result with LBY.

First of all, LBY is limited to the special family of binary (ordered) trees whose internal vertices, including the root, have one parent and two children (in computer science terms, the root can be seen as being attached to a dummy parent outside the tree). This carries simpler consequences than in our general case. A noncommutative addition is defined by attaching the second addend B to an extreme leaf of the first addend A. As a matter of fact, this operation is given in two versions (extreme left or right leaf of A) to express any tree by addition from one generator. From this construction stems a definition of tree multiplication that produces trees
different from our products. A partial ordering on trees and several other interesting properties are derived, including some counting arguments on the different families of trees built. The most relevant extension made by Bruno and Yasaki over Loday’s theory has been the definition and treatment of prime trees (under multiplication), including a theorem that shows a nice form assumed by composite trees. In fact none of the results of LBY applies to our theory, or vice versa.

The second line of research, due to R. Sainudiin, is aimed at using binary trees for treating mapped partitions of a special class of intervals [6]. The problem addressed and the construction of the trees needed have nothing in common with LBY or with our theory. It is also worth noting that none of the preceding works has considered aspects of computational complexity related to the operations on trees.
References

[1] M. Agrawal, N. Kayal, and N. Saxena. PRIMES Is in P. Annals of Mathematics, Second Series, 160 (2) (2004) 781-793.

[2] A. Bruno and D. Yasaki. The arithmetic of trees. Involve 4 (1) (2011) 1-11.

[3] D.E. Knuth. The Art of Computer Programming. Vol. 1, Sec. 2.3: Trees. Addison-Wesley Publ. Co., Reading MA (1968).

[4] J.L. Loday, A. Frabetti, F. Chapoton, and F. Goichot. Dialgebras and related operads. Lecture Notes in Mathematics 1763, Springer-Verlag, Berlin (2001).

[5] J.L. Loday. Arithmetree. J. Algebra 258 (1) (2002) 275-309.

[6] R. Sainudiin. Algebra and Arithmetic of Plane Binary Trees: Theory & Applications of Mapped Regular Pavings. http://www.math.canterbury.ac.nz / r.sainudiin/talks/MRP UCPrimer2014.pdf (2014).