EDGE EXCHANGEABLE MODELS FOR NETWORK DATA

arXiv:1603.04571v1 [math.ST] 15 Mar 2016

HARRY CRANE AND WALTER DEMPSEY

Abstract. Exchangeable models for vertex labeled graphs cannot replicate the large sample behaviors of sparsity and power law degree distributions observed in many network datasets. Out of this mathematical impossibility emerges the question of how network data can be modeled in a way that reflects known empirical behaviors and respects basic statistical principles. We address this question by observing that edges, not vertices, act as the statistical units in most network datasets, making a theory of edge labeled networks more natural for most applications. Within this context we introduce the new invariance principle of edge exchangeability, which unlike its vertex exchangeable counterpart can produce networks with sparse and/or power law structure. We characterize the class of all edge exchangeable network models and identify a particular two parameter family of models with suitable theoretical properties for statistical inference. We discuss issues of estimation from edge exchangeable models and compare our approach to other attempts at the above question.

1. Introduction

Network datasets emerging from a wide range of real world processes, such as email communications [15], professional collaborations [25], and social relationships [20, 27], exhibit common structural features, namely sparsity and power law degree distributions [1, 4, 11, 14, 17]. The challenge of explaining these observed empirical behaviors commands attention, and sparks controversy, within the network science literature. Discrepant explanations of the power law property underlie some of the most pressing questions in network science: Barabási and Albert [4] offer the so-called preferential attachment model as an explanation for the generating dynamics of scale-free networks, but Lee et al. [18] and Willinger et al. [26] show that power law degree distributions in observed networks may reflect the sampling mechanism rather than any meaningful structure in the actual network. Proponents of the preferential attachment model [4, 7] cite not only its power law behavior but also its growth dynamics as points in its favor, but, for reasons we lay out below, the preferential attachment model mishandles elements of network formation that are crucial to statistical analysis. Specifically, the growth dynamics are defined in terms of a mechanism
for repeated vertex addition when many network datasets evolve by a process of edge growth. Preferential attachment models also lack invariance properties necessary to ensure that inferences are unaffected by arbitrary choices made during data analysis, such as label assignments and sampling design.

As we discuss in other work [10], the field of statistical network analysis is hamstrung by the lack of an inferential framework that both admits sound models for network formation with observed empirical properties and facilitates statistical inferences in the way of estimation, testing, and prediction. These objectives of network analysis reflect two fundamental principles of statistical modeling:

(I) “There should be consistency with known limiting behaviour.” [8, p. 5]

(II) “The sense of the model and the meaning of the parameter [...] may not be affected by accidental or capricious choices such as sample size or experimental design.” [21, p. 1237]

Within network studies specifically, Principle (I) speaks to the persistent asymptotic properties of sparsity and power law degree distributions, while Principle (II) recalls the logical need for invariance with respect to arbitrary assignment of labels and the choice of sampling mechanism during data acquisition.

As is well known throughout the statistics and probability community, for example [3, 19], Principles (I) and (II) are not compatible under the conventional representation of network data by a graph with labeled vertices, as in Figure 1(b), and the corresponding notion of exchangeability with respect to vertex relabeling. Assuming a population of infinitely many vertices: an exchangeable random graph is sparse if and only if it is empty, that is, has no edges, with probability 1. The exchangeability/sparsity conundrum has recently drawn attention in the machine learning literature as well, where Orbanz and Roy [22, p. 459] call attention to the following burning question:

Is there a notion of probabilistic symmetry whose ergodic measures [...] describe useful statistical models for sparse graphs with network properties?

The above observation makes plain a fundamental challenge to statistical network modeling: an exchangeable generating model, in the traditional sense, cannot incorporate crucial logical properties in a way that respects known empirical behavior. A common approach to circumvent this issue is to somehow ignore Principle (II). Perhaps the best known example is found in the work of Bickel and Chen [5], who sacrifice Principle (II) for a model which is not consistent under subsampling but in some sense describes a sparse sequence of finite graphs. Bickel and Chen go on to prove consistency of certain parameter estimates from their model, but, without
a corresponding data generating process for the population network, the relationships between parameters for different sample sizes, and therefore the meaning of consistency in the context of finite sample inference, are hard to divine. We discuss this approach further in Section 6.1.

Rather than finagle the conventional notion of exchangeability, we might ask whether exchangeability with respect to vertex relabeling is appropriate for many network datasets. As a heuristic explanation for why ordinary vertex exchangeability is untenable for most network datasets, we observe that many network datasets do not arise by observing all existing interactions among a sample of individuals but rather come about by a process of first observing interactions, that is, edges. The vertices involved in those interactions are a byproduct of the sampling scheme. From this point of view, the sampled vertices are not representative of the larger population of individuals—the observed vertices are those incident to the observed edges and, thus, tend to have higher than average degree—explaining why vertices cannot be labeled exchangeably in most cases. As an easy illustration, we point out that there are no known network datasets without any edges at all and, yet, within any sparse network dataset there are many subsets of vertices whose induced subnetworks are empty. The assumption of a population network with exchangeable vertex labels is as untenable as a matter of common sense as it is of mathematics!

More recently, Caron and Fox [6] propose a class of exchangeable point process models for random graphs using the theory of completely random measures. In this case, exchangeability refers to the location of points in space, that is, measure preserving transformations of the plane. A graph is associated to such a point process by assuming that the points in space correspond to edges in a graph. As defined, however, it is not clear that the induced finite-dimensional distributions in [6] are consistent with respect to any realistic sampling scheme. In short, the generating mechanism has an invariance property and is able to produce sparse graphs, but it is not clear how the invariance property relates to the network data. Moreover, it is unclear how such a model can be fit directly to network data. We discuss this approach in more detail in Section 6.2.

In our prior general discussion of network data [10], we call attention to the fact that not all network datasets can be modeled in the same way, even if the resulting network data exhibit some of the same structural features. Our discussion below centers on a specific type of network data in which the network grows by a process of interactions in the population. As we mention, these dynamics are the most prevalent among the network datasets commonly referenced in the literature, for example, with the Enron email corpus [15] and actors collaboration network [25] as primary examples. In this setting, we build upon the notion of edge exchangeability, which we have introduced in previous work [9, 10] as a natural invariance principle for certain interaction datasets.

The two parameter process in Section 5 addresses our fundamental question directly with a generating mechanism that both models the power law behavior found in several network datasets and exhibits the probabilistic symmetry of edge exchangeability. Beyond answering the above question, we develop the broader framework of edge exchangeable network models. We characterize the class of edge exchangeable network models with a de Finetti-type representation theorem and go on to discuss various features of network data modeled by edge exchangeable networks.

2. Network data

Network applications often involve interactions among individuals in a population P. Sometimes these interactions can be represented by a (simple) graph G = (S, E_S), where S ⊂ P is a sample from P and E_S ⊆ S × S is a set of (directed or undirected) edges, but in many cases the interactions involve more than two individuals and an interaction involving the same set of individuals can occur more than once. A prime example is the actors collaboration network [25], in which the population consists of all movie actors and each movie corresponds to an interaction involving the set of individuals in its cast. In general there are more than two actors in each movie, and nothing precludes a set of actors from being cast in two different movies together.

Throughout we write P to denote the population under study, fin(P) to denote the set of all finite subsets of P, and I to denote a set of interactions among the individuals in P. Ordinarily the interactions are represented by a network structure that is encoded by a multiplicity, or adjacency, function M : fin(P) → N ∪ {0}, where M(s) records the number of occurrences of an interaction involving only the individuals in s. The network data for a sample S ⊂ P is the domain restriction M|_S : fin(S) → N ∪ {0}, s ↦ M(s).

Implicit in the above definition is the assumption that the network data is observed by sampling S ⊂ P and observing all interactions among the sampled individuals. This assumption is plainly false for many network datasets. For example, in the actors collaboration dataset, the network data M|_S : fin(S) → N ∪ {0} is determined by taking a sample of movies I and observing M(s) equal to the number of movies i ∈ I whose cast is given by s. The sampled actors are those who happen to be in the cast of at least one of the sampled movies, so that actors with more credits are more likely to be included in the sample. Moreover, the resulting network data only contain information for collaborations in the sampled movies; generally, there will be actors a, a′ ∈ S among those sampled who have collaborated in a movie which was not included in the sample I and, therefore, the dataset contains no evidence of any collaboration between a and a′. The following definition better captures the nature of network data in these applications.

Figure 1. (a) A network structure derived from some physical process. Neither vertices nor edges are equipped with labels; labels are assigned exogenously during data analysis. (b) Network data obtained from the network in Panel (a) by labeling vertices. (c) Network data obtained from the network in Panel (a) by labeling edges.

Definition 2.1 (Network data). Network data is a function E : I → fin(P) associating each observed interaction i ∈ I to the subset E(i) ⊂ P of individuals involved in the interaction.

Definition 2.1, though unconventional, captures a wide range of situations in which network datasets arise and foreshadows why the notion of edge exchangeability is a natural invariance principle. For example, in the actors collaboration dataset, the interactions, that is, movies, comprise the statistical units while the induced network structure together with the sampled actors are part of the observation. Similarly, the Enron email network [15] is determined by sampling email interactions I among employees P at the Enron Corporation over some period of time. In both cases, the edges, not the vertices, are the statistical units and, therefore, should rightly be the labeled entities in network data.

Though our discussion encompasses the general case of Definition 2.1 in which all finite subsets are units, we often specialize to the more familiar case of binary interactions for clarity. Figure 1(a) depicts a network whose edges correspond to pairwise interactions between individuals in a population. The network shown in Figure 1(a) represents the outcome of a physical process, such as emails among employees in a company or collaborations among actors; the labeled versions in Figures 1(b) and 1(c) reflect two possible ways to represent the network data produced by the outcome in Figure 1(a). The above examples of the actors collaboration and Enron email datasets illustrate the subtle, seemingly mundane, distinction between the (real world) network in Figure 1(a) and the network data in Figures 1(b) and 1(c).

3. Network properties

Though many network properties may be of interest in a given application, our discussion keys in on the prevalence of sparsity and power law degree distributions in network datasets and the mathematical impossibility of an exchangeable model for vertex labeled graphs with these properties. Importantly, sparsity and power law degree distribution are defined as
asymptotic properties of a sequence of networks (G_n)_{n≥1} for which the number of vertices v(G_n) → ∞ as n → ∞. When studying network data, the sequence (G_n)_{n≥1} usually corresponds to a sequence of networks for finite samples of a given size, so that G_m can be recovered from G_n by subsampling m of its n units, for m ≤ n. Asserting these properties for network data, therefore, implicitly assumes that the observed network is sampled from a hypothetical population of unbounded size.

We stress that the definitions of sparsity and power law degree distribution are independent of any assignment of labels to network data. For logical consistency, we proceed for now without any assumed labeling, as in Figure 1(a). We define v(G) as the number of vertices, e(G) as the number of edges, and d(G) = (d_k(G))_{k≥0} as the degree distribution of G, where d_k(G) = N_k(G)/v(G) for N_k(G) equal to the number of vertices incident to exactly k edges. For example, the network data G in Figure 1(a) has v(G) = 7, e(G) = 6, and d(G) = (1/7, 3/7, 1/7, 1/7, 1/7).

For any network dataset G, we define a size parameter size(G), which is most naturally associated with the sample size, that is, the number of units under consideration. The size parameter may be, for example, the number of vertices, the number of edges, or perhaps some other suitable measurement of sample size. The widely observed properties of sparsity and power law degree distribution are defined asymptotically with respect to sample size.

Definition 3.1 (Sparsity and power law degree distribution). Let (G_n)_{n≥1} be a sequence of network data with size(G_n) = n for each n ≥ 1. The sequence (G_n)_{n≥1} is sparse if e(G_n) = o(v(G_n)²) as n → ∞. The sequence (G_n)_{n≥1} exhibits power law degree distribution if, for some γ > 1, the degree distributions (d(G_n))_{n≥1} satisfy d_k(G_n) ∼ L(k) k^{−γ} as n → ∞ for all large k, for some slowly varying function L(x), that is, lim_{x→∞} L(tx)/L(x) = 1 for all t > 0.

Sparsity and power law properties concern the behavior of network properties as the sample size grows, offering the interpretation of (G_n)_{n≥1} as subsampled network data of increasing size from a population network G. Whatever the units, the natural indexing of (G_n)_{n≥1} according to size(G_n) = n = #{units in G_n} suggests labeling the units 1, 2, . . ., so that G_n = G|_{[n]} is the restriction of G to the units labeled [n] = {1, . . . , n} and the subsample [m] ⊂ [n], m ≤ n, corresponds to G_n|_{[m]} = G_m. By Definition 3.1, a sequence (G_n)_{n≥1} cannot be sparse unless v(G_n) → ∞ as n → ∞ or G_n is empty for all large n ≥ 1, and (G_n)_{n≥1} cannot have power law degree distribution unless v(G_n) → ∞ as n → ∞; thus, any model for these properties must also incorporate a mechanism by which new units, as well as new vertices, are introduced.

4. Edge exchangeable network models

Table 1. Examples of network data: a catalog of common network datasets (Facebook, Political blogs, US Airport, UC Irvine, Yeast, IMDB, Co-authorship, Enron, Karate Club, Wikipedia, US Power) categorized according to growth by addition of edges or vertices. Superscripts indicate specific features of the network data or its generating mechanism: (1) preferential attachment, (2) projection, (3) hypergraph, (4) multigraph, (5) finite number of vertices, and (6) fixed network (no growth).

Without exception, all network models in popular use treat vertices as the units; see, for example, [12] and [16]. Under that approach, network
data is assumed to arise by sampling S ⊂ P and observing the domain restriction G|S : S × S → N ∪{0}. A quick study of Table 1 shows that edge growth is a staple of most network datasets, with some featuring vertex growth in addition to edge growth. As an example, the Facebook network can be represented by a graph G = (P, E), where P is the population of Facebook users and (i, j) ∈ E indicates that i and j are ‘friends’ on their Facebook accounts. This network evolves by two mechanisms, either (i) two existing Facebook users become friends or (ii) a new individual joins Facebook, corresponding to edge and vertex growth, respectively. The US Airport network has airports as vertices and an edge between two airports if there is at least one direct flight scheduled between the two. Although the formation of new airports is rare, new connections between existing airports do arise by the introduction of new flight routes. The IMDB network is the hypergraph with edges corresponding to the set of actors in a sample of movies in the database. The US Power Grid is a physical network of power lines; it does not grow.

Figure 2. Two edge labelings of the network data from Figure 1. An edge exchangeable model assigns equal probability to both networks. Notice that vertices are not labeled and, therefore, do not play a role in the exchangeability condition.

Most of the networks in Table 1 grow purely by edge addition, so that the edges are reasonably treated as the units. The individuals involved in the observed interactions are incidental to the process under study. In other words, the fact that an individual has been observed cannot be divorced from the process of interactions. With edges as units, we propose to label edges so that a network data sequence (G_n)_{n≥1} has size(G_n) = e(G_n) = n for each n ≥ 1 and, for all m < n, the network G_m = G_n|_{[m]}, that is, the network obtained from G_n by removing all edges labeled in [n] \ [m] := {m + 1, . . . , n}. With this convention in place, Principle (II) suggests the new notion of edge exchangeability. Given any permutation σ : [n] → [n], we write G_n^σ to denote the reassignment of edge labels by σ, that is, the edge labeled i is relabeled as σ(i) for each i = 1, . . . , n, as in Figure 2.

Definition 4.1 (Edge exchangeability). A random edge labeled network G_n of size n is edge exchangeable if G_n =_L G_n^σ for all permutations σ : [n] → [n], where =_L denotes equality in law. A compatible sequence of edge labeled graphs (G_n)_{n≥1} is infinitely edge exchangeable if each G_n is edge exchangeable and the family is consistent in distribution, that is, G_n|_{[m]} =_L G_m for all m ≤ n.

In the context of Figure 2, an edge exchangeable model assigns equal probability to both networks. Edge exchangeable models are appropriate for the many network datasets above that grow by addition of new edges. The labels assigned to edges are arbitrary and, therefore, should not play a material role in the ensuing inferences. Most importantly, and in stark contrast to the more conventional case of exchangeable networks with labeled vertices, edge exchangeable networks can exhibit power law behavior, as the two parameter model in Section 5 demonstrates. Beyond these specific properties, we characterize the class of edge exchangeable networks with the following representation theorem.

4.1. Representation theorem. To streamline the discussion, we specialize to the case of network data for binary interactions E : [n] → V × V, where V is a set labeling the vertices in the network. The more general case in which E : [n] → fin(V) follows by a technically involved, but straightforward, analogy. Given an injection σ : V → V′, we define E^σ = σE : [n] → V′ × V′ as the network which puts the edge labeled i between the vertices in σE(i) := {σ(s) : s ∈ E(i)}, for each i ∈ [n]. As in most network applications, the elements of V serve only to distinguish between vertices, and so we define an equivalence relation ≃ between two edge labelings E : [n] → V × V and E′ : [n] → V′ × V′, written E ≃ E′, if there is a bijection σ : V → V′ such that σE = E′. The equivalence class E/≃ := {E′ : [n] → V′ × V′ : E′ ≃ E} corresponds to the edge labeled graph (V, E) with vertex labels removed.

From now on we parameterize edge labeled graphs Y by the number of edges n ∈ N, the number of vertices N, and a representative selection map E : [n] → [N] such that Y = E/≃. We define [∞] = N so that the case n = ∞, N = ∞, or N = n = ∞ is easily understood. Given any edge labeled graph Y, we write n(Y) to denote the number of edges in Y and N(Y) to denote the number of non-isolated vertices in Y. Given n = n(Y) and N = N(Y), we call E : [n] → [N] a selection map for Y if Y = E/≃.

The (N × N)-simplex consists of all (f_{ij})_{j≥i≥0} such that f_{ij} ≥ 0 for all j ≥ i ≥ 0 and ∑_{j≥i≥0} f_{ij} = 1. For any f = (f_{ij}) in the (N × N)-simplex and i ∈ N, we define

    f_•^{(i)} := ∑_{s : i ∈ s} f_s

as the sum of masses involving element i. Given any f in the (N × N)-simplex, we define the ranked reordering of f by f^↓ = (f_{ij}^↓)_{j≥i≥0}, the element of the (N × N)-simplex obtained by first relabeling elements so that f_•^{(i)} ≥ f_•^{(i+1)} for all i ≥ 1 and then breaking ties f_•^{(i)} = f_•^{(i+1)} by declaring that (f_{ij})_{j≥i} ≤ (f_{i+1,j})_{j≥i+1} with elements listed in lexicographic order. We write F^↓ to denote the space of rank reordered elements of the (N × N)-simplex.

Every f^↓ ∈ F^↓ determines a probability distribution on edge labeled graphs, denoted ε_f, as follows. Let X_1, X_2, . . . be i.i.d. random pairs (i, j), j ≥ i ≥ 0, with P{X_k = (i, j) | f^↓} = f_{ij}. Given X_1, X_2, . . ., we define E : N → Z × Z, where Z := {. . . , −1, 0, 1, . . .}, by putting E(n) = X_n whenever X_n contains no 0s. Otherwise, we let m_{n−1} be the smallest label to appear among E(1), . . . , E(n − 1), with m_0 := 0. Given X_n = (0, j) with j > 0 and m_{n−1} = −p, we put E(n) = (−p − 1, j) and m_n = −p − 1. Given X_n = (0, 0) and m_{n−1} = −p, we put E(n) = (−p − 1, −p − 2) and m_n = −p − 2. Define Y = Y(X_1, X_2, . . .) ∼ ε_f to be the edge labeled graph given by Y = E/≃.
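To make the construction of ε_f concrete, the following minimal sketch (Python; the function name, the dictionary encoding of f, and the string labels for fresh vertices are illustrative choices, not notation from the text) samples an edge labeled multigraph from a finite f, with the index 0 standing for a previously unobserved vertex. Because vertex labels are removed when passing to the equivalence class E/≃, the particular labels handed to new vertices are immaterial.

    import random

    def sample_from_eps_f(f, n_edges, seed=0):
        """Draw n_edges i.i.d. pairs from f and replace every 0 coordinate
        with a fresh vertex label, as in the construction of eps_f above."""
        rng = random.Random(seed)
        pairs = list(f)
        probs = [f[p] for p in pairs]
        fresh = 0                      # counter for previously unobserved vertices
        edges = []
        for _ in range(n_edges):
            i, j = rng.choices(pairs, weights=probs, k=1)[0]
            endpoints = []
            for v in (i, j):
                if v == 0:             # 0 codes for a brand-new vertex
                    fresh += 1
                    endpoints.append("new%d" % fresh)
                else:
                    endpoints.append(v)
            edges.append(tuple(endpoints))
        return edges                   # edges[k] is the edge labeled k + 1

    # toy f on pairs (i, j) with j >= i >= 0; masses must sum to one
    f = {(1, 2): 0.4, (1, 1): 0.1, (0, 2): 0.3, (0, 0): 0.2}
    print(sample_from_eps_f(f, 5))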

For any edge labeled graph Y, we write |Y|^↓ ∈ F^↓ to denote its signature as follows. Let E : N → N × N be a selection function for Y. For each (i, j) ∈ N × N, j ≥ i ≥ 1, we define

    f_{ij}(E) := lim_{n→∞} n^{−1} ∑_{k=1}^{n} 1{E(k) = (i, j)},

    f_•^{(i)} := lim_{n→∞} n^{−1} ∑_{k=1}^{n} 1{i ∈ E(k)},

if the limits exist. Provided each of the above limiting frequencies exists, we define |E| = (f_{ij})_{j≥i≥0} by f_{i0} = f_•^{(i)} − ∑_{j≥i} f_{ij}, i ≥ 1, and f_{00} = 1 − ∑_{j≥i≥0 : (i,j)≠(0,0)} f_{ij}. We then put |Y|^↓ = |E|^↓.

Theorem 4.2. Let Y be an edge exchangeable random graph. Then there exists a unique probability measure φ on F^↓ such that Y ∼ ε_φ, where

    (1)    ε_φ(·) = ∫_{F^↓} ε_f(·) φ(df).

We defer the proof to the appendix.

Remark 4.3. The representation of general edge exchangeable networks E : N → fin(P) with hyperedges of any finite size is more complicated to state but completely analogous to Theorem 4.2. In fact, the statement can be written verbatim as in Theorem 4.2 with an appropriate change in the meaning of the space F^↓ and the measure ε_f.

5. The two parameter process

We now present a specific family of edge exchangeable random graphs, which we call the two parameter edge exchangeable model, whose mixing measure φ is built from the two parameter Poisson–Dirichlet distribution of [23]. In addition to having a straightforward generating mechanism, which we later apply to certain network datasets, the model gives an explicit answer to our burning question from Section 1, as the two parameter model produces an edge exchangeable network that is both sparse and has power law degree distribution. The model produces an edge labeled multigraph with power law degree distribution, but empirical evidence suggests that its projection to an ordinary graph (without edge multiplicities) also exhibits the same power law behavior.

We generate a sequence (G_n)_{n≥1} of edge labeled graphs, where each G_n has size(G_n) = e(G_n) = n, by the following process of sequential edge addition. Let 0 < α < 1 and θ > −α be parameters. We initialize by choosing G_1 among the two possible edge labeled graphs with one edge: either G_1 has a self-loop at a single vertex (with probability (1 − α)/(θ + 1)) or G_1 has an edge between two distinct vertices (with probability (θ + α)/(θ + 1)). After n steps, the graph G_n has n edges and a random number of vertices N_n.
We label these vertices arbitrarily i = 1, . . . , N_n and write D(i, n) to denote the total degree of vertex i before the (n + 1)st edge is added. (Note that each self-loop from a vertex to itself contributes 2 to its degree.) When the (n + 1)st edge arrives, its two incident vertices v_1(n + 1), v_2(n + 1) are chosen randomly among the vertices 1, . . . , N_n and a new vertex N_n + 1 as follows. With N_n^* = N_n, we first choose v_1(n + 1) randomly with probability

    (2)    pr(v_1(n + 1) = i) ∝ D(i, n) − α,    i = 1, . . . , N_n^*,
                                θ + α N_n^*,    i = N_n^* + 1.

After choosing v_1(n + 1), we update N_n^* according to whether or not v_1(n + 1) is a newly observed vertex: if v_1(n + 1) = N_n^* + 1, then we define N_n′ = N_n^* + 1; otherwise, we put N_n′ = N_n^*. We then choose v_2(n + 1) as in (2) with N_n^* = N_n′. When generating a network with directed edges, we orient edges to point from v_1(n + 1) to v_2(n + 1); in the undirected case, the edge between v_1(n + 1) and v_2(n + 1) has no orientation. We write G_n to denote the network generated after n steps of this procedure. We can express the distribution of each G_n in closed form by

    (3)    pr(G_n = G) = α^{v(G)} (θ/α)^{↑v(G)} / θ^{↑(2n)} ∏_{k=2}^{∞} exp{N_k(G) log((1 − α)^{↑(k−1)})},

where G is any network with n edges that can be generated by (2), v(G) is the number of non-isolated vertices in G, (N_k(G))_{k≥1} gives the number of vertices with degree k for each k, and x^{↑j} = x(x + 1) · · · (x + j − 1) is the ascending factorial function. A further important property of G_n is that its distribution (3) is independent of the order in which edges arrive during network formation, implying the following.

Corollary 5.1. The two parameter model is edge exchangeable for all choices of the parameter (α, θ).

5.1. Properties of the two parameter process. The above process generates a sequence (G_n)_{n=1,2,...} of edge labeled graphs, where each G_n has n edges and a random number of vertices N_n. For k = 1, 2, . . ., we write N_n(k) to denote the number of vertices in G_n with degree k, so that N_n = ∑_{k≥1} N_n(k). From properties of the generating mechanism in (2), we have the following.

Proposition 5.2. The two parameter process with parameter (α, θ), 0 < α < 1 and θ > −α, generates a sequence of multigraphs for which the empirical degree distributions p_n(k) = N_n(k)/N_n satisfy

    p_n(k) → α k^{−(α+1)} / Γ(1 − α)    as n → ∞,

where Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx is the gamma function; that is, (G_n)_{n≥1} has a power law degree distribution with exponent 1 < 1 + α < 2.

The simulation results in Figure 3 verify Proposition 5.2 empirically. We prove Proposition 5.2 in Section 7.
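The following minimal simulation sketch (Python; the function names are ours) implements the sequential edge addition above, reading (2) as a size-biased choice in which the running degree counts are updated after each endpoint is chosen, and then computes the empirical degree proportions p_n(k) of Proposition 5.2. The last line forms the simple graph obtained by reducing multiple edges to a single edge, as in panels (A2) and (B2) of Figure 3.

    import random
    from collections import Counter

    def sample_two_parameter_graph(n_edges, alpha, theta, seed=0):
        """Sequential edge addition with 0 < alpha < 1 and theta > 0 (sketch).
        An existing vertex i is chosen with weight D(i) - alpha and a new
        vertex with weight theta + alpha * (current number of vertices)."""
        rng = random.Random(seed)
        degree = {}                                   # vertex label -> total degree
        def pick_endpoint():
            labels = list(degree) + [len(degree) + 1]  # existing vertices plus one fresh label
            weights = [degree[v] - alpha for v in degree] + [theta + alpha * len(degree)]
            v = rng.choices(labels, weights=weights, k=1)[0]
            degree[v] = degree.get(v, 0) + 1           # update the degree immediately
            return v
        edges = []
        for _ in range(n_edges):
            edges.append((pick_endpoint(), pick_endpoint()))
        return edges

    def degree_proportions(edges):
        """Empirical degree distribution p_n(k) = N_n(k) / N_n of the multigraph."""
        deg = Counter()
        for v, w in edges:
            deg[v] += 1
            deg[w] += 1
        counts = Counter(deg.values())
        return {k: c / len(deg) for k, c in sorted(counts.items())}

    edges = sample_two_parameter_graph(20000, alpha=0.67, theta=1.0)
    p = degree_proportions(edges)
    # Proposition 5.2: p[k] decays roughly like k^{-(1 + alpha)} for large k, so the
    # slope of log p[k] against log k is about -(1 + alpha), and alpha = gamma - 1.
    simple_edges = {tuple(sorted(e)) for e in edges}   # reduce multiple edges to a single edge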

Moreover, the expected number of vertices satisfies

    (4)    E(N_n) ∼ Γ(θ + 1) / (α Γ(θ + α)) (2n)^α    as n → ∞.
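As a quick numerical companion to (4) (a sketch; the helper names and the bisection bracket are our own choices), the right side of the display can be evaluated with log-gamma functions, and θ can be matched to an observed vertex count once α is fixed, anticipating the estimation strategy described next.

    from math import exp, lgamma

    def expected_vertices_asymptotic(n_edges, alpha, theta):
        """Right side of (4): Gamma(theta + 1) / (alpha * Gamma(theta + alpha)) * (2n)^alpha."""
        return exp(lgamma(theta + 1.0) - lgamma(theta + alpha)) / alpha * (2.0 * n_edges) ** alpha

    def match_theta(n_edges, alpha, v_observed, lo=1e-6, hi=1e6):
        """Bisection for theta so that (4) matches v_observed; the right side of (4)
        is increasing in theta, and we assume a solution lies in (lo, hi)."""
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if expected_vertices_asymptotic(n_edges, alpha, mid) < v_observed:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    print(expected_vertices_asymptotic(20000, 0.67, 1.0))   # predicted vertex count
    print(match_theta(20000, 0.67, 2500.0))                 # theta matching 2500 observed vertices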

Given an observed power law exponent 1 < γ < 2, we can use these two properties to estimate the model parameters α and θ by setting α = γ − 1 and choosing the value of θ so that Equation (4) is satisfied by the observed network.

Theorem 5.3. The sequence of multigraphs (G_n)_{n≥1} from the two parameter process with parameter (α, θ), 0 < α < 1 and θ > −α, is sparse with probability 1.

5.2. Finite number of vertices. The parameter space of the two parameter process extends to accommodate network data for which the population size is finite and known, as in the karate club network [27]. In this case, let k ≥ 1 be the known population size, for example, k = 34 in the karate club, and take α < 0 and θ = −αk in the two parameter process above. The probabilities in (2) and (3) remain valid and the resulting sequence (G_n)_{n≥1} is edge exchangeable with v(G_n) → k with probability 1 as n → ∞. This regime of the two parameter process is appropriate for interaction datasets, as in the karate club, where the population size is fixed and known and the network is determined by a process of interactions over time. The setting here differs from that in, for example, the actors network, where the population of actors is not known, and perhaps not even well defined, in advance. Models for vertex growth, such as preferential attachment, implicitly assume an infinite population of vertices, an assumption which plainly fails in the karate club network. On the other hand, while the population is finite, there is no upper limit to the number of interactions between individuals: if time went on forever, many individuals would interact infinitely often. The two parameter process captures both of these features. Finally, though it goes without saying to many readers, we point out that any discussion of sparsity or power law is unwarranted in the case of a finite population, since both properties require the number of vertices to increase without bound as a function of sample size.

5.3. Projecting to a network without multiple edges. In an edge labeled graph E : N → N × N, the set (E(n))_{n≥1} may contain the same edge multiple times. In fact, multiple occurrences of the same edge are common in many of the interaction networks of Table 1. For the Enron and actors networks, respectively, multiple edges reflect an exchange of multiple emails between individuals and a casting of the same actors in multiple movies. Although these features may be present in the underlying real world phenomenon, network datasets are often simplified by reducing multiple edges to a single edge.

Given network data E : [n] → V × V, we define the standard projection graph G = (V, E^*) as the graph with edges

    (i, j) ∈ E^*    if and only if    E^{−1}((i, j)) ≠ ∅,

that is, (i, j) ∈ E^* as long as there is at least one occurrence of that edge in E. More generally, we can define the (t, c)-projection graph G = (V, E_{(t,c)}) by putting (i, j) ∈ E_{(t,c)} if and only if t(E^{−1}((i, j))) > c, for some thresholding function t and cutoff value c ≥ 0. (The standard projection above takes t as the cardinality map and c = 0, so that (i, j) is present in the projected graph as long as E^{−1}((i, j)) has positive cardinality.)

Due to either space constraints or a dearth of available tools for analyzing multigraph data, many network analyses are based on projected graphs, even when the multigraph data is readily available. In the interest of not discarding data, the multigraph data should be used if at all possible. In cases where the multigraph data is not available, the multigraph process generating the data acts as a latent process. We see that projecting the edge exchangeable network data (G_n)_{n≥1} from a latent two parameter process preserves the two key properties for network applications. Figure 3 demonstrates empirically that the standard projection preserves the power law degree distribution for the two parameter process. We have the following theorem regarding the preservation of sparsity.

Theorem 5.4 (Sparsity of the standard projection). Let (G_n)_{n≥1} be the sequence of graphs obtained via the standard projection of the two parameter process with parameter 0 < α < 1 and θ > 0. For each n ≥ 1, let m_n denote the number of vertices in G_n and let E_n be the number of edges in G_n. Then m_n → ∞ and E_n = O(m_n^{2−α}) with probability 1 as n → ∞, and the underlying network is sparse.

6. Common network models

In Section 1, we alluded to prior attempts by Bickel and Chen [5] and Caron and Fox [6] at modeling sparse network data with models that are, in some sense, exchangeable. We discuss some features of those approaches as they relate to the edge exchangeable models above.

6.1. Bickel and Chen's decoupled Aldous–Hoover processes. The Aldous–Hoover theorem [3, 13] characterizes every exchangeable model for a vertex labeled graph G = (N, E) by a function φ : [0, 1]³ → [0, 1] such that G =_L G^* = (N, E^*), where

    P{ij ∈ E^* | (U_n)_{n≥0}} = φ(U_0, U_i, U_j),    i < j,

for (U_n)_{n≥0} i.i.d. Uniform[0, 1]. The first argument is a mixing component and is not identifiable by any finite amount of data, and so it is customary to assume φ : [0, 1]² → [0, 1] is a symmetric function only of its last two arguments.

[Figure 3 here: log–log plots of degree proportion against degree for panels (A1), (A2), (B1), (B2); the fitted power law exponents are γ = 1.67 in the A panels and γ = 1.25 in the B panels.]

Figure 3. Simulation results showing the degree distribution of networks and their projection to a simple network by removing multiple edges. (A1) Network generated from the model with parameters (α, θ) = (0.67, 1). (B1) Network generated from the model with parameters (α, θ) = (0.25, 1). (A2) Simple network obtained by reducing multiple edges to a single edge in the (A1) network. (B2) Simple network obtained by reducing multiple edges to a single edge in the (B1) network. Results suggest that the generated network and its induced simple network both exhibit power law behavior with similar exponent.

As an exchangeable model, the resulting network G^* cannot be sparse or have power law degree distribution. As a way around this, Bickel and Chen [5] propose a “decoupling” approach which introduces a sequence {ρ_n}_{n≥1} satisfying ρ_n → ∞ and n ρ_n^{−1} ∫_{[0,1]×[0,1]} φ(u, v) du dv = O(1) as n → ∞. They then model network data for a size n sample by taking U_1, . . . , U_n independent, identically distributed Uniform[0, 1] and putting G_n =_L G_n^* with

    P{G_{n,ij}^* = 1 | U_1, . . . , U_n} = ρ_n^{−1} φ(U_i, U_j),    1 ≤ i < j ≤ n.

Though claimed as “natural” in the original paper, this formulation obscures the meaning of the parameter φ as the sample size grows and also has no clear interpretation in terms of data collection; see [10] for further discussion.
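A schematic sketch of the decoupling device (Python; the particular φ and ρ_n below are arbitrary illustrations, not choices made in [5]): each G_n is generated afresh from its own latent uniforms, so every finite graph is exchangeable in its vertex labels, but the graphs of different sizes are not restrictions of one another.

    import random

    def sample_decoupled_graph(n, phi, rho_n, seed=0):
        """Size-n graph with P{edge ij | U} = phi(U_i, U_j) / rho_n for i < j,
        where phi is symmetric with values in [0, 1] and rho_n >= 1."""
        rng = random.Random(seed)
        U = [rng.random() for _ in range(n)]
        return [(i, j) for i in range(n) for j in range(i + 1, n)
                if rng.random() < phi(U[i], U[j]) / rho_n]

    phi = lambda u, v: u * v                      # an arbitrary symmetric function
    edges = sample_decoupled_graph(200, phi, rho_n=50.0)
    print(len(edges))                             # expected edge count is of order n^2 / rho_n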

Every G_n is exchangeable and collectively the sequence (G_n)_{n=1,2,...} is sparse with probability 1, but the formulation does not describe a valid generating process, since the marginal distributions of the sequence (G_n)_{n=1,2,...} are not mutually consistent with respect to subsampling vertices. In particular, the so-called sparseness of the sequence (G_n)_{n=1,2,...} does not refer to the property of sparsity for any population network G with vertex set N.

6.2. Caron and Fox's completely random measure approach. Recognizing that a satisfactory approach to modeling sparse networks must abandon the usual notion of exchangeability with respect to relabeling vertices, Caron and Fox propose to associate network data with a class of exchangeable random measures on R_+ × R_+. In this setting, the vertices of G = (V, E_G) are assigned labels l : V → R_+, instead of the usual integer-valued labeling, and the network G is represented by a point process X ⊂ R_+ × R_+ by

    (x, x′) ∈ X    if and only if    (l^{−1}(x), l^{−1}(x′)) ∈ E_G.

The point process X is assumed to be exchangeable in the sense that the distribution of X is invariant under measure preserving transformations of R_+ × R_+. Caron and Fox go on to show that this approach can produce random graphs with a sparsity property; however, it is unclear whether the model proposed in [6, Section 3.2] produces a family of random graph measures that is consistent under subsampling. In particular, the sample size is defined in terms of a real-valued cutoff α ≥ 0, so that the observed graph is defined as the set of edges between vertices in the box [0, α] × [0, α]. Although exchangeability of the point process endows finite exchangeability to each sample of size α, the dynamics imposed by extending the sample to [0, α + α′] × [0, α + α′] place restrictions on the labels of newly observed vertices. Thus, we observe a similar phenomenon to that observed in Section 6.1: each of the finite sample models is exchangeable, but the network model as a whole is not infinitely exchangeable. There is also considerable uncertainty about the appropriateness of embedding into the spatial domain R_+ × R_+. Though presented as a way to obtain sparsity in the context of an exchangeable data generating mechanism, the exchangeability here refers to an abstract notion of spatial embedding, and not a logical invariance of the model with respect to relabeling units in Principle (II).

7. Appendix: Proof of main theorems

7.1. Proof of Theorem 4.2. We prove the general version of Theorem 4.2 for edge exchangeable hypergraphs. Let Y be an edge exchangeable random graph with selection function E : N → fin(N). By edge exchangeability, the sequence (E_n)_{n∈N} is an exchangeable fin(N)-valued sequence and, therefore, de Finetti's theorem
implies the existence of a unique probability measure π on the space of probability measures on fin(N) such that

    P{(E_n)_{n∈N} ∈ ·} = ∫_{P(fin(N))} m^∞(·) π(dm),

where m^∞(·) denotes the infinite product measure induced by m. Alternatively, there exists a measurable function f : [0, 1] × [0, 1] → fin(N) such that (E_n)_{n∈N} =_L (E_n^*)_{n∈N}, where

    E_n^* = f(α_0, α_n),    n ∈ N,

for (α_i)_{i≥0} i.i.d. Uniform[0, 1] random variables. By assumption (E_n)_{n∈N} is an exchangeable sequence of finite subsets, and so we can define (K_n)_{n∈N} by K_n = #E_n, the exchangeable sequence of cardinalities of the hyperedges in (E_n)_{n∈N}. Together, the sequence (K_n, E_n)_{n∈N} is jointly exchangeable, and once again de Finetti's theorem implies a measurable function f : [0, 1] × [0, 1] → N × fin(N) such that #E_n = K_n with probability 1 for all n ∈ N. We can break down f = (f_0, f_1) by components so that (K_n, E_n)_{n∈N} =_L (K_n^*, E_n^*)_{n∈N} with

    (K_n^*, E_n^*) = (f_0(α_0, α_n), f_1(α_0, α_n)),    n ∈ N.

By the Coding Lemma [2, Lemma 2.1], we can express

    (f_0(α_0, α_n), α_n)_{n∈N} =_L (f_0(α_0, α_n), u(f_0(α_0, α_n), β_n))_{n∈N},

for (β_n)_{n∈N} i.i.d. Uniform[0, 1] random variables independent of (f_0(α_0, α_n))_{n∈N} and a measurable function u. We define f_1′(α_0, β_n, k) := f_1(α_0, u(k, β_n)), n ∈ N, so that

    (f_0(α_0, α_n), f_1(α_0, α_n))_{n∈N} =_L (f_0(α_0, α_n), f_1′(α_0, β_n, f_0(α_0, α_n)))_{n∈N}.

Now, condition on α_0 = a so that

    (f_0(a, α_n), f_1′(a, β_n, f_0(a, α_n)))_{n≥1} = (f_{0,a}(α_n), f_{1,a}′(α_n, β_n, f_{0,a}(α_n)))_{n≥1}

is i.i.d. Given α_0 = a and (f_{0,a}(α_n))_{n≥1} = (K_n)_{n≥1}, we have

    P{(E_n)_{n≥1} ∈ · | (K_n)_{n≥1} = (k_n)_{n≥1}} = P{(f_{1,a}″(γ_n, k_n))_{n≥1} ∈ ·},

where (γ_n)_{n≥1} are i.i.d. Uniform[0, 1] independently of (α_n, β_n)_{n≥1}, again by the Coding Lemma, and f_{1,a}″(γ_n, k_n) := f_{1,a}′(α_n, β_n, k_n).

For each j ≥ i ≥ 0, we define

    f_{ij}(E) := lim_{n→∞} n^{−1} ∑_{r=1}^{n} 1{E_r = (i, j) or (j, i)}.

By de Finetti's theorem and the law of large numbers, it follows that |E| = (f_{ij}(E))_{j≥i≥1} exists almost surely. Moreover, for E ∼ ε_f, |E|^↓ = f^↓ almost surely, and the representation in (1) follows.

7.2. Proof of Proposition 5.2 and Theorem 5.3. Fix 0 < α < 1 and θ > 0 and let (E_n)_{n≥1} be the sequence of edge labeled multigraphs generated as in Section 5, so that each E_n is a multigraph with size(E_n) = e(E_n) = n. Let (G_n)_{n≥1} be its sequence of projected graphs obtained by removing multiplicities. Choose a uniform random labeling of the vertices. Finally, let N_{n,k} be the number of vertices in G_n with degree k ≥ 1 and let m_n be the number of non-isolated vertices in G_n. The degree of any vertex in G_n is no larger than the minimum of m_n and its degree in the multigraph E_n. Thus,

    e(G_n) ≤ ∑_{k=1}^{∞} (k ∧ m_n) N_{n,k}.

Theorem 3.11 of [24] implies

    ∑_{k=1}^{∞} (k ∧ m_n) N_{n,k} ∼ m_n ∑_{k=1}^{∞} (k ∧ m_n) p_{α,k}    with probability 1 as n → ∞,

where

    p_{α,k} := (α / Γ(1 − α)) Γ(k − α) / Γ(k + 1) ∼ (α / Γ(1 − α)) k^{−(1+α)}

for large k. This latter observation proves Proposition 5.2. Since m_n → ∞ with probability 1 as n → ∞,

    m_n ∑_{k=1}^{∞} (k ∧ m_n) p_{α,k} = m_n ∑_{k=1}^{m_n−1} k p_{α,k} + m_n² ∑_{k=m_n}^{∞} p_{α,k}
        ∼ m_n ∑_{k=1}^{m_n−1} k (α / Γ(1 − α)) k^{−(1+α)} + m_n² ∑_{k=m_n}^{∞} p_{α,k}
        ∼ (α / Γ(1 − α)) m_n ∑_{k=1}^{m_n−1} k^{−α} + m_n² Γ(m_n − α) / (Γ(m_n) Γ(1 − α))
        ∼ (α / Γ(1 − α)) m_n m_n^{1−α} + m_n² Γ(m_n − α) / (Γ(m_n) Γ(1 − α))
        ∼ ((α + 1) / Γ(1 − α)) m_n^{2−α}    as n → ∞,

where ∼ signifies that lower order terms have been ignored. While the approximation p_{α,k} ∝ k^{−(1+α)} holds for large k, the error incurred by applying it for all k in the sum in the second line is bounded and therefore represents a negligible contribution to the upper bound. The proof is complete.

Acknowledgement

H. Crane is partially supported by grants from NSF DMS-1308899 and NSA H98230-13-1-0299.

References

[1] J. Abello, A. Buchsbaum, and J. Westbrook. A functional approach to external graph algorithms. Proceedings of the 6th European Symposium on Algorithms, pages 332–343, 1998.
[2] D. J. Aldous. Representations for partially exchangeable arrays of random variables. J. Multivariate Anal., 11(4):581–598, 1981.
[3] D. J. Aldous. Exchangeability and related topics. In École d'été de probabilités de Saint-Flour, XIII—1983, volume 1117 of Lecture Notes in Math., pages 1–198. Springer, Berlin, 1985.
[4] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[5] P. Bickel and A. Chen. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America, 106(50):21068–21073, 2009.
[6] F. Caron and E. Fox. Bayesian nonparametric models of sparse and exchangeable random graphs. Accessed at arXiv:1401.1137, 2014.
[7] F. Chung and L. Lu. Complex graphs and networks, volume 107 of CBMS Regional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences, Washington, DC, 2006.
[8] D. Cox and D. Hinkley. Theoretical Statistics. Chapman and Hall, London, 1974.
[9] H. Crane and W. Dempsey. Atypical scaling behavior persists in real world interaction networks. Submitted, 2015.
[10] H. Crane and W. Dempsey. A framework for statistical network modeling. Submitted, 2015.
[11] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. ACM Comp. Comm. Review, 29, 1999.
[12] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2):1–117, 2009.
[13] D. Hoover. Relations on Probability Spaces and Arrays of Random Variables. Preprint, Institute for Advanced Studies, 1979.
[14] H. Jeong, S. Mason, A.-L. Barabási, and Z. Oltvai. Lethality and centrality in protein networks. Nature, 411:41, 2001.
[15] B. Klimt and Y. Yang. Introducing the Enron corpus. CEAS, 2004.
[16] E. D. Kolaczyk. Statistical analysis of network data: Methods and models. Springer Series in Statistics. Springer, New York, 2009.
[17] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the web for emerging cyber communities. Proceedings of the 8th World Wide Web Conference, 1999.
[18] S. H. Lee, P. Kim, and H. Jeong. Statistical properties of sampled networks. Physical Review E, 73:016102, 2006.
[19] L. Lovász and B. Szegedy. Limits of dense graph sequences. J. Comb. Th. B, 96:933–957, 2006.
[20] J. J. McAuley and J. Leskovec. Learning to discover social circles in ego networks. In Neural Information Processing Systems, pages 539–547, 2012.
[21] P. McCullagh. What is a statistical model? Ann. Statist., 30(5):1225–1310, 2002. With comments and a rejoinder by the author.
[22] P. Orbanz and D. Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):437–461, 2015.
[23] M. Perman, J. Pitman, and M. Yor. Size-biased sampling of Poisson point processes and excursions. Probab. Th. Relat. Fields, 92:21–39, 1992.
[24] J. Pitman. Combinatorial stochastic processes, volume 1875 of Lecture Notes in Mathematics. Springer, Berlin, 2006.
[25] R. A. Rossi and N. K. Ahmed. The network data repository with interactive graph analytics and visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[26] W. Willinger, D. Alderson, and J. C. Doyle. Mathematics and the Internet: a source of enormous confusion and great potential. Notices Amer. Math. Soc., 56(5):586–599, 2009.
[27] W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452–473, 1977.

Department of Statistics & Biostatistics, Rutgers University, 110 Frelinghuysen Avenue, Piscataway, NJ 08854, USA
E-mail address: [email protected]
URL: http://stat.rutgers.edu/home/hcrane

Department of Statistics, University of Michigan, 1085 S. University Ave, Ann Arbor, MI 48109, USA
E-mail address: [email protected]