Measuring Inequality and Segregation∗
Elizabeth Roberto†
arXiv:1508.01167v1 [stat.ME] 5 Aug 2015
August 6, 2015
Abstract
In this paper, I introduce the Divergence Index, a conceptually intuitive and methodologically rigorous measure of inequality and segregation. The index measures the difference between a distribution of interest and another empirical, theoretical, or normative distribution. The Divergence Index provides flexibility in specifying a theoretically meaningful basis for evaluating inequality. It evaluates how surprising an empirical distribution is given a theoretical distribution that represents equality. I demonstrate the unique features of the new measure and derive its mathematical equivalence with Theil’s Inequality Index and the Information Theory Index. I compare the dynamics of the measures using simulated data and an empirical analysis of racial residential segregation in the Detroit, MI, metro area. The Information Theory Index has become the gold standard for decomposition analyses of segregation. I show that although the Information Theory Index can be decomposed for subareas, it is misleading to interpret the results as segregation. The Divergence Index addresses the limitations of existing measures and accurately decomposes segregation across contexts and nested levels of geography. By creating an alternative measure, I provide a distinct lens, which enables richer, deeper, more accurate understandings of inequality and segregation.
∗ Thank you to Richard Breen, Scott Page, and Russell Golman for their valuable feedback. An earlier version of this paper was included as Essay 1 in: Roberto, Elizabeth. 2015. “The Boundaries of Spatial Inequality: Three Essays on the Measurement and Analysis of Residential Segregation.” PhD thesis, Yale University.
† Department of Sociology, Princeton University, Princeton, NJ, USA
Social inequality is a concept that describes the uneven distribution of resources, life conditions, opportunities, or outcomes across individuals, groups, or social classes. A variety of measures seek to answer a seemingly simple question: how unequal is the distribution? All measures of inequality have an implied or explicit notion of equality (Coulter 1989), such as uniformity across individuals, maximal diversity of groups, or randomly occurring events. They evaluate the degree of inequality in a distribution by measuring it against a comparative reference. For example, consider studying gender inequality across academic majors at a university. If the student population is 75% women and 25% men, and among engineering majors 25% are women and 75% are men, is the major segregated? To measure inequality in terms of diversity, we can use the gender diversity of the university as the comparative reference that represents equality. Although the relative proportion of men and women differs in the engineering major and the overall student population, both have a 3 to 1 mix of genders. Since the major has the same level of gender diversity as the university, we would conclude that it is not segregated. However, the gender proportions within the engineering major are unexpected given the university context. Men are over-represented and women are under-represented relative to their overall proportions at the university. Rather than comparing gender diversity, we can measure inequality as the difference between the actual proportion of each gender in the engineering major and the overall student population. Equality is defined by the university’s gender distribution – a major that has the same gender distribution as the university is not segregated. Given the striking difference between the gender proportions of the engineering major and the university, we would conclude that the major is segregated. 
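The two readings of the engineering example can be checked numerically. The Python sketch below is an illustration, not the author's code: it assumes diversity is summarized by Shannon entropy (defined formally later in the paper) and compares the distributions by simple differences in group proportions. The function and variable names are hypothetical.

```python
import math

def entropy(props, base=2):
    """Shannon entropy of a list of proportions, taking 0 * log 0 as 0."""
    return sum(p * math.log(1 / p, base) for p in props if p > 0)

# Proportions ordered as (women, men), from the example above.
university = [0.75, 0.25]
engineering = [0.25, 0.75]

# Diversity view: both have a 3-to-1 mix, so their entropies are identical.
print(entropy(university) == entropy(engineering))  # True

# Distributional view: each gender's share in the major differs from its
# university-wide share by 50 percentage points.
gaps = [e - u for e, u in zip(engineering, university)]
print(gaps)  # [-0.5, 0.5]
```

Judged by diversity alone the major looks unsegregated; judged against the university's gender distribution it does not.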
A comparative reference is typically hard-wired into an inequality measure, and it is often not transparent without inspecting the mathematics that underlie the measure. As a result, choosing a particular measure also establishes a theoretical definition of equality, and measures should be selected and evaluated with this in mind. A measure’s comparative reference affects assessments of whether one distribution is more or less unequal than another, and thus has important implications for our understanding of inequality. To quote Allison: “The decision to rank one distribution as more unequal than another has theoretical as well as methodological implications. In fact, the choice of an inequality measure is properly regarded as a choice among alternative definitions of inequality rather than a choice among alternative ways of measuring a single theoretical construct” (Allison 1978:865). The aim of this paper is to improve upon existing measures of inequality and segregation by proposing a new measure of inequality: the Divergence Index. It is derived from an information-theoretic measure, relative entropy. The index measures the difference between a distribution of interest and another empirical, theoretical, or normative distribution. In contrast to other measures, the comparative reference is not fixed; it can be specified in a theoretically meaningful way. This makes the index particularly useful for comparing inequality over time, place, or cohort, across counterfactual scenarios, or against a normative standard. I will show that the Divergence Index is both conceptually intuitive and methodologically rigorous. I will also make a normative claim that the Divergence Index is a better measure of inequality and segregation. That, of course, need not be true for the index to be of value. By creating an alternative measure, I provide a distinct lens, which enables richer, deeper, more accurate understandings of inequality and segregation.
I begin by describing four types of inequality measures. I then review the desirable properties of inequality measures, as identified in previous research. I describe three existing measures (the Dissimilarity Index, Theil’s Inequality Index, and the Information Theory Index) and summarize their desirable properties and limitations. Next, I introduce my new measure, the Divergence Index, and evaluate it against the same criteria. I demonstrate the unique features of the new measure and derive the mathematical equivalence between the Divergence Index and both Theil’s Inequality Index and the Information Theory Index. Finally, I compare the dynamics of the measures using simulated and empirical data.
Types of Measures
Inequality is generated through two processes: the distribution of a quantity of interest (e.g. income) among a population of individuals, and the grouping of individuals into sub-populations (e.g. race or class).¹ Simple measures assess inequality for the population-level distribution of a quantity. But we often care about differences in inequality or segregation between sub-populations. Some measures allow us to decompose overall inequality into inequality between groups and inequality within them. Inequality and segregation are operationalized along a number of different dimensions. Seminal work by Massey and Denton (1988) identified five conceptually distinct dimensions of residential segregation: evenness, exposure, concentration, centralization, and clustering. “Evenness is the degree to which groups are distributed proportionately across areal units in a city. Exposure is the extent to which members of different groups share common residential areas within a city. Concentration refers to the degree of a group’s agglomeration in urban space. Centralization is the extent to which group members reside toward the center of an urban area; and clustering measures the degree to which minority areas are located adjacent to one another” (Massey and Denton 1988:309–10). Reardon and O’Sullivan (2004) suggested that Massey and Denton’s five dimensions could be collapsed into two primary dimensions: evenness and exposure. They argued that any remaining distinctions between these two and the other three are attributable to the aggregate data typically used to measure segregation, not to a conceptual distinction in segregation itself. I argue that the concepts of inequality and segregation are operationalized along two key dimensions: evenness and diversity. Without loss of generality, these dimensions apply to all measures of inequality, not only to segregation indexes. Evenness describes the shape or form of a distribution.
How is the quantity of interest distributed across individuals or groups? The evenness dimension includes four of the dimensions of residential segregation identified by Massey and Denton (1988): evenness, clustering, centralization, and concentration. Diversity describes the variety of “types” or groups in the population. (See Page 2007 and 2011 for an in-depth discussion of diversity.) How many groups are there, and in what proportions are they represented? I argue that the exposure dimension of segregation identified by Reardon and O’Sullivan (2004) is a function of diversity rather than a conceptual distinction of its own. There are scores of measures that operationalize each dimension of inequality. Coulter (1989) organized measures of inequality into four families according to their underlying mathematical model: combinatorics, entropy, deviations, and social welfare functions. Starting with Coulter’s categories, I further classify each family of measures according to the dimension of inequality it operationalizes: combinatorics and entropy measure diversity, and deviations and divergence measure evenness. Table 1 summarizes the classification scheme and provides examples of common measures of intra-group inequality and inter-group inequality (i.e. segregation) within each family. Measures in the combinatorics family are built on probability theory. They measure the diversity of a population of two or more groups by calculating the probability that two individuals selected at random from the population are in the same group, for example, the probability that a pair of individuals share the same race or income category. For inequality among more than two groups, such as five income categories, the measures use the logic of combinatorics to calculate the probability associated with each combination of the categories (Coulter 1989).
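A minimal Python sketch of the combinatorics logic, assuming individuals are drawn at random with replacement (a without-replacement variant is included for comparison). The helper names and the population are hypothetical; `blau_heterogeneity` follows the common definition of Blau's index as one minus the sum of squared group proportions, i.e. the probability that two draws fall in different groups.

```python
from collections import Counter

def same_group_probability(labels, with_replacement=True):
    """Probability that two individuals drawn at random belong to the same group."""
    counts = list(Counter(labels).values())
    n = sum(counts)
    if with_replacement:
        # Each draw lands in group m with probability p_m, so P(same) = sum of p_m squared.
        return sum((c / n) ** 2 for c in counts)
    # Without replacement, the second draw comes from the remaining n - 1 individuals.
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

def blau_heterogeneity(labels):
    """Blau's heterogeneity: probability that two draws fall in different groups."""
    return 1 - same_group_probability(labels)

population = ["women"] * 75 + ["men"] * 25  # hypothetical population
print(same_group_probability(population))   # 0.625
print(blau_heterogeneity(population))       # 0.375
print(same_group_probability(population, with_replacement=False))
```

The with-replacement version depends only on group proportions; the without-replacement version also depends on population size, which matters for small populations.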
1 I use the terms “individuals” and “components” to refer to the units within a population for which inequality or segregation is measured: people, households, neighborhoods, localities, etc. I use the terms “sub-populations” and “groups” (or “groupings”) interchangeably to refer to types or categories within a population.

Table 1: Measures of Inequality and Segregation
Dimension / Family of Measures: Common Measures
Diversity
  Combinatorics: Blau’s Heterogeneity² (Blau 1977); Lieberson’s Index of Diversity (Lieberson 1980); Bell’s Interaction Probability Index (Bell 1954)
  Entropy: Theil’s Inequality Index³ (Theil 1972; Theil and Finizza 1971); Information Theory Index (Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004; White 1986)
Evenness
  Deviations: Gini Coefficient (Duncan and Duncan 1955; Kendall and Stuart 1977; Reardon and Firebaugh 2002); Index of Dissimilarity (Duncan and Duncan 1955; Jahn, Schmid, and Schrag 1947; Taeuber and Taeuber 1965)
  Social Welfare Function: Atkinson’s Indexes of Inequality (Atkinson 1970, 1983)

The concept of entropy comes from physics and information theory, where it is used to measure the randomness of a system or the information content of a message (Coulter 1989; Cover and Thomas 2006). Entropy is the amount of information needed to describe a probability distribution. If two outcomes are equally likely, there is high uncertainty and high entropy. If one alternative has a higher probability, there is less uncertainty and lower entropy. The entropy indexes evaluate inequality with respect to diversity. High entropy indicates high diversity and low inequality, and low entropy indicates low diversity and high inequality. For example, in a population of two groups, men and women, if each group is equally represented there is maximum uncertainty because each group is equally probable: the next person you meet is just as likely to be a man as a woman. There is high diversity and low inequality. But in settings where there is a small minority group, there is low diversity and high inequality. Entropy measures will be discussed in greater detail later in the paper.

The deviations family of measures is concerned with the difference of each share of the distribution from the central tendency (e.g. mode or mean) or from an adjacent share (according to their size-ordered sequence or time-ordered occurrence). “The deviations model assumes that the distributional characteristics of any numerical series are best described in terms of the deviations of each value from some standard (Alker and Russett 1964) derived from the series itself” (Coulter 1989:35). Coulter’s fourth family of measures is driven by the concept of social welfare: “society’s view of the fairness or desirability of a given distribution” (Coulter 1989:118). This is commonly operationalized as “the total amount of income, distribution of income, and number of people among which the income is distributed” (Coulter 1989:116). Atkinson (1970, 1983, 2008) and others (Schwartz and Winship 1980; Stewart 2006; Vega and Urrutia 2007) have derived several indexes of inequality and segregation based on the axioms of the social welfare function.
2 The Herfindahl-Hirschman Index of Concentration (Herfindahl 1950; Hirschman 1945) is equivalent to Blau’s Heterogeneity with reversed polarity (Blau 1977; Coulter 1989).
3 The Theil and Atkinson Indexes are special cases of the “generalized entropy” class of measures (Breen and Salazar 2011; Cowell 1980a; Cowell and Kuga 1981; Shorrocks 1980).
Desirable Properties of Inequality Measures
Previous research has identified a set of desirable properties for inequality and segregation measures (Allison 1978; Bourguignon 1979; Coleman, Hoffer, and Kilgore 1982; Jahn et al. 1947; James and Taeuber 1985; Morgan and Norbury 1981; Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004; Schwartz and Winship 1980; Taeuber and Taeuber 1965; White 1986). Measures are commonly evaluated with respect to how well they meet these criteria. First, I review the criteria concerning the conceptual and methodological qualities of measures. They address how measures should respond to distributional changes (e.g. changes to the distribution of individual incomes or the population count of each group). I organize these criteria into three categories: features of the distribution, changes to the whole distribution, and changes within the distribution. Next, I review the desirable technical qualities and quantities of measures. This second set of criteria addresses how a measure should be calculated and interpreted.
Conceptual and Methodological Qualities of Measures
Measures should be invariant to the following features of a distribution (Table 2):
Table 2: Criteria Concerning Features of the Distribution
Individual Cases
  Description: All cases should be treated the same.
  Citations: Symmetry requirement (Bourguignon 1979)
Population Size
  Description: Proportionate increases or decreases in the size of the population have no effect on inequality.
  Citations: Symmetry axiom for population (Bourguignon 1979; Sen 1973); Size invariance (James and Taeuber 1985; Reardon and Firebaugh 2002); Population density invariance (Reardon and O’Sullivan 2004)
Aggregations of Cases
  Description: Inequality should be invariant to the aggregation of components with identical compositions into a single unit, or dividing a single unit into components with the same composition.
  Citations: Organizational equivalence (James and Taeuber 1985; Reardon and Firebaugh 2002); Location equivalence (Reardon and O’Sullivan 2004); Arbitrary boundary independence (Reardon and O’Sullivan 2004)
Measures should satisfy the following criteria about changes to the whole distribution of cases (Table 3):
Table 3: Criteria Concerning Changes to the Whole Distribution
Additive Increases
  Description: Additive increases to the whole distribution should reduce inequality, because they reduce the relative difference between cases.
  Citations: Scale invariance (Allison 1978)
Proportionate Increases
  Description: Multiplying the whole distribution by a constant should have no effect on inequality, because it has no effect on the relative difference between cases.
  Citations: Scale invariance (Allison 1978); Income-zero-homogeneity property (Bourguignon 1979); Composition invariance (Jahn et al. 1947; James and Taeuber 1985; Morgan and Norbury 1981; Taeuber and Taeuber 1965)
The proportionate increases criterion is known as composition invariance in the segregation literature, and it has long been a source of debate. James and Taeuber (1985) explain the principle of composition invariance with reference to racial segregation in schools: “proportional changes in the numbers of students of a specific race enrolled in each school do not affect the measured level of segregation” (p. 16). By their definition, a segregation index is not composition invariant if its value is a function of the overall population composition. However, Coleman et al. (1982) argue that under certain definitions of segregation it is substantively appropriate to standardize an index by the overall population composition. One such example is defining a segregation index in terms of the extent of inter-group contact – no inter-group contact indicates maximum segregation, and contact proportional to the overall group proportions indicates zero segregation. In a population with a small minority group, we could expect less inter-group contact than in a population with equally represented groups, and the index adjusts to these expectations. Making such an index invariant to the population composition would distort its substantive meaning. Reardon and O’Sullivan (2004:134) take a reasonable stance, stating that “the traditional composition invariance criterion espoused by James and Taeuber (1985) is less important than is ensuring that a measure of segregation has a sound conceptual basis. If a segregation index measures exactly that quantity that we believe defines spatial segregation, then the index will be composition invariant by definition.” Measures should satisfy the following criteria about changes within the distribution (Table 4):
Table 4: Criteria Concerning Changes within the Distribution
Transfers and Exchanges
  Description: (1) Any transfer from a unit (e.g. individual, group, or location) with more of the relevant quantity (e.g. income) to another with less should decrease inequality, provided that the rank order remains the same. (2) Likewise, any transfer to a unit with more of the relevant quantity should increase inequality.⁴
  Citations: Pigou-Dalton principle (Dalton 1920; Pigou 1912); Inter-group transfers (James and Taeuber 1985; Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004); Inter-group exchanges (Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004)
4 For example, from Allison (1978:868): “measures of inequality ought to increase whenever income is transferred from a poorer person to a richer person, regardless of how poor or rich or the amount of income transferred.”
Technical Qualities and Quantities of Measures
In addition to desirable conceptual and methodological qualities, a second set of criteria concerns the technical qualities and quantities of inequality measures. These criteria (additive decomposability, upper and lower bounds, and relative versus absolute inequality) are summarized in Table 5. Additive decomposability is a desirable property because it allows for a deeper analysis of the sources of inequality: the relative contribution of each component or group to overall inequality can be identified, and the inequality occurring within and between sub-populations can be analyzed (Bourguignon 1979). Many measures are bounded between 0 and 1, with 1 indicating maximum inequality. If a measure has known upper and lower bounds, it can be rescaled to conform to a 0 to 1 range. However, rescaling the measure may shift the definition of inequality from absolute to relative. It is most important that the bounds of the index be known and interpretable.
Table 5: Technical Qualities and Quantities of Measures
Additive Decomposability
  Description: Measures should be decomposable into the sum of inequality within and between sub-populations.
  Citations: Aggregativity and additivity (Bourguignon 1979); Decomposition (Allison 1978); Additive decomposability⁵ (Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004)
Upper and Lower Bounds
  Description: A measure should have known upper and lower bounds, and each should have a substantive interpretation.
  Citations: Scale interpretability (Reardon and O’Sullivan 2004); Upper and lower bounds (Allison 1978); Principle of Directionality (Fossett and South 1983)
Relative or Absolute Inequality
  Description: Relative and absolute measures are differentiated based on whether inequality is independent of, or a function of, the number of categories (respectively).
  Citations: Sensitivity to the number of components (Waldman 1977)
Existing Measures of Inequality and Segregation
In this section, I describe three commonly used measures of inequality and segregation: the Dissimilarity Index, Theil’s Inequality Index, and the Information Theory Index. I have restricted the discussion of existing measures to those that summarize inequality for whole distributions. This excludes measures that target specific points of comparison within a distribution, such as a ratio of values for the 90th and 10th percentiles (Breen and Salazar 2011). I summarize the properties of each in Table 6. The rows of the table correspond to the properties of measures detailed in the previous section and to additional features of the indexes discussed in the first section: the comparative standard and relevant types of distributions.

5 For segregation measures, this includes additive organizational decomposability (Reardon and Firebaugh 2002), additive grouping decomposability (Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004), and additive spatial decomposability (Reardon and O’Sullivan 2004).
Measures of Deviation
The Dissimilarity Index
The Dissimilarity Index (Duncan and Duncan 1955; Jahn et al. 1947; Taeuber and Taeuber 1965) is the most widely used measure of residential segregation. When it is used to measure inequality, it is known as the mean relative deviation. It measures the deviation of each location’s population composition from the overall population composition. It is typically calculated for a population with two mutually exclusive groups, although multi-group versions have been formulated (Morgan 1975; Reardon and Firebaugh 2002; Sakoda 1981). The index is calculated as the absolute difference between the share of group A’s population and the share of group B’s population found in the ith location, summed over all N locations and divided by 2:

$$DI = \frac{1}{2} \sum_{i=1}^{N} \left| \frac{\tau_{iA}}{T_A} - \frac{\tau_{iB}}{T_B} \right|$$
where $\tau_{iA}$ is group A’s population count in location i and $T_A$ is the total population of group A, and likewise for group B.⁶ If group A and B’s populations are distributed across locations in the same proportions, then there is no segregation. Segregation is measured as the extent to which the distribution of group A deviates from that of group B. One of the appeals of the dissimilarity index is its straightforward interpretation: it is the proportion of one group that would have to move to another location to equalize the distribution of groups across locations (Duncan and Duncan 1955; Massey and Denton 1988). The moves must be from locations where the group is overrepresented to locations where the group is underrepresented (White 1986). Despite the ease of calculation and interpretation, the dissimilarity index has a number of notable limitations, which have been well documented (Cortese, Falk, and Cohen 1976; Falk, Cortese, and Cohen 1978; Fossett and South 1983; Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004; Theil 1972; Winship 1978). I summarize the properties and limitations of the index in Table 6. The main drawbacks of the index are that it is not additively decomposable, it does not satisfy the transfers and exchanges criteria, and it can only be used with discrete distributions of nominal data. It is debatable whether the dissimilarity index satisfies the proportionate increases criterion. Cortese et al. (1976) found that it is sensitive to the minority group proportion, while others found no such association (James and Taeuber 1985; Lieberson and Carter 1982; Taeuber and Taeuber 1965). Reardon and colleagues (Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004) find that it is composition invariant only when calculated for two groups.
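The formula for DI translates directly into code. This Python sketch assumes two groups with nonnegative counts listed location by location; the function name `dissimilarity_index` and the counts are hypothetical, chosen only for illustration.

```python
def dissimilarity_index(group_a, group_b):
    """Two-group Dissimilarity Index: half the sum over locations of
    |tau_iA / T_A - tau_iB / T_B|."""
    if len(group_a) != len(group_b):
        raise ValueError("each group needs one count per location")
    total_a, total_b = sum(group_a), sum(group_b)
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(group_a, group_b))

# Hypothetical counts for three locations (group totals of 100 each).
group_a = [80, 15, 5]
group_b = [20, 30, 50]
print(dissimilarity_index(group_a, group_b))  # about 0.6

# Tripling group A in every location leaves each tau_iA / T_A unchanged.
print(dissimilarity_index([3 * a for a in group_a], group_b))
```

A value of about 0.6 reads as 60 percent of one group needing to move to equalize the two distributions. The second call returns the same value because only each group's share of its own total enters the formula, consistent with the two-group index being composition invariant in the James and Taeuber sense.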
6 It can also be calculated as a weighted mean by weighting the absolute deviation for each component by its population size (White 1986), or rescaled by dividing by the maximum possible value of D given the overall proportion of each group (Zoloth 1976).

Measures of Entropy
Entropy is the degree of uncertainty or randomness in a system (Coulter 1989; Cover and Thomas 2006; Shannon 1948; Theil 1967). It is commonly used in physics and statistical mechanics as a measure of disorder in a thermodynamic system (Cover and Thomas 2006). Shannon (1948) extended the use of entropy to information theory as a measure of the uncertainty associated with a message. Theil introduced the concept of entropy to the social sciences as a measure of inequality and segregation (Theil 1967, 1972; Theil and Finizza 1971). Entropy is the average uncertainty of a discrete or continuous random variable. It is calculated from the variable’s probability distribution: the uncertainty of each outcome (m) is weighted by its probability of occurrence ($\pi_m$). The entropy of each outcome (m) is $E_m = \log \frac{1}{\pi_m}$. Following standard usage, I define $0 \log 0 = 0$, because $\lim_{x \to 0} x \log x = 0$. Weighting each outcome by the probability of its occurrence, we get:

$$E = \sum_{m=1}^{M} \pi_m \log \frac{1}{\pi_m}$$

The base of the logarithm defines the units of the index (Shannon 1948; Theil 1972). Log base 2 ($\log_2$) is typically used in information theory, which gives results in units of binary bits of information. It is common for inequality measures to use the natural logarithm (ln), which has the mathematical constant e as its base. Entropy can be thought of as the uncertainty associated with the value of a random draw from a probability distribution. It describes the uncertainty associated with the outcome, or the amount of information we lack about what the outcome will be. In a deterministic system, the outcome has a probability of 100% and the entropy of the distribution is 0: there is no uncertainty. Measured in bits, if there are two equally likely outcomes, such as a fair coin toss, the entropy of each outcome is 1 and the average uncertainty (E) is 1, its maximum value for two outcomes. In other words, we have no information about what the outcome will be. Entropy can also be interpreted as a measure of diversity (Reardon and Firebaugh 2002; White 1986). If all individuals in a population are associated with the same group (e.g. racial classification or income level), indexed above with m, there is no diversity in the population. There is no uncertainty about a randomly selected individual’s group, and entropy is equal to 0. On the other hand, if individuals are evenly distributed among two or more mutually exclusive groups, there is maximum diversity (and maximum uncertainty) in the population, and entropy reaches its maximum, $\log M$, which equals 1 when the number of groups M is used as the logarithm base. The desirable properties of entropy have been well documented (e.g. Cover and Thomas 2006; Shannon 1948; Theil 1967). It can be calculated for any number of sub-populations. It has known upper and lower bounds with substantive interpretations.
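A short Python sketch of the entropy formula, assuming the input is a list of probabilities that sums to 1. The `base` argument selects the units (2 for bits, `math.e` for the natural logarithm), and zero probabilities are skipped, which implements the convention 0 log 0 = 0. The function name is illustrative.

```python
import math

def entropy(probs, base=2.0):
    """E = sum_m pi_m * log(1 / pi_m), with 0 * log 0 defined as 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Skipping p == 0 terms implements the 0 * log 0 = 0 convention.
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([1.0]))                    # deterministic outcome: 0.0
print(entropy([0.5, 0.5]))               # fair coin, in bits: 1.0
print(entropy([0.9, 0.1]))               # one outcome more likely: below 1
print(entropy([0.5, 0.5], base=math.e))  # same coin in natural-log units (ln 2)
```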
Importantly, entropy is an additive and decomposable measure, which makes it particularly desirable as a measure of inequality (Theil 1972).⁷ It is simple to aggregate (and disaggregate) the entropy for multiple groups and to decompose total entropy into the entropy occurring within and between groups. The entropy for each component (i) is the sum of the entropy across the groups (m) within that component:

$$E_i = \sum_{m=1}^{M} \pi_{im} \log \frac{1}{\pi_{im}}$$
The entropy for all components is the mean of the individual entropies, weighted by the relative size of each component, where $\tau_i$ is the population count for component i and T is the overall population count:

$$\bar{E}_i = \sum_{i=1}^{N} \frac{\tau_i}{T} E_i$$

Theil (1972) showed that total entropy can be calculated for any subdivision of the population and written as the sum of a between-subdivision entropy and the average within-subdivision entropies. For example, suppose the groups are aggregated into supergroups ($S_g$), where $\Pi_{ig} = \sum_{m \in S_g} \pi_{im}$ is the proportion in each supergroup (g) within component (i). The entropy within supergroup g for component i is:

$$E_{ig} = \sum_{m \in S_g} \frac{\pi_{im}}{\Pi_{ig}} \log \frac{\Pi_{ig}}{\pi_{im}}$$
7 The additivity of entropy comes from one of the properties of logarithms: $\log(\pi_1 \cdot \pi_2) = \log(\pi_1) + \log(\pi_2)$.
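And the between-supergroup entropy is:

$$E_{i0} = \sum_{g=1}^{G} \frac{\Pi_{ig}}{\pi_{i\cdot}} \log \frac{\pi_{i\cdot}}{\Pi_{ig}}$$

where $\pi_{i\cdot} = \sum_{m=1}^{M} \pi_{im}$.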
The total entropy for component i can then be written as the between-supergroup entropy ($E_{i0}$) plus the weighted average within-supergroup entropy ($E_{ig}$):

$$E_i = E_{i0} + \sum_{g=1}^{G} \frac{\Pi_{ig}}{\pi_{i\cdot}} E_{ig}$$
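The decomposition can be verified numerically. The Python sketch below assumes a single component whose group proportions sum to 1 and a hypothetical grouping of four groups into two supergroups; the function name `supergroup_decomposition` is illustrative. It computes E_i0 and the weighted within-supergroup term and compares their sum with E_i computed directly.

```python
import math

def entropy(props, base=2.0):
    """E_i = sum_m pi_im * log(1 / pi_im), taking 0 * log 0 as 0."""
    return sum(p * math.log(1.0 / p, base) for p in props if p > 0)

def supergroup_decomposition(props, supergroups, base=2.0):
    """Return (E_i0, weighted within-supergroup term) for one component i.

    props       -- group proportions pi_im for component i
    supergroups -- list of index lists, one per supergroup S_g
    """
    pi_dot = sum(props)  # pi_i. (equal to 1 for within-component proportions)
    between, within = 0.0, 0.0
    for members in supergroups:
        big_pi = sum(props[m] for m in members)  # Pi_ig
        if big_pi == 0:
            continue  # an empty supergroup contributes nothing (0 log 0 = 0)
        between += (big_pi / pi_dot) * math.log(pi_dot / big_pi, base)
        e_ig = sum((props[m] / big_pi) * math.log(big_pi / props[m], base)
                   for m in members if props[m] > 0)
        within += (big_pi / pi_dot) * e_ig
    return between, within

# Hypothetical component with four groups, aggregated into two supergroups.
props = [0.4, 0.1, 0.3, 0.2]
supergroups = [[0, 1], [2, 3]]
between, within = supergroup_decomposition(props, supergroups)
print(entropy(props), between + within)  # equal up to floating-point rounding
```

Here each supergroup holds half the component, so the between-supergroup term is about 1 bit and the remainder of E_i comes from diversity within the supergroups.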
The entropy equations can be defined using logarithms to any base. The selected base defines the units of the index (Shannon 1948; Theil 1972). For discrete distributions, it may be preferable to use the number of groups as the base. The result is equivalent to dividing by the maximum entropy ($\log M$), given by the number of groups (M). With the number of groups as the log base ($\log_M$), results are scaled to have the same maximum entropy no matter how many groups are in the population. This transforms entropy from an absolute to a relative measure of inequality. It allows for easier comparison across results with different numbers of groups, but comes at the cost of one of the desirable properties of entropy: aggregation equivalence and independence. Using a fixed log base, such as base 2 ($\log_2$) or e (ln), entropy is an absolute measure, and results are a function of the number of groups in the population (Waldman 1977). Given a uniform distribution of groups (indicating maximum diversity), entropy is an increasing function of the number of groups. At first blush, this may seem undesirable, but it has the benefit of maintaining entropy’s aggregation equivalence and independence. This means that inequality calculated for a population of two groups is the same as if there were three groups in the same population, but no individuals associated with the third type. For example, using $\log_2$ to measure white-black-Hispanic residential segregation in a city with no Hispanic residents gives the same results whether all three races are included in the measure or only the two with population. This is not the case using $\log_M$, because results are scaled according to the number of groups included in the index.
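The trade-off can be illustrated with one city described once with two categories and once with an added empty third category. This Python sketch is an illustration under stated assumptions: the proportions are hypothetical, and passing `len(props)` as the logarithm base stands in for log_M.

```python
import math

def entropy(props, base):
    """Entropy with logarithms to the given base, taking 0 * log 0 as 0."""
    return sum(p * math.log(1.0 / p, base) for p in props if p > 0)

# Hypothetical city: 60% white and 40% black residents, no Hispanic residents.
two_groups = [0.6, 0.4]
three_groups = [0.6, 0.4, 0.0]  # same city with an empty Hispanic category

# Fixed base (log_2): the empty category changes nothing.
print(entropy(two_groups, 2), entropy(three_groups, 2))

# Base M (number of groups): the result is rescaled by log M, so adding
# the empty category lowers the value.
print(entropy(two_groups, len(two_groups)), entropy(three_groups, len(three_groups)))
```

In exchange, base M gives every uniform distribution an entropy of 1 regardless of how many groups it has, which is what makes results with different numbers of groups easier to compare.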
Which of these options is preferable depends on the analytic aim of the research, but it is important to be aware of this trade-off.⁸

Theil’s Index of Income Inequality
Theil noted that raw entropy scores are not good measures of inequality or segregation, and derived several indexes based on the entropy measure (Theil 1972; Theil and Finizza 1971). One of Theil’s indexes is a common measure of income inequality (Theil 1967). The index can be written as:

$$I = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\bar{x}} \log \frac{x_i}{\bar{x}}$$

where $x_i$ is the income of individual earners or groups of earners, N is the number of earners (or groups), and $\bar{x}$ is average income. When all incomes are equal (all individuals earn the mean income), there is no inequality and I is 0. The index measures the difference between the observed distribution and a single value, the mean. The Theil index has many desirable properties, which are summarized in Table 6. It is a special case of the generalized entropy class of measures, which also includes the mean log deviation and half the squared coefficient of variation (Cowell 1980b, 1980a; Cowell and Kuga 1981; Shorrocks 1980, 1984).⁹
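A minimal Python sketch of Theil's index, assuming nonnegative incomes with a positive mean and using the natural logarithm; zero incomes contribute nothing, following the 0 log 0 = 0 convention. The function name and incomes are hypothetical. The calls check three criteria from Tables 3 and 4: a rank-preserving transfer from rich to poor, a proportionate increase, and an additive increase.

```python
import math

def theil_index(incomes):
    """Theil's index I = (1/N) * sum (x_i / mean) * ln(x_i / mean).

    Zero incomes contribute nothing, following the convention 0 * ln 0 = 0.
    """
    n = len(incomes)
    mean = sum(incomes) / n
    if mean <= 0:
        raise ValueError("mean income must be positive")
    total = 0.0
    for x in incomes:
        if x > 0:
            ratio = x / mean
            total += ratio * math.log(ratio)
    return total / n

base_incomes = [10, 20, 30, 40]           # hypothetical incomes
print(theil_index([25, 25, 25, 25]))       # 0.0: everyone earns the mean
print(theil_index(base_incomes))
# Transfer of 5 from the richest to the poorest (rank order unchanged): lower.
print(theil_index([15, 20, 30, 35]))
# Proportionate increase (all incomes doubled): unchanged up to rounding.
print(theil_index([2 * x for x in base_incomes]))
# Additive increase (10 added to every income): lower.
print(theil_index([x + 10 for x in base_incomes]))
```

With natural logarithms the index is 0 under perfect equality and reaches ln N when a single earner holds all income, so its upper bound depends on the number of earners.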
⁸ As we will see, this choice does not affect results of the information theory index, because the log appears in both the numerator and denominator of the equation.
⁹ It is also approximately equivalent to Atkinson’s inequality measure when the value of the weights in the social welfare function is close to 0 (Schwartz and Winship 1980).
The Information Theory Index

Theil also developed the Information Theory Index, another entropy-based measure. He demonstrated the usefulness of the index in a study of racial segregation in Chicago public schools (Theil and Finizza 1971). In recent years, the index has been suggested as a measure of residential segregation (Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004; White 1986). For a single component ($i$), the index measures the extent to which the component’s entropy ($E_i$) is reduced below the overall entropy ($E$), standardized by dividing by the overall entropy (Theil and Finizza 1971):

$$H_i = \frac{E - E_i}{E}$$

Or, equivalently, it can be interpreted as one minus the ratio of within-component diversity to overall diversity (Reardon and Firebaugh 2002):

$$H_i = 1 - \frac{E_i}{E}$$

Aggregating across all components, we simply replace the entropy of a single component with the weighted average entropy of components, or calculate the weighted average of $H_i$ for all components:

$$H = 1 - \sum_{i=1}^{N} \frac{\tau_i E_i}{T E} = 1 - \frac{\bar{E}_i}{E}$$

or

$$H = \sum_{i=1}^{N} \frac{\tau_i}{T} H_i$$
where $T$ is the overall population count, and $\tau_i$ is the population count for component $i$. Following from the interpretation for each component, the aggregate index ($H$) is the relative reduction in the average entropy of components ($\bar{E}_i$) below the maximum attainable entropy ($E$) (Theil and Finizza 1971), or one minus the ratio of average within-component diversity to overall diversity (Reardon and Firebaugh 2002). As a measure of residential segregation, the information theory index compares the diversity of local areas to the overall population diversity (Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004). Segregation is high when local areas are much less diverse than the city at large, and low when local areas are as diverse as the overall population. A value of 1 indicates maximum segregation: there is no diversity in local areas. A value of 0 indicates no segregation: all local areas are as diverse as the overall population. The information theory index typically ranges between 0 and 1, but the minimum value can be less than 0. Reardon and O’Sullivan (2004) interpret negative values of the index as indicating “hyper-integration.” This occurs when localities are more diverse, on average, than the region as a whole. In other words, groups are more equally represented in local areas than in the overall population. The properties of the information theory index are summarized in Table 6. It does not satisfy the proportionate increases criterion according to the definition of composition invariance described by James and Taeuber (1985), under which the value of the index should not be a function of the overall population composition. However, Reardon and O’Sullivan (2004) show that the index does conform to other definitions of composition invariance. For instance, it is invariant to compositional changes as long as the relationship between local population diversity and overall population diversity remains constant.
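The aggregation above can be sketched in Python for mutually exclusive local areas. This is a minimal illustration, assuming base-2 logarithms and hypothetical tract counts; `information_theory_index` is an illustrative name, not a published implementation. In a city that is 90% group 1, a one-group tract scores $H_i = 1$, while an evenly mixed tract scores below zero because it is more diverse than the city as a whole.

```python
import math


def entropy(props):
    """Entropy (log base 2) of group proportions; zero shares are skipped."""
    return -sum(p * math.log2(p) for p in props if p > 0)


def information_theory_index(counts):
    """Return (H, [H_i, ...]) for mutually exclusive local areas.

    counts: one list of group counts per area, e.g. [[white, black], ...].
    H_i = (E - E_i) / E and H = sum_i (tau_i / T) * H_i.
    """
    n_groups = len(counts[0])
    group_totals = [sum(area[m] for area in counts) for m in range(n_groups)]
    T = sum(group_totals)
    E = entropy([g / T for g in group_totals])   # overall entropy
    local_h = []
    for area in counts:
        tau = sum(area)
        E_i = entropy([c / tau for c in area])   # entropy of area i
        local_h.append((E - E_i) / E)
    H = sum(sum(area) / T * h for area, h in zip(counts, local_h))
    return H, local_h


# Hypothetical city, 90% group 1 overall: two one-group tracts, two 50-50 tracts.
tracts = [[400, 0], [400, 0], [50, 50], [50, 50]]
H, local_h = information_theory_index(tracts)
print(round(H, 3), [round(h, 3) for h in local_h])
```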
Reardon and O’Sullivan (2004) show that the index satisfies the transfers and exchanges criteria when used to measure aspatial segregation. None of the indexes they evaluated satisfies the transfers criterion when used to measure spatial segregation. Spatial approaches often include a proximity-weighted contribution from neighboring areas in each location’s population. This makes it difficult for any index to satisfy the transfers and exchanges criteria, because the local populations are not mutually exclusive. They show that the information theory index satisfies the exchanges criterion under certain general conditions (see Reardon and O’Sullivan 2004).
A New Measure of Inequality and Segregation: The Divergence Index

I developed a new measure of segregation called the “Divergence Index.” The index is based on relative entropy, an information theoretic measure of the difference between two probability distributions (Cover and Thomas 2006).¹⁰ It measures the evenness dimension of inequality and segregation, and it combines the mathematical logic of the entropy and the deviations families of measures. The divergence index measures the difference between a distribution and another empirical, theoretical, or normative distribution. For discrete probability distributions $P$ and $Q$ over $M$ categories, the divergence of $Q$ from $P$ is defined as:

$$D(P \parallel Q) = \sum_{m=1}^{M} P_m \log \frac{P_m}{Q_m}$$

The index measures the entropy of $P$ relative to $Q$, or the relative entropy of $P$ with respect to $Q$. It represents the divergence of a model ($Q$) from reality ($P$). The index can be interpreted as a measure of surprise. How surprising are the observations ($P$), given the expected value ($Q$)? Or, how surprising is an empirical distribution ($P$), given a theoretical distribution ($Q$)? It is a non-symmetric measure of the dissimilarity between the two distributions (Bavaud 2009).¹¹ The divergence of $Q$ from $P$ does not necessarily equal the divergence of $P$ from $Q$.¹² The asymmetry is an intentional feature of the measure. As Bavaud states, “the asymmetry of the relative entropy does not constitute a defect, but perfectly matches the asymmetry between data and models” (Bavaud 2009:57). The $Q$ distribution defines the standard of equality against which inequality is measured. It should represent the expected state of equality in the $P$ distribution. $Q$ can be theoretically determined or empirically derived. For example, it can be a standard probability distribution (e.g. a normal or uniform distribution), a prior state of the $P$ distribution, or the aggregation or mean of the observed data ($P$). The divergence index has known upper and lower bounds with substantive interpretations.
The minimum value is 0, indicating no difference between P and Q – no inequality. The maximum value indicates maximum inequality. It can be less than or greater than 1. The divergence index can be standardized to have a range of 0 to 1 by dividing by its maximum value for a given population. Standardization transforms the index from an absolute to a relative measure of inequality.
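The lower bound and the asymmetry can be checked with a short Python sketch of $D(P \parallel Q)$ for discrete distributions. It assumes base-2 logarithms and hypothetical distributions; the function name `divergence` is illustrative. Identical distributions give 0, and swapping $P$ and $Q$ gives a different value.

```python
import math


def divergence(p, q, base=2.0):
    """Relative entropy D(P || Q) = sum_m P_m * log(P_m / Q_m).

    Terms with P_m = 0 contribute 0. If Q_m = 0 where P_m > 0, the
    divergence is infinite.
    """
    total = 0.0
    for pm, qm in zip(p, q):
        if pm > 0:
            if qm == 0:
                return math.inf
            total += pm * math.log(pm / qm, base)
    return total


observed = [0.9, 0.1]    # hypothetical P
reference = [0.5, 0.5]   # hypothetical Q, e.g. a uniform standard of equality

print(divergence(observed, observed))   # 0.0: no difference, no inequality
print(divergence(observed, reference))  # D(P || Q)
print(divergence(reference, observed))  # D(Q || P): a different value
```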
¹⁰ Relative entropy is also frequently called Kullback–Leibler (KL) divergence (Kullback 1987).
¹¹ In contrast, entropy ($E$) is symmetric in $P(x)$ and $1 - P(x)$.
¹² It is possible to calculate a symmetric version of the index as the sum of $D(P \parallel Q)$ and $D(Q \parallel P)$, but such an index does not measure the concept of inequality that motivates this paper.

Measuring Segregation with the Divergence Index

Like the information theory index, we can use the divergence index to measure residential segregation. $Q$ is the expected distribution (i.e. proportion of each group) if there is no segregation. It should represent the dimension of segregation we wish to study. To measure diversity, we would specify $Q$ as equal proportions of each group (i.e. maximum diversity). To measure evenness, we would specify $Q$ as the overall proportion of each group in the region. In this paper, I specify the divergence index to measure the evenness dimension of inequality and segregation. As a segregation index, it measures the difference between the overall proportion of each group in the region (e.g. a city or metropolitan area) and the local proportions in areas within the region. The index asks: how surprising is the composition of local areas given the overall population composition of the region? If there is no difference between the observed local proportions of each group ($P$) and the theoretical proportions ($Q$), then there is no segregation in the region. More divergence between the overall and local proportions indicates more segregation.¹³ The divergence index for location $i$ is:

$$D_i = \sum_{m=1}^{M} \pi_{im} \log \frac{\pi_{im}}{\pi_m}$$

where $\pi_{im}$ is group $m$’s proportion of the population in location $i$, and $\pi_m$ is group $m$’s proportion of the overall population.¹⁴ Overall segregation in the region is the weighted average of the divergence for all locations:

$$D = \sum_{i=1}^{N} \frac{\tau_i}{T} D_i$$

where $T$ is the overall population count, and $\tau_i$ is the population count for location $i$. If all local areas have the same composition as the overall population, then $D = 0$, indicating no segregation in the region.
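The two formulas can be sketched in Python for mutually exclusive areas. This is a minimal illustration, assuming base-2 logarithms and hypothetical tract counts; `local_divergence` and `divergence_index` are illustrative names. Every local score is nonnegative, and a region containing a single group scores 0 because each area matches the region.

```python
import math


def local_divergence(local_props, overall_props):
    """D_i = sum_m pi_im * log2(pi_im / pi_m); zero local shares are skipped."""
    return sum(p * math.log2(p / q)
               for p, q in zip(local_props, overall_props) if p > 0)


def divergence_index(counts):
    """Return (D, [D_i, ...]) with D = sum_i (tau_i / T) * D_i.

    counts: one list of group counts per local area. Areas are assumed to be
    mutually exclusive, so the region's composition is the sum of the areas.
    """
    n_groups = len(counts[0])
    group_totals = [sum(area[m] for area in counts) for m in range(n_groups)]
    T = sum(group_totals)
    overall = [g / T for g in group_totals]          # pi_m
    local_d = []
    for area in counts:
        tau = sum(area)
        local_d.append(local_divergence([c / tau for c in area], overall))
    D = sum(sum(area) / T * d for area, d in zip(counts, local_d))
    return D, local_d


# Hypothetical city, 90% group 1 overall.
tracts = [[400, 0], [400, 0], [50, 50], [50, 50]]
D, local_d = divergence_index(tracts)
print(round(D, 3), [round(d, 3) for d in local_d])

# A region with a single group: every area matches the region, so D = 0.
print(divergence_index([[100], [250]])[0])
```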
Desirable Properties of the New Measure

One of the unique features of the divergence index is that it can be calculated for either discrete distributions (relative entropy) or continuous distributions (differential relative entropy) (Cover and Thomas 2006). The desirable properties of both relative entropy and differential relative entropy have been well documented. Table 6 summarizes the properties of the divergence index, along with the Dissimilarity Index, Theil’s Inequality Index, and the Information Theory Index. Many of the properties of the divergence index follow directly from the properties of entropy, while others depend on how the reference distribution is specified. As I will show in the next section, the Theil index is a special case of the divergence index, and as such, they share many of the same properties. Like entropy, relative entropy is additively decomposable. We can aggregate residential locations into districts, or groups into supergroups, and calculate the inequality within and between these districts or supergroups. The sum of the inequality within and between the aggregate units is identical to overall inequality for individual units. For example, to measure residential segregation for districts within a city, we rewrite the divergence index as the sum of between-district segregation and the average within-district segregation. Within-district segregation for district $j$ is:

$$D_j = \sum_{i \in S_j} \frac{\tau_i}{T_j} \sum_{m=1}^{M} \pi_{im} \log \frac{\pi_{im}}{\pi_{jm}}$$

where $S_j$ is the set of locations in district $j$. The reference distribution, $\pi_{jm}$, is the population composition of district $j$. It is the weighted average of the group proportions for all localities ($i$) within the district:
¹³ The greater the divergence of $Q$ from $P$, the lower the probability of observing the local proportions ($P$) if there is no segregation in the region ($Q$).
¹⁴ To measure segregation spatially, the index is calculated as:

$$\tilde{D}_i = \sum_{m=1}^{M} \tilde{\pi}_{im} \log \frac{\tilde{\pi}_{im}}{\pi_m}$$

where $\tilde{\pi}_{im}$ is group $m$’s proportion of the spatially weighted population in the local environment of location $i$. To summarize for all locations $i$ in the region:

$$\tilde{D} = \frac{1}{T} \sum_{i=1}^{N} \tau_i \tilde{D}_i$$

where $T$ is the overall population count, and $\tau_i$ is the population count in location $i$.
Table 6: Properties of the Measures

Criteria | Dissimilarity Index | Theil Index | Information Theory Index | Divergence Index
Individual Cases | X | X | X | X
Population Size | X | X | X | X
Aggregations of Cases | X | X | X | X
Proportionate Increases | X | X | X | X
Additive Increases | X | X | X | X
Transfers and Exchanges | X¹⁵ | X | X¹⁶ | X¹⁶
Additive Decomposability | X | X | X | X
Upper and Lower Bounds | X¹⁷ | X | X | X
Relative or Absolute Inequality | Relative | Either | Absolute | Either
Comparative Standard | Evenness (mean of the distribution) | Evenness (mean of the distribution) | Randomness | Any
Distribution Types | Discrete with nominal categories | Continuous | Discrete | Discrete or continuous

¹⁵ The dissimilarity index satisfies a weak form of the transfers and exchanges criteria (Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004).
¹⁶ The transfers and exchanges criterion generally only applies when components are mutually exclusive, as described in the text.
¹⁷ The dissimilarity index is bounded between 0 and 1, but the expected value of the index is greater than 0 (Cortese et al. 1976).

Criteria Citations
Dissimilarity Index: Bourguignon 1979; Cortese et al. 1976; Coulter 1989; Duncan and Duncan 1955; Falk et al. 1978; Fossett and South 1983; Jahn et al. 1947; James and Taeuber 1985; Lieberson and Carter 1982; Massey and Denton 1988; Morgan 1975; Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004; Sakoda 1981; Taeuber and Taeuber 1965; Theil 1972; Winship 1978
Theil Index: Allison 1978; Bourguignon 1979; Cowell 1980b; Cowell, Flachaire, and Bandyopadhyay 2013; Shorrocks 1980, 1984, 2012; Theil 1967, 1972
Information Theory Index: Reardon and Firebaugh 2002; Reardon and O’Sullivan 2004; Theil 1967, 1972; White 1986
Divergence Index: Bavaud 2009; Cover and Thomas 2006; Cowell 1980b; Cowell et al. 2013; Magdalou and Nock 2011; Mori, Nishikimi, and Smith 2005; Shorrocks 1980, 1984, 2012; Theil 1967; Walsh and O’Kelly 1979
$$\pi_{jm} = \sum_{i \in S_j} \frac{\tau_i}{T_j} \pi_{im}$$

where $T_j$ is the population count for district $j$. The between-district segregation is:

$$D_0 = \sum_{j=1}^{J} \frac{T_j}{T} \sum_{m=1}^{M} \pi_{jm} \log \frac{\pi_{jm}}{\pi_m}$$

Total segregation is the sum of the between-district segregation ($D_0$) and the average within-district segregation ($D_j$):

$$D = D_0 + \sum_{j=1}^{J} \frac{T_j}{T} D_j$$
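The additive decomposition can be checked numerically. The Python sketch below assumes hypothetical counts for areas nested in two districts, base-2 logarithms, and mutually exclusive areas; `decompose` is an illustrative name. It computes $D$ directly from the areas and again as $D_0$ plus the weighted within-district terms; the two agree up to floating-point rounding.

```python
import math


def kl(p, q):
    """sum_m p_m * log2(p_m / q_m), skipping zero shares in p."""
    return sum(pm * math.log2(pm / qm) for pm, qm in zip(p, q) if pm > 0)


def shares(counts):
    total = sum(counts)
    return [c / total for c in counts]


def decompose(districts):
    """Return (D, D_0, weighted within-district term).

    districts: list of districts; each district is a list of mutually
    exclusive areas; each area is a list of group counts.
    """
    areas = [area for district in districts for area in district]
    n_groups = len(areas[0])
    pi = shares([sum(a[m] for a in areas) for m in range(n_groups)])   # pi_m
    T = sum(sum(a) for a in areas)

    # Overall D, computed directly from the areas.
    D = sum(sum(a) / T * kl(shares(a), pi) for a in areas)

    D_0 = 0.0      # between-district segregation
    within = 0.0   # sum_j (T_j / T) * D_j
    for district in districts:
        district_counts = [sum(a[m] for a in district) for m in range(n_groups)]
        T_j = sum(district_counts)
        pi_j = shares(district_counts)                                  # pi_jm
        D_0 += T_j / T * kl(pi_j, pi)
        D_j = sum(sum(a) / T_j * kl(shares(a), pi_j) for a in district)
        within += T_j / T * D_j
    return D, D_0, within


# Hypothetical region: a "city" district and a "suburb" district.
city = [[10, 90], [30, 70]]
suburb = [[80, 20], [95, 5], [70, 30]]
D, D_0, within = decompose([city, suburb])
print(f"D = {D:.4f}, D_0 = {D_0:.4f}, within = {within:.4f}")
```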
Comparison of the Measures

Measuring Income Inequality: Equivalence between I and D

Theil’s inequality index ($I$) and the divergence index both measure inequality relative to a defined standard. In the case of income inequality, the Theil index measures the difference between the observed shares of income across individuals or groups and a theoretical uniform distribution, one in which everyone’s income is the mean income. There is a straightforward equivalence between $I$ and $D$ for continuous distributions, such as income.¹⁸ Theil’s index can be written like the divergence index, where $P_i$ is $i$’s share of total aggregate income, $\frac{\tau_i x_i}{T \bar{x}}$, and $Q_i$ is the theoretical uniform share, $\frac{\tau_i}{T}$ (here unit $i$ has $\tau_i$ earners with average income $x_i$):

$$D(P \parallel Q) = \sum_{m=1}^{M} P_m \log \frac{P_m}{Q_m}$$

$$I = \sum_{i=1}^{N} P_i \log \frac{P_i}{Q_i} = \sum_{i=1}^{N} \frac{\tau_i x_i}{T \bar{x}} \log \left( \frac{\tau_i x_i}{T \bar{x}} \Big/ \frac{\tau_i}{T} \right) = \frac{1}{T} \sum_{i=1}^{N} \tau_i \frac{x_i}{\bar{x}} \log \frac{x_i}{\bar{x}}$$

If $\tau_i = 1$ and $T = N$, then we get:

$$I = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\bar{x}} \log \frac{x_i}{\bar{x}}$$

We can see that $I$ is a specific case of $D$ applied to measuring income inequality, using uniform shares of income as the comparative standard.
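The equivalence can be confirmed numerically with a short Python sketch, assuming hypothetical income groups and natural logarithms; both function names are illustrative. One function evaluates Theil’s grouped formula, the other computes $D(P \parallel Q)$ from income shares and population shares, and the results agree up to floating-point rounding.

```python
import math


def theil_grouped(sizes, mean_incomes):
    """I = (1/T) * sum_i tau_i * (x_i / xbar) * ln(x_i / xbar).

    sizes: tau_i, number of earners in unit i
    mean_incomes: x_i, average income of the earners in unit i
    """
    T = sum(sizes)
    xbar = sum(t * x for t, x in zip(sizes, mean_incomes)) / T
    return sum(t * (x / xbar) * math.log(x / xbar)
               for t, x in zip(sizes, mean_incomes) if x > 0) / T


def theil_as_divergence(sizes, mean_incomes):
    """D(P || Q) with P_i = income share and Q_i = population share tau_i / T."""
    T = sum(sizes)
    total_income = sum(t * x for t, x in zip(sizes, mean_incomes))
    P = [t * x / total_income for t, x in zip(sizes, mean_incomes)]
    Q = [t / T for t in sizes]
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)


# Hypothetical income groups: number of earners and mean income per group.
sizes = [50, 30, 20]
incomes = [20_000, 50_000, 150_000]
print(theil_grouped(sizes, incomes), theil_as_divergence(sizes, incomes))
```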
¹⁸ Moreover, the equivalence applies to any distribution for which a mean can be calculated, such as a discrete simplification of a continuous distribution.
Measuring Residential Segregation: Equivalence between H and D

The information theory index, $H$, measures the ratio of local diversity to overall diversity, whereas the divergence index, $D$, measures the difference between the local and overall group proportions. To derive the equivalence between $H$ and $D$, we first rewrite the equation for $D$ as (Theil and Finizza 1971):

$$D = E - \bar{E}_i$$

Recall that we can write the equation for $H$ as:

$$H = \frac{E - \bar{E}_i}{E}$$

From this, we can derive the equivalence as:

$$H = \frac{D}{E} \quad \text{and} \quad D = HE$$
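The step from the definition of $D$ to $D = E - \bar{E}_i$ can be written out using $E_i = -\sum_m \pi_{im} \log \pi_{im}$, $\bar{E}_i = \sum_i \frac{\tau_i}{T} E_i$, and $E = -\sum_m \pi_m \log \pi_m$. The sketch below relies on $\sum_i \frac{\tau_i}{T} \pi_{im} = \pi_m$, which holds when local populations are mutually exclusive and together make up the region; this is the assumption that condition #3 in the following discussion relaxes.

```latex
\begin{aligned}
D &= \sum_{i=1}^{N} \frac{\tau_i}{T} \sum_{m=1}^{M} \pi_{im} \log \frac{\pi_{im}}{\pi_m} \\
  &= \sum_{i=1}^{N} \frac{\tau_i}{T} \sum_{m=1}^{M} \pi_{im} \log \pi_{im}
     - \sum_{m=1}^{M} \Big( \sum_{i=1}^{N} \frac{\tau_i}{T} \pi_{im} \Big) \log \pi_m \\
  &= -\bar{E}_i - \sum_{m=1}^{M} \pi_m \log \pi_m \\
  &= E - \bar{E}_i
\end{aligned}
```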
$H$ is equivalent to $D$ standardized by $E$, or the ratio of $D$ to $E$. The equivalence of $H$ and $D$ only applies to discrete distributions and when $E \geq \bar{E}_i$. Under these conditions, both $D$ and $H$ are nonnegative. $D$ is always nonnegative (see Cover and Thomas 2006), but that is not necessarily true of $E$ and $H$. Next, I describe the conditions under which negative values of $E$ and $H$ can occur. If either condition applies, then the equivalence provided above does not hold.

E is Negative

The entropy of a discrete distribution is always nonnegative, but Cover and Thomas (2006:244) show that the entropy of a continuous distribution (called “differential entropy”) can be negative. For example, the differential entropy of a uniform distribution $U(0, a)$ is negative for $0 < a < 1$. This occurs because the density of the distribution is $\frac{1}{a}$ from 0 to $a$, and

$$E = -\int_0^a \frac{1}{a} \log \frac{1}{a} \, dx = \log a$$

Because $a < 1$, $\log a < 0$. In contrast, both relative entropy and differential relative entropy (the discrete and continuous versions of $D$) are always nonnegative (Cover and Thomas 2006).

H is Negative

$H$ will be negative if average local entropy, $\bar{E}_i$, is greater than overall entropy, $E$.¹⁹ Theil and Finizza (1971) showed that $\bar{E}_i$ cannot be greater than $E$, but I find that it can occur if the following three conditions are satisfied: 1) at least one group is over- or under-represented in the overall population, 2) there are local areas where groups are more equally represented than in the overall population, and 3) the populations of local areas are not mutually exclusive. Condition #3 is especially common when measuring segregation spatially. The local population often includes a proximity-weighted contribution from other nearby areas. Non-exclusive components are also common in social network analysis, e.g. students with overlapping friendship networks.
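The three conditions can be reproduced with a small hypothetical example in Python. The sketch assumes three one-group areas in a row and a deliberately simple stand-in for spatial weighting: each local environment pools an area’s counts with those of its adjacent areas, with equal weights. This pooling rule is an illustrative assumption, not the proximity weighting used in spatial segregation studies. Because the environments overlap, their average entropy exceeds the regional entropy, so $H$ is negative, while $D$ stays nonnegative and no longer equals $E - \bar{E}_i$.

```python
import math


def entropy(props):
    """Entropy (log base 2); zero shares are skipped."""
    return -sum(p * math.log2(p) for p in props if p > 0)


def divergence(p, q):
    """D(P || Q) in bits; zero shares in P are skipped."""
    return sum(pm * math.log2(pm / qm) for pm, qm in zip(p, q) if pm > 0)


def shares(counts):
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical region: three one-group areas in a row, counts [group 1, group 2].
areas = [[80, 0], [0, 20], [100, 0]]
# Hypothetical adjacency: each local environment pools an area with its
# neighbors (equal weights), so the environments overlap (condition 3).
neighbors = [[0, 1], [0, 1, 2], [1, 2]]

T = sum(sum(a) for a in areas)
overall = shares([sum(a[m] for a in areas) for m in range(2)])   # 90% / 10%
envs = [shares([sum(areas[k][m] for k in nbrs) for m in range(2)])
        for nbrs in neighbors]
weights = [sum(a) / T for a in areas]                            # tau_i / T

E = entropy(overall)
E_bar = sum(w * entropy(env) for w, env in zip(weights, envs))
H = 1 - E_bar / E
D = sum(w * divergence(env, overall) for w, env in zip(weights, envs))

print(f"E = {E:.3f}, average local E = {E_bar:.3f}, H = {H:.3f}")
print(f"D = {D:.3f}, E - average local E = {E - E_bar:.3f}")
```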
If these three conditions are satisfied, then $E$ is not necessarily greater than $\bar{E}_i$, and $E$ cannot be used to derive the equivalence between $H$ and $D$. In their study of racial school segregation in Chicago, IL, Theil and Finizza (1971) assumed that the populations of schools and districts were mutually exclusive. This was a reasonable assumption: condition #3 did not apply to their study population. However, their conclusion, that the average entropy of schools in a district cannot be greater than the entropy of the district, does not generalize to all contexts.

¹⁹ It is possible to observe non-negative values of $H$ when $E$ is negative, but only if $\bar{E}_i$ is also negative. It is also possible for $H$ to be greater than 1, but only when measured for a continuous distribution and when either (but not both) $E$ or $\bar{E}_i$ is negative.

Same Segregation Results for H and D

Figure 1 compares the functional form of $D$ and $H$ for three hypothetical cities. Each city has two groups, but the proportion of each varies: 50-50 in city A, 75-25 in city B, and 90-10 in city C. The horizontal axis shows the proportion of the local population in group 1 for areas within the city. The vertical axis shows the degree of segregation for areas within the city. The solid line plots local segregation measured with $D$ across the range of local group proportions, and the dashed line shows segregation measured with $H$. Results for $H$ and $D$ are the same when there is an even mix of groups in the overall population, as in city A (Figure 1a); with base-2 logarithms, $E = 1$ in city A, so $D = HE = H$. If the proportion of each group in local areas is the same as their city proportions, then both measures find zero segregation. If all local areas are monoracial, such that each group is either 100% or 0% of the local population, then both measures calculate segregation at its maximum value, 1. If the proportion of each group varies across local areas, then both measures find a moderate amount of segregation. Moreover, both measures calculate the same level of segregation when there is maximum diversity in the population. The difference between $H$ and $D$ is greatest when there is a small minority group in the overall city population.

Opposite Segregation Results for H and D

$H$ and $D$ represent different aspects of segregation. $H$ measures how diverse the local and overall populations are, whereas $D$ measures how different they are. $H$ is 1 minus the ratio of local diversity to overall diversity. $H$ equals 0, indicating no segregation, when all locations have the same level of diversity as the overall population. In contrast, $D$ measures the difference between the local population composition and the overall population composition. $D$ equals 0, indicating no segregation, when there is no difference between the local composition and the overall population. If there is only one group present in the city and all local areas are monoracial, $H$ and $D$ give opposite results.
$H$ would show that this city is maximally segregated ($H = 1$) because there is no diversity in either the local areas or the city.²⁰ In contrast, $D$ would find that this city is not at all segregated ($D = 0$), because there is no difference between the composition of local areas and the city as a whole: each local area is a microcosm of the city.

Figure 1: Comparing Segregation Measures
[Figure panels: (a) Overall Group Proportions: 0.5, 0.5; (b) Overall Group Proportions: 0.75, 0.25; (c) Overall Group Proportions: 0.9, 0.1. In each panel, the horizontal axis is Proportion Group 1 (0.00 to 1.00), the vertical axis is Segregation (−1 to 3), and the lines show the Divergence Index (D) and the Information Theory Index (H).]
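The curves in Figure 1 can be traced numerically. The Python sketch below assumes base-2 logarithms (under which $D_i$ and $H_i$ coincide in city A) and evaluates both local measures on a coarse grid of local proportions for cities A, B, and C; `local_scores` is an illustrative name. It shows, for example, that in city B an area that is 25% group 1 has $H_i = 0$ but $D_i > 0$, and that an evenly mixed area in city C has a negative $H_i$.

```python
import math


def entropy(props):
    """Entropy (log base 2); zero shares are skipped."""
    return -sum(p * math.log2(p) for p in props if p > 0)


def local_scores(local_share, city_share):
    """Return (D_i, H_i) for an area in a two-group city.

    local_share: proportion of group 1 in the area (0 to 1)
    city_share: proportion of group 1 in the city (strictly between 0 and 1)
    """
    local = [local_share, 1 - local_share]
    city = [city_share, 1 - city_share]
    d_i = sum(p * math.log2(p / q) for p, q in zip(local, city) if p > 0)
    h_i = 1 - entropy(local) / entropy(city)
    return d_i, h_i


for city_share in (0.5, 0.75, 0.9):   # cities A, B and C
    print(f"city: {city_share:.2f} group 1")
    for local_share in (0.0, 0.25, 0.5, 0.75, 1.0):
        d_i, h_i = local_scores(local_share, city_share)
        print(f"  local {local_share:.2f}: D_i = {d_i:.2f}, H_i = {h_i:.2f}")
```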
²⁰ Technically, $H$ is undefined if there is only one group in the population, because $H = 1 - \frac{0}{0}$. If there are two groups in the population, the limit of $H$ as the minority group’s population count approaches 0 (and $E$ and $\bar{E}_i$ approach 0) is 1.
Minimum and Maximum Local Segregation: $H_i$ and $D_i$

In Figure 1, we can see that the maximum and minimum values of local segregation vary for the two measures. Local values of the divergence index, $D_i$, reach their maximum value when a city’s minority group is 100% of the local population. $D_i$ takes its minimum value, 0, when the local population composition is the same as the overall composition of the city. Local values of the information theory index, $H_i$, reach their maximum value when any group is 100% of the local population, regardless of the city’s population composition. $H_i$ equals 0 when the local proportions are the same as the city proportions. $H_i$ takes its minimum value when a location has an even mix of groups, regardless of the city’s population composition. The minimum value of $H_i$ is an increasing function of the city’s overall diversity. (Recall that $H_i$ is 1 minus the ratio of local diversity to overall diversity.) $H_i$ will be lower in a city with less overall diversity than in a city with more diversity, given the same level of local diversity. We can see this by comparing the cities in Figure 1. The minimum value of the function is 0 in city A, where there is an even mix of groups, and it is negative in cities B and C. If local areas are even marginally more diverse, on average, than the overall population, then overall segregation, $H$, will be negative.²¹ Reardon and O’Sullivan (2004) interpret negative values of $H$ as indicating hyper-integration: each group is more equally represented in local areas, on average, than in the overall population. Their concept of hyper-integration applies to diversity measures of segregation, such as $H$. There is no corresponding concept for evenness measures of segregation, such as $D$. Further, $D$ is always nonnegative (see Cover and Thomas 2006).

Is Detroit Segregated?

To further illustrate the differences between $H$ and $D$, I use both indexes to measure residential segregation by race and ethnicity in Detroit, MI and compare the results. I use population data from the 2010 decennial census aggregated at the level of census tracts, which have an average population of 4,000 individuals (U.S. Census Bureau 2011).²² I compare White-Black segregation for three nested levels of geography: the city of Detroit, the Detroit metro area, and the state of Michigan. The population of Detroit, MI is 82% Black and 8% White.²³ It is common for the racial composition of U.S. cities to differ from the surrounding suburbs (Farrell 2008; Fischer 2008), and Detroit, MI shows one of the most pronounced examples of this. Although a large majority of city residents are Black, the metro area and state are predominantly White. Table 7 compares the population by race and ethnicity for the city of Detroit, the Detroit metro area, and Michigan. I calculate White-Black segregation for each level of geography (Detroit, the metro area, and Michigan) and then decompose overall segregation into segregation occurring within and between the city of Detroit and the remainder of the metro area or state. I expect $H$ to respond to changes in the population diversity and $D$ to respond to differences in the population composition between the city and the surrounding areas. Table 8 reports the overall entropy, the average local entropy of census tracts, and segregation results for $H$ and $D$ for each level of geography. The divergence index shows that there is little difference between the local population of census tracts in Detroit and the overall city population ($D = 0.14$). Results for the information theory index indicate a moderate amount of segregation ($H = 0.32$), because the city population is somewhat more diverse ($E = 0.42$) than the average diversity of census tracts ($\bar{E}_i = 0.29$).
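Because census tracts are mutually exclusive, the values in Table 8 should satisfy $H = 1 - \bar{E}_i/E$ and $D = E - \bar{E}_i = HE$. The short Python check below recomputes both indexes from the rounded entropies reported in Table 8. Since the inputs are rounded to two decimals, the recomputed values can differ from the reported ones in the second decimal (for Detroit, for instance, $0.42 - 0.29$ gives 0.13 rather than the reported 0.14).

```python
# Rounded values from Table 8: (overall entropy E, average local entropy).
table8 = {
    "Detroit": (0.42, 0.29),
    "Metro Area": (0.81, 0.33),
    "Michigan": (0.62, 0.28),
}

for place, (E, E_bar) in table8.items():
    H = 1 - E_bar / E   # information theory index
    D = E - E_bar       # divergence index, valid for mutually exclusive tracts
    print(f"{place}: H = {H:.2f}, D = {D:.2f}, H * E = {H * E:.2f}")
```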
²¹ Negative values of $H$ occur when $\bar{E}_i$ is greater than $E$. (Recall that $H = 1 - \frac{\bar{E}_i}{E}$.) Earlier in this section, I explained the three conditions necessary for this to occur.
²² Census tracts are geographic units defined by the Census Bureau. They are intended to approximate neighborhoods. Most studies of residential segregation use census tract data.
²³ I use census data for mutually exclusive race categories, combined with Hispanic or Latino ethnicity. The Hispanic category in Table 7 includes all individuals who identified Hispanic or Latino as their ethnicity, along with any category of race. The additional categories in Table 7 apply to individuals who identified as Not Hispanic or Latino.
Table 7: Population by Race and Ethnicity in Detroit, the Metro Area, and Michigan

Race and Ethnicity | Detroit | Metro Area | Michigan
Total Population | 713,777 | 4,296,250 | 9,883,640
White | 7.8% | 67.9% | 76.6%
Black | 82.2% | 22.6% | 14.0%
Hispanic | 6.8% | 3.9% | 4.4%
Asian | 1.0% | 3.3% | 2.4%
American Indian | 0.3% | 0.3% | 0.6%
Pacific Islander | 0.0% | 0.0% | 0.0%
Other Race | 0.1% | 0.1% | 0.1%
Multiple Races | 1.7% | 1.9% | 1.9%
Table 8: White-Black Segregation in Detroit, the Metro Area, and Michigan

Measure | Detroit | Metro Area | Michigan
Overall Entropy ($E$) | 0.42 | 0.81 | 0.62
Average Local Entropy ($\bar{E}_i$) | 0.29 | 0.33 | 0.28
Information Theory Index ($H$) | 0.32 | 0.59 | 0.55
Divergence Index ($D$) | 0.14 | 0.48 | 0.34
Table 9: Decomposition of White-Black Segregation in the Detroit Metro Area and Michigan (Proportion of Overall Segregation)

Component | Metro Area H | Metro Area D | Michigan H | Michigan D
Overall Segregation | 1.00 | 1.00 | 1.00 | 1.00
Between-Subareas | 0.63 | 0.63 | 0.49 | 0.49
Between: Detroit | 0.13 | 0.50 | 0.04 | 0.43
Between: Remainder | 0.50 | 0.14 | 0.45 | 0.06
Within-Subareas | 0.37 | 0.37 | 0.51 | 0.51
Within: Detroit | 0.05 | 0.05 | 0.03 | 0.03
Within: Remainder | 0.32 | 0.32 | 0.49 | 0.49
Table 10: White and Black Population in the Detroit Metro Area and Michigan

Area | Proportion White | Proportion Black
Metro Area | 0.75 | 0.25
Metro Area: Detroit | 0.09 | 0.91
Metro Area: Remainder | 0.88 | 0.12
Michigan | 0.85 | 0.15
Michigan: Detroit | 0.09 | 0.91
Michigan: Remainder | 0.90 | 0.10
Both indexes increase when segregation is measured for the metro area, and then decrease at the state level. To better understand how the measures differ, I decompose overall segregation in the metro area into segregation occurring between Detroit and the remainder of the metro area, and segregation occurring among the tracts within each of these subareas. Likewise, I decompose overall segregation in Michigan into segregation occurring between Detroit and the remainder of the state, and segregation occurring among the tracts within each of these subareas. Table 9 reports the results of the decomposition. It shows the proportion of the overall segregation attributable to each of the additive components. The decomposition of segregation in the metro area shows that about two-thirds of the segregation (63%) occurs between Detroit and the remainder of the metro area (see Table 9). Segregation among the tracts within each of these subareas accounts for the balance of the segregation (37%). This means that the largest differences in population composition and diversity occur at the aggregate level, between Detroit and the remainder of the metro area. There is comparatively less difference among the tracts within each subarea. Between-subarea segregation in the metro area is the sum of two components, one for Detroit and one for the remainder of the metro area. Disaggregating these two components, we see a stark difference between results for the two indexes. They show opposite results in terms of how much each subarea (Detroit and the remainder of the metro area) contributes to overall segregation in the metro area. The same pattern is evident in results for the state. $D$ compares the subarea proportions with the overall metro area proportions. The proportion White is 0.75 in the metro area, compared to 0.09 in Detroit and 0.88 in the remainder of the metro area (see Table 10).
There is more divergence between Detroit’s population and the overall population than between the remainder of the metro area and the overall population. Greater divergence is represented as higher segregation in these results, as seen in Figure 2. Detroit contributes more to overall segregation than the remainder of the metro area, even after weighting each segregation score by the subarea’s share of the metro population.²⁴ Results for $H$ show an opposite trend: Detroit contributes less to overall segregation than the remainder of the metro area. The reason is not different segregation scores for each subarea, as with $D$. Instead, it is due to the scores being weighted by each subarea’s share of the metro population, which is much smaller for Detroit than for the remainder of the metro area. $H$ compares local diversity to overall diversity. The populations of Detroit and the remainder of the metro area have about the same level of diversity, and each has less diversity than the overall metro population. Different groups are over- and under-represented in each subarea, but unlike $D$, $H$ is only concerned with the level of diversity, not the source of that diversity. Figure 2 shows the decomposed segregation between subareas in the Detroit metro area (Figure 2a) and Michigan (Figure 2b). The horizontal axis shows the proportion White, and the vertical axis shows the degree of segregation. The solid line shows the functional form of segregation measured with $D$, and the dashed line shows segregation measured with $H$. The points in each figure indicate the segregation scores of each subarea: the city of Detroit and the remainder of the metro area in Figure 2a, and Detroit and the remainder of Michigan in Figure 2b. These are raw segregation scores for each subarea.
In contrast, Table 9 reports the proportion of the overall segregation attributable to each of the subareas, after the raw segregation scores are weighted by the subarea’s share of the overall population. In Figure 2, there is a pronounced difference between the segregation of each subarea when measured with $D$, but not with $H$. The segregation calculated with $H$ is nearly the same for both subareas. The remainder of the metro area contributes more to overall segregation because it has a larger population than Detroit, and the segregation scores are weighted according to each subarea’s share of the metro population.
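A rough Python reconstruction of the metro-area between-subarea terms, assuming only the rounded White and Black shares from Table 10, Detroit’s 17% population share from footnote 24, and base-2 logarithms. Each subarea’s term is its population share times its raw score: $D(\pi_j \parallel \pi)$ for $D$ and $(E - E_j)/E$ for $H$. Dividing by the overall metro values in Table 8 ($D = 0.48$, $H = 0.59$) gives shares close to those in Table 9; small differences come from the rounded inputs.

```python
import math


def entropy(props):
    """Entropy (log base 2); zero shares are skipped."""
    return -sum(p * math.log2(p) for p in props if p > 0)


def divergence(p, q):
    """D(P || Q) in bits; zero shares in P are skipped."""
    return sum(pm * math.log2(pm / qm) for pm, qm in zip(p, q) if pm > 0)


metro = [0.75, 0.25]                      # [White, Black], Table 10
subareas = {                              # (shares, population share)
    "Detroit": ([0.09, 0.91], 0.17),      # 17% of the metro, footnote 24
    "Remainder": ([0.88, 0.12], 0.83),
}
overall = {"D": 0.48, "H": 0.59}          # Table 8, metro area

E = entropy(metro)
terms = {}
for name, (props, weight) in subareas.items():
    d_term = weight * divergence(props, metro)       # weighted D score
    h_term = weight * (E - entropy(props)) / E       # weighted H score
    terms[name] = (d_term, h_term)
    print(f"{name}: share of D = {d_term / overall['D']:.2f}, "
          f"share of H = {h_term / overall['H']:.2f}")
```

The sketch reproduces the reversal discussed above: Detroit has the larger weighted $D$ term but the smaller weighted $H$ term.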
²⁴ The Detroit population accounts for 17% of the metro area population and 7% of the state population. Detroit’s share of the population is the same if we include only the White and Black population in the metro area and state and exclude all other racial and ethnic groups.
Figure 2: Decomposition of White-Black Segregation Between Subareas in the Detroit Metro Area and Michigan
[Figure panels: (a) Detroit Metro Area; (b) Michigan. In each panel, the horizontal axis is Proportion White (0.00 to 1.00), the vertical axis is Segregation, the lines show the Divergence Index (D) and the Information Theory Index (H), and points mark the City and the Remainder.]
Conclusion

Comparing segregation results for $D$ and $H$ underscores the key differences between the indexes. They measure different dimensions of segregation. $D$ measures evenness: how different is each component compared to the overall context? $H$ measures diversity: how much diversity does each component contain relative to the overall level of diversity? $H$ treats any deviations from maximum diversity the same, and $D$ is responsive to the specific proportions and groups that are over- or under-represented. Although $H$ can be decomposed into additive components, it is problematic to interpret the results for each component as segregation. This was evident in the example of White-Black segregation in the Detroit region. Within the city of Detroit, census tracts are largely representative of the overall racial composition of the city. But there are stark differences in the racial composition of the city of Detroit compared to the metro area and state. Detroit has a large majority of Black residents, whereas the metro area and state populations are predominantly White. Within the regional context, it seems apparent that Detroit is segregated, but results for segregation measured with $H$ show quite the opposite. The dynamics of $D$ and $H$ seen in the Detroit example are relevant to many other social contexts as well. For example, consider studying gender segregation across academic majors at a university. If the student population is 75% women and 25% men, and among engineering majors 25% are women and 75% are men, is the major segregated? Segregation measured in terms of evenness with $D$ would find that the engineering major is segregated. The gender proportions in the major are unexpected given the university population: men are over-represented and women are under-represented relative to their proportions of the university population. Measured in terms of diversity with $H$, the engineering major is not segregated. The major has a 3 to 1 mix of genders, the same as the university.
H treats the relative proportions of groups the same no matter which group is over- or under-represented.²⁵ It is misleading to conclude that the engineering major is not segregated within the university context. Likewise, it is misleading to conclude that Detroit is not segregated in the regional context, given the stark differences in population composition. Measures should be able to accurately represent inequality and segregation across contexts and nested levels of geography. This is a strength of the divergence index (D) and a limitation of the information theory index (H).

²⁵ In Figure 1b, we can see that H = 0 when group 1 is 25% or 75% of a component's population, even though the overall population is 75% group 1.

The divergence index is a conceptually intuitive and methodologically rigorous measure of inequality and segregation. Unlike that of other measures, its comparative reference, the distribution against which inequality is evaluated, is not fixed; it can be specified in a theoretically meaningful way. A measure’s comparative reference defines a theoretical state of equality, and thus has important implications for our understanding of inequality. The divergence index improves upon existing measures of inequality and segregation by allowing researchers to choose a relevant comparative reference. The choice is transparent rather than buried in the mathematics that underlie a measure. This makes it a particularly useful measure for comparing inequality over time, place, or cohort, across counterfactual scenarios, or against a normative standard.
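The engineering-major example and the role of the comparative reference can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the divergence index weights each component's relative entropy (Kullback-Leibler divergence) from a reference distribution by the component's population share, that the component term of H is (E - E_j)/E, where E is the Shannon entropy of the overall population and E_j that of component j, and natural logarithms throughout. The helper names `entropy`, `divergence`, and `h_term`, the 50/50 reference, and the split of the university into engineering (20% of students) and all other majors are hypothetical, chosen only for illustration.

```python
import math


def entropy(props):
    """Shannon entropy (natural log) of a list of group proportions."""
    return -sum(p * math.log(p) for p in props if p > 0)


def divergence(observed, reference):
    """Relative entropy of observed proportions from a reference distribution.

    Zero only when the observed proportions match the reference exactly.
    Assumes every reference proportion is positive.
    """
    return sum(q * math.log(q / r) for q, r in zip(observed, reference) if q > 0)


def h_term(observed, overall):
    """Entropy-based component term (E - E_j) / E.

    Zero whenever a component is exactly as diverse as the overall
    population, regardless of which group is over-represented.
    """
    e = entropy(overall)
    return (e - entropy(observed)) / e


# Engineering-major example from the text; proportions are [women, men].
university = [0.75, 0.25]
engineering = [0.25, 0.75]

d_major = divergence(engineering, university)  # equals 0.5 * ln 3
h_major = h_term(engineering, university)      # zero: same 3-to-1 mix
d_even = divergence(engineering, [0.5, 0.5])   # hypothetical 50/50 reference

# Hypothetical university: engineering is 20% of students, all other majors
# 80%, chosen so that the overall split works out to 75% women, 25% men.
weights = [0.2, 0.8]
units = [engineering, [0.875, 0.125]]
overall = [sum(w * u[g] for w, u in zip(weights, units)) for g in range(2)]

# Population-weighted aggregates of the component terms.
d_total = sum(w * divergence(u, overall) for w, u in zip(weights, units))
h_total = sum(w * h_term(u, overall) for w, u in zip(weights, units))

print(f"engineering: D = {d_major:.3f}, H term = {h_major:.3f}, "
      f"D vs 50/50 = {d_even:.3f}")
print(f"aggregate: D = {d_total:.4f}, E * H = {entropy(overall) * h_total:.4f}")
```

Against the university reference, the engineering major has a positive divergence (0.5 ln 3, about 0.549) but an H term of zero, which is the discrepancy described above. Replacing the reference with an even 50/50 split changes D only because the definition of equality changed, not the data. At the aggregate level, under these assumed definitions, D equals E times H, so the two indexes disagree only when components are read one at a time; in this toy university the engineering major supplies most of the aggregate D while contributing nothing to H.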
References

Alker, H. R. and B. M. Russett. 1964. “On Measuring Inequality.” Behavioral Science 9(3):207–18.
Allison, Paul D. 1978. “Measures of Inequality.” American Sociological Review 43(6):865–80.
Atkinson, A. B. 1970. “On the Measurement of Inequality.” Journal of Economic Theory 2(3):244–63.
Atkinson, A. B. 1983. The Economics of Inequality. New York: Oxford University Press.
Atkinson, A. B. 2008. “More on the Measurement of Inequality.” The Journal of Economic Inequality 6(3):277–83.
Bavaud, François. 2009. “Information Theory, Relative Entropy and Statistics.” Pp. 54–78 in Formal Theories of Information. Berlin, Heidelberg: Springer Berlin Heidelberg.
Bell, Wendell. 1954. “A Probability Model for the Measurement of Ecological Segregation.” Social Forces 32(4):357–64.
Blau, Peter Michael. 1977. Inequality and Heterogeneity: A Primitive Theory of Social Structure. New York: Free Press.
Bourguignon, Francois. 1979. “Decomposable Income Inequality Measures.” Econometrica 47(4):901–20.
Breen, Richard and Leire Salazar. 2011. “Educational Assortative Mating and Earnings Inequality in the United States.” The American Journal of Sociology 117(3):808–43.
Coleman, James S., T. Hoffer, and S. Kilgore. 1982. “Achievement and Segregation in Secondary Schools: A Further Look at Public and Private School Differences.” Sociology of Education 55(2–3):162–82.
Cortese, Charles F., R. Frank Falk, and Jack K. Cohen. 1976. “Further Considerations on the Methodological Analysis of Segregation Indices.” American Sociological Review 41(4):630–37.
Coulter, Philip B. 1989. Measuring Inequality: A Methodological Handbook. Boulder: Westview Press.
Cover, T. M. and Joy A. Thomas. 2006. Elements of Information Theory. Hoboken, N.J.: Wiley-Interscience.
Cowell, Frank A. 1980a. “Generalized Entropy and the Measurement of Distributional Change.” European Economic Review 13(1):147–59.
Cowell, Frank A. 1980b. “On the Structure of Additive Inequality Measures.” Review of Economic Studies 47(3):521–31.
Cowell, Frank A. and K. Kuga. 1981. “Additivity and the Entropy Concept: An Axiomatic Approach to Inequality Measurement.” Journal of Economic Theory 25(1):131–43.
Cowell, Frank A., Emmanuel Flachaire, and Sanghamitra Bandyopadhyay. 2013. “Reference Distributions and Inequality Measurement.” Journal of Economic Inequality 11(4):421–37.
Dalton, Hugh. 1920. “The Measurement of the Inequality of Incomes.” The Economic Journal 30(119):348–61.
Duncan, Otis Dudley and Beverly Duncan. 1955. “A Methodological Analysis of Segregation Indexes.” American Sociological Review 20(2):210–17.
Falk, R. Frank, Charles F. Cortese, and Jack K. Cohen. 1978. “Utilizing Standardized Indices of Residential Segregation: Comment on Winship.” Social Forces 57:713.
Farrell, Chad R. 2008. “Bifurcation, Fragmentation or Integration? The Racial and Geographical Structure of US Metropolitan Segregation, 1990–2000.” Urban Studies 45(3):467–99.
Fischer, Mary J. 2008. “Shifting Geographies: Examining the Role of Suburbanization in Blacks’ Declining Segregation.” Urban Affairs Review 43(4):475–96.
Fossett, Mark and Scott J. South. 1983. “The Measurement of Intergroup Income Inequality: A Conceptual Review.” Social Forces 61(3):855–71.
Herfindahl, Orris C. 1950. “Concentration in the US Steel Industry.” PhD thesis, Columbia University.
Hirschman, Albert O. 1945. National Power and the Structure of Foreign Trade. Berkeley: University of California Press.
Jahn, Julius A., Calvin F. Schmid, and Clarence C. Schrag. 1947. “The Measurement of Ecological Segregation.” American Sociological Review 12(3):293–303.
James, David R. and Karl E. Taeuber. 1985. “Measures of Segregation.” Sociological Methodology 15:1–32.
Kendall, Maurice G. and Alan Stuart. 1977. The Advanced Theory of Statistics. London: Griffin.
Kullback, Solomon. 1987. “Letters to the Editor.” The American Statistician 41:338–41.
Lieberson, Stanley. 1980. A Piece of the Pie: Blacks and White Immigrants Since 1880. Berkeley: University of California Press.
Lieberson, Stanley and Donna K. Carter. 1982. “Temporal Changes and Urban Differences in Residential Segregation: A Reconsideration.” American Journal of Sociology 88(2):296–310.
Magdalou, Brice and Richard Nock. 2011. “Income Distributions and Decomposable Divergence Measures.” Journal of Economic Theory 146(6):2440–54.
Massey, Douglas S. and Nancy A. Denton. 1988. “The Dimensions of Residential Segregation.” Social Forces 67(2):281–315.
Morgan, B. S. 1975. “The Segregation of Socioeconomic Groups in Urban Areas: A Comparative Analysis.” Urban Studies 12:47–60.
Morgan, Barrie S. and John Norbury. 1981. “Some Further Observations on the Index of Residential Differentiation.” Demography 18(2):251–56.
Mori, T., K. Nishikimi, and T. E. Smith. 2005. “A Divergence Statistic for Industrial Localization.” The Review of Economics and Statistics 87(4):635–51.
Page, Scott E. 2007. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton: Princeton University Press.
Page, Scott E. 2011. Diversity and Complexity. Princeton: Princeton University Press.
Pigou, A. C. 1912. Wealth and Welfare. London: Macmillan.
R Core Team. 2014. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Reardon, Sean F. and Glenn Firebaugh. 2002. “Measures of Multigroup Segregation.” Sociological Methodology 32:33–67.
Reardon, Sean F. and David O’Sullivan. 2004. “Measures of Spatial Segregation.” Sociological Methodology 34(1):121–62.
Sakoda, James M. 1981. “A Generalized Index of Dissimilarity.” Demography 18(2):245–50.
Schwartz, Joseph E. and Christopher Winship. 1980. “The Welfare Approach to Measuring Inequality.” Sociological Methodology 11:1–36.
Sen, Amartya. 1973. On Economic Inequality. Oxford: Clarendon Press.
Shannon, C. E. 1948. “A Mathematical Theory of Communication.” Bell System Technical Journal 27(3):379–423.
Shorrocks, A. F. 1980. “The Class of Additively Decomposable Inequality Measures.” Econometrica 48(3):613–25.
Shorrocks, Anthony F. 1984. “Inequality Decomposition by Population Subgroups.” Econometrica 52(6):1369–85.
Shorrocks, Anthony F. 2012. “Decomposition Procedures for Distributional Analysis: A Unified Framework Based on the Shapley Value.” The Journal of Economic Inequality 11(1):99–126.
Stewart, Quincy Thomas. 2006. “Reinvigorating Relative Deprivation: A New Measure for a Classic Concept.” Social Science Research 35(3):779–802.
Taeuber, Karl E. and Alma F. Taeuber. 1965. Negroes in Cities: Residential Segregation and Neighborhood Change. Chicago: Aldine Pub. Co.
Theil, Henri. 1967. Economics and Information Theory. Amsterdam: North-Holland.
Theil, Henri. 1972. Statistical Decomposition Analysis. Amsterdam: North-Holland Publishing Company.
Theil, Henri and Anthony J. Finizza. 1971. “A Note on the Measurement of Racial Integration of Schools by Means of Informational Concepts.” The Journal of Mathematical Sociology 1(2):187–93.
U.S. Census Bureau. 2011. “2010 Census Summary File 1 – United States.”
Vega, Casilda Lasso de la and Ana Urrutia. 2007. “The ‘Extended’ Atkinson Family: The Class of Multiplicatively Decomposable Inequality Measures, and Some New Graphical Procedures for Analysts.” The Journal of Economic Inequality 6(2):211–25.
Waldman, Loren K. 1977. “Types and Measures of Inequality.” Social Science Quarterly 58(2):229–41.
Walsh, J. A. and M. E. O’Kelly. 1979. “An Information Theoretic Approach to Measurement of Spatial Inequality.” Economic and Social Review 10:267–86.
White, Michael J. 1986. “Segregation and Diversity Measures in Population Distribution.” Population Index 52(2):198–221.
Winship, Christopher. 1978. “The Desirability of Using the Index of Dissimilarity or Any Adjustment of It for Measuring Segregation: Reply to Falk, Cortese, and Cohen.” Social Forces 57(2):717–20.
Zoloth, B. S. 1976. “Alternative Measures of School Segregation.” Land Economics.