A Family of Bounded Divergence Measures Based on The Bhattacharyya Coefficient

Shivakumar Jolad (a,1,*), Ahmed Roman (b,2), Mahesh Shastry (c,3), Ayanendranath Basu (d)
a Indian Institute of Technology Gandhinagar, Ahmedabad, Gujarat 380005, India
b Virginia Tech, Blacksburg, VA 24061, USA
c The Pennsylvania State University, University Park, PA 16803, USA
d Indian Statistical Institute, Barrackpore Trunk Road, Kolkata, West Bengal 700108, India

* Corresponding author. Email addresses: [email protected] (Shivakumar Jolad), [email protected] (Ahmed Roman), [email protected] (Mahesh Shastry), [email protected] (Ayanendranath Basu). URL: http://www.iitgn.ac.in (Shivakumar Jolad), http://www.ee.psu.edu (Mahesh Shastry).
1 Part of this work was done by the corresponding author at Virginia Tech, USA.
2 Department of Mathematics, Virginia Tech.
3 Currently at 3M, Minneapolis, Minnesota, USA.
Abstract

Divergence measures are widely used in pattern recognition, signal processing and statistical inference. In this paper, we introduce a new one-parameter family of divergence measures, called bounded Bhattacharyya distance (BBD) measures, for quantifying the dissimilarity between probability distributions. These measures are bounded, symmetric and positive semi-definite, and, unlike the Kullback-Leibler divergence, they do not require the probability density functions to be absolutely continuous with respect to each other. In the asymptotic limit, the BBD measures approach the squared Hellinger distance. A generalized BBD measure for multiple distributions is also introduced. We prove an extension of a theorem of Bradt and Karlin relating the Bayes error probability to divergence ranking for BBD. We show that BBD belongs to the class of generalized Csiszár f-divergences and derive some of its properties, such as its curvature and its relation to Fisher information. For distributions with vector-valued parameters, the curvature matrix can be used to obtain the Rao geodesic distance. We also derive inequalities between BBD and well known measures such as the Hellinger and Jensen-Shannon divergences, and establish bounds on the Bayes error probability in terms of the BBD measure.

Keywords: divergence measures, pattern recognition, signal detection, signal classification, Bhattacharyya distance, f-divergence, error probability
1. Introduction

Divergence measures for the distance between two probability distributions have been extensively studied in the last six decades [1, 20-23]. These measures are widely used in varied fields such as pattern recognition [2, 3, 10], signal detection [17, 18], Bayesian model validation [33] and quantum information theory [24, 27].
Distance measures try to achieve two main objectives (which are not mutually exclusive): to assess (1) how "close" two distributions are compared to others, and (2) how "easy" it is to distinguish between one pair than between another [1]. There is a plethora of distance measures available to assess the convergence (or divergence) of probability distributions. Many of these measures are not metrics in the strict sense, as they may not satisfy either the symmetry of arguments or the triangle inequality. In applications, the choice of the measure depends on the interpretation of the metric in terms of the problem considered, its analytical properties and its ease of computation [14]. One of the most well-known and widely used divergence measures, the Kullback-Leibler divergence (KLD) [21, 22], can create problems in specific applications. Specifically, it is unbounded above and requires that the distributions be absolutely continuous with respect to each other.
Various other information theoretic measures have been introduced keeping in view ease of computation and utility in problems of signal selection and pattern recognition. Among these, the Bhattacharyya distance [5, 18, 26] and the Chernoff distance [2, 9, 26] have been widely used in signal processing; however, they are again unbounded from above. Many bounded divergence measures, such as the variational distance, the Hellinger distance [2, 13] and the Jensen-Shannon metric [8, 25, 30], have been studied extensively. The utility of these measures varies depending on properties such as the tightness of the bounds they provide on error probabilities, their information theoretic interpretation, and their generalization to multiple probability distributions.

Here we introduce a new one-parameter (α) family of bounded measures based on the Bhattacharyya coefficient, called bounded Bhattacharyya distance (BBD) measures. These measures are symmetric, positive semi-definite and bounded between 0 and 1. In the asymptotic limit (α → ±∞) they approach the squared Hellinger divergence [15, 19]. Following Rao [30] and Lin [25], a generalized BBD is introduced to capture the divergence (or convergence) between multiple distributions. We show that BBD measures belong to the generalized class of f-divergences and inherit many of its properties, such as the relation between curvature and Fisher information. We prove an extension of the Bradt-Karlin theorem for BBD, which establishes the existence of prior probabilities relating Bayes error probabilities to the ranking based on the divergence measure. Bounds on the error probability Pe can be calculated through BBD measures using certain inequalities between the Bhattacharyya coefficient and Pe. We derive two inequalities for a special case of BBD (α = 2) with the Hellinger and Jensen-Shannon divergences. Divergence measures are also used in statistics to construct minimum disparity estimators; we comment on the possible use of BBD measures in such applications.

Our paper is organized as follows. Section 1 is the current introduction. In Section 2, we recall the well known Kullback-Leibler and Bhattacharyya divergence measures, and then introduce our bounded Bhattacharyya distance measures. We discuss some special cases of BBD, in particular the squared Hellinger distance, and introduce the generalized BBD for multiple distributions. In Section 3, we derive several properties of our measure, such as positive semi-definiteness, prove an extension of the Bradt-Karlin theorem, and show that BBD belongs
to the extended f-divergence class, relate its curvature to the Fisher information and obtain the corresponding curvature metric, and derive some inequalities with other measures. In Section 4 we conclude with a summary and outlook. In the Appendix we provide expressions of the BBD measure, with α = 2, for some commonly used distributions.

2. Divergence measures

In the following we consider a measurable space Ω with σ-algebra B and the set of all probability measures M on (Ω, B). Let P and Q denote probability measures on (Ω, B), with p and q denoting their densities with respect to a common measure λ. We recall the definition of absolute continuity [32]:

Absolute Continuity. A measure P on the Borel subsets of the real line is absolutely continuous with respect to a measure Q if P(A) = 0 for every Borel set A ∈ B for which Q(A) = 0; this is denoted by P ≪ Q.

2.1. Kullback-Leibler divergence
The Kullback-Leibler divergence (relative entropy) between P and Q is defined as
    I(P, Q) = ∫_Ω p log(p/q) dλ,
which requires P ≪ Q. Its symmetrized version, J(P, Q) = I(P, Q) + I(Q, P) = ∫_Ω (p − q) log(p/q) dλ, is often called the J-divergence. Both are unbounded from above.

2.2. Bhattacharyya distance
The Bhattacharyya coefficient between P and Q is defined as
    ρ(P, Q) = ∫_Ω √(pq) dλ,
with 0 ≤ ρ ≤ 1; ρ = 1 if and only if P = Q almost everywhere, and ρ = 0 if and only if P and Q are orthogonal. The Bhattacharyya distance B(P, Q) = −log ρ(P, Q) is symmetric and positive semi-definite, but again unbounded from above.

2.3. Bounded Bhattacharyya Distance Measures
In many applications, in addition to the desirable properties of the Bhattacharyya distance, boundedness is required. We propose a new family of bounded measures of the Bhattacharyya distance,
    B_{ψ,b}(P, Q) ≡ −log_b(ψ(ρ)),   (3)
where ψ(ρ) is a suitable continuous function of the Bhattacharyya coefficient ρ and b is the base of the logarithm. With ψ and b chosen as functions of a single parameter α ∈ [−∞, 0) ∪ (1, ∞], this can be simplified as
    B_α(ρ) = log(1 − (1 − ρ)/α) / log(1 − 1/α).   (6)
It is easy to see that B_α(0) = 1 and B_α(1) = 0. Moreover, B_α(ρ) is convex (concave) in ρ for α > 1 (α < 0), as seen by evaluating the second derivative
    ∂²B_α(ρ)/∂ρ² = −1 / [α² (1 − (1 − ρ)/α)² log(1 − 1/α)],
which is positive for α > 1 and negative for α < 0.

[Figure: B_α(ρ) as a function of the Bhattacharyya coefficient ρ.]

2.4. Special cases
(i) For α = 2 we get
    B_2(ρ) = −log_2((1 + ρ)/2).   (7)
We denote this measure by ζ and study some of its special properties in Sec. 3.7.
(ii) For α → ±∞,
    B_{±∞}(ρ) = lim_{α→±∞} log(1 − (1 − ρ)/α)/log(1 − 1/α) = 1 − ρ = H²(P, Q),
the squared Hellinger distance [15, 19], where H²(P, Q) = (1/2) ∫_Ω (√p − √q)² dλ = 1 − ρ.

2.5. Generalized BBD measure
Following Rao [30] and Lin [25], the BBD measure can be extended to capture the divergence between more than two distributions. For probability densities p_1, p_2, ..., p_n with weights β = (β_1, ..., β_n), β_i ≥ 0, Σ_i β_i = 1, define the generalized Bhattacharyya coefficient as the integral of the weighted geometric mean,
    ρ_β = ∫_Ω Π_{i=1}^n p_i^{β_i} dλ,
and the generalized BBD measure as
    B_α^β(ρ_β) = log(1 − (1 − ρ_β)/α) / log(1 − 1/α).   (15)
Note that 0 ≤ ρ_β ≤ 1 and 0 ≤ B_α^β ≤ 1, since the weighted geometric mean is maximized when all the p_i's are the same, and minimized (equal to zero) when any two of the probability densities p_i are orthogonal to each other.
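As an illustration (ours, not part of the original paper), the following Python sketch computes the Bhattacharyya coefficient ρ, the BBD measure B_α, its α = 2 special case ζ, and the generalized coefficient ρ_β for discrete probability vectors; the function and variable names are ours and NumPy is assumed to be available:

# Illustrative sketch: Bhattacharyya coefficient and BBD measures
# for discrete probability vectors (names are ours, not from the paper).
import numpy as np

def bhattacharyya_coefficient(p, q):
    """rho(P, Q) = sum_x sqrt(p(x) q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(np.sqrt(p * q)))

def bbd(p, q, alpha=2.0):
    """B_alpha(rho) = log(1 - (1 - rho)/alpha) / log(1 - 1/alpha)."""
    rho = bhattacharyya_coefficient(p, q)
    return np.log(1.0 - (1.0 - rho) / alpha) / np.log(1.0 - 1.0 / alpha)

def generalized_rho(ps, betas):
    """Integral (sum) of the weighted geometric mean of several densities."""
    ps = np.asarray(ps, float)          # shape: (n_distributions, n_points)
    betas = np.asarray(betas, float)    # non-negative weights summing to 1
    return float(np.sum(np.prod(ps ** betas[:, None], axis=0)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
rho = bhattacharyya_coefficient(p, q)
print("rho        =", rho)
print("zeta = B_2 =", bbd(p, q, alpha=2.0))      # equals -log2((1 + rho)/2)
print("B_1000     =", bbd(p, q, alpha=1000.0))   # close to 1 - rho = H^2
print("1 - rho    =", 1.0 - rho)
print("rho_beta   =", generalized_rho([p, q, q], betas=[0.5, 0.25, 0.25]))

For α = 2 the output reproduces ζ = −log_2((1 + ρ)/2), and for large α it approaches 1 − ρ, illustrating the Hellinger limit of Sec. 2.4.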
3. Properties

3.1. Symmetry, Boundedness and Positive Semi-definiteness

Theorem 3.1. B_α(P, Q) is symmetric, positive semi-definite and bounded in the interval [0, 1] for α ∈ [−∞, 0) ∪ (1, ∞].

Proof. Symmetry: since B_α(P, Q) = B_α(ρ(P, Q)) and ρ(P, Q) = ρ(Q, P), it follows that B_α(P, Q) = B_α(Q, P).
Positive semi-definiteness and boundedness: B_α(0) = 1, B_α(1) = 0, and
    ∂B_α(ρ)/∂ρ = 1 / [α (1 − (1 − ρ)/α) log(1 − 1/α)] < 0
for α ∈ [−∞, 0) ∪ (1, ∞], so B_α(ρ) decreases monotonically from 1 to 0 as ρ goes from 0 to 1. Hence 0 ≤ B_α(P, Q) ≤ 1, with B_α(P, Q) = 0 if and only if ρ = 1, i.e. P = Q almost everywhere.

3.2. Error Probability and Divergence Ranking

Consider the problem of classifying an observation x into one of two hypotheses (classes) g_1 and g_2, with densities p_1(x, β) and p_2(x, β) that depend on a design parameter β (for example, the signal or feature set used), and let L_β = p_1(x, β)/p_2(x, β) denote the corresponding likelihood ratio. For prior probabilities Γ = {π_1, π_2}, let P_e(β, Γ) denote the Bayes error probability, and let D_β(P_1, P_2) denote a divergence measure between the two class distributions evaluated at β. One would like to rank candidate parameters by the divergence and pick the one with the largest value, expecting it to yield the smallest error probability. In general, however, it is not true that D_β(P_1, P_2) > D_{β′}(P_1, P_2) implies P_e(β) < P_e(β′). Bradt and Karlin proved the following theorem relating error probabilities and divergence ranking for the symmetric Kullback-Leibler divergence J:

Theorem 3.2 (Bradt and Karlin [7]). If J_β(P_1, P_2) > J_{β′}(P_1, P_2), then there exists a set of prior probabilities Γ = {π_1, π_2} for the two hypotheses g_1, g_2 for which
    P_e(β, Γ) < P_e(β′, Γ),   (18)
where P_e(β, Γ) is the error probability with parameter β and prior probabilities Γ.

The theorem asserts the existence of such prior probabilities but gives no method of finding them. Kailath [18] proved the applicability of the Bradt-Karlin theorem for the Bhattacharyya distance measure. We follow the same route and show that the B_α(ρ) measure satisfies a similar property, using the following theorem by Blackwell.

Theorem 3.3 (Blackwell [6]). P_e(β′, Γ) ≤ P_e(β, Γ) for all prior probabilities Γ if and only if E_{β′}[Φ(L_{β′}) | g_2] ≤ E_β[Φ(L_β) | g_2] for all continuous concave functions Φ(L).

Theorem 3.4. If B_α(ρ(β)) > B_α(ρ(β′)), or equivalently ρ(β) < ρ(β′), then there exists a set of prior probabilities Γ = {π_1, π_2} for the two hypotheses g_1, g_2 for which
    P_e(β, Γ) < P_e(β′, Γ).   (19)

Proof. The proof closely follows Kailath [18]. First note that √L is a concave function of the likelihood ratio L, and
    ρ(β) = Σ_{x∈X} √(p_1(x, β) p_2(x, β)) = Σ_{x∈X} √(p_1(x, β)/p_2(x, β)) p_2(x, β) = E_β[√L_β | g_2].   (20)
Similarly,
    ρ(β′) = E_{β′}[√L_{β′} | g_2].   (21)
Hence ρ(β) < ρ(β′) implies
    E_β[√L_β | g_2] < E_{β′}[√L_{β′} | g_2].   (22)
Suppose the assertion of the theorem is not true. Then P_e(β′, Γ) ≤ P_e(β, Γ) for all Γ, and by Theorem 3.3, E_{β′}[Φ(L_{β′}) | g_2] ≤ E_β[Φ(L_β) | g_2] for every continuous concave Φ. Taking Φ(L) = √L contradicts Eq. (22).

3.3. Bounds on Error Probability

Error probabilities are hard to calculate in general, and tight bounds on P_e are often extremely useful in practice. Kailath [18] has given bounds on P_e in terms of the Bhattacharyya coefficient ρ:
    (1/2)[1 − √(1 − 4π_1π_2 ρ²)] ≤ P_e ≤ √(π_1 π_2) ρ,   (23)
with π_1 + π_2 = 1. If the priors are equal, π_1 = π_2 = 1/2, the expression simplifies to
    (1/2)[1 − √(1 − ρ²)] ≤ P_e ≤ ρ/2.   (24)
Inverting Eq. (6) gives ρ(B_α) = 1 − α[1 − (1 − 1/α)^{B_α}], so these bounds can be expressed directly in terms of the B_α measure.
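As an illustration (ours, not part of the original paper), the following Python sketch evaluates the bounds of Eq. (23) and compares them with the exact Bayes error P_e = Σ_x min(π_1 p(x), π_2 q(x)) for a pair of discrete distributions; all names are ours:

# Illustrative check of the error-probability bounds of Eq. (23)
# against the exact Bayes error for discrete distributions.
import numpy as np

def bhattacharyya_coefficient(p, q):
    return float(np.sum(np.sqrt(np.asarray(p, float) * np.asarray(q, float))))

def bayes_error(p, q, pi1=0.5):
    """Exact Bayes error P_e = sum_x min(pi1 p(x), pi2 q(x))."""
    pi2 = 1.0 - pi1
    return float(np.sum(np.minimum(pi1 * np.asarray(p, float),
                                   pi2 * np.asarray(q, float))))

def kailath_bounds(rho, pi1=0.5):
    """Lower/upper bounds on P_e in terms of the Bhattacharyya coefficient rho."""
    pi2 = 1.0 - pi1
    lower = 0.5 * (1.0 - np.sqrt(1.0 - 4.0 * pi1 * pi2 * rho ** 2))
    upper = np.sqrt(pi1 * pi2) * rho
    return lower, upper

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.1, 0.3, 0.6])
rho = bhattacharyya_coefficient(p, q)
lo, up = kailath_bounds(rho, pi1=0.5)
pe = bayes_error(p, q, pi1=0.5)
print(f"rho = {rho:.4f}, bounds: {lo:.4f} <= Pe = {pe:.4f} <= {up:.4f}")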
3.4. BBD as an f-divergence

Consider a measurable space Ω with σ-algebra B, and let λ be a measure on (Ω, B) such that the probability laws P and Q are absolutely continuous with respect to λ, with densities p and q. Let f be a continuous convex real function on R_+, and let g be an increasing function on R. The class of divergence coefficients between two probability distributions [2],
    d(P, Q) = g( ∫_Ω f(p/q) q dλ ),   (25)
are called f-divergence measures with respect to the functions (f, g). Here p/q = L is the likelihood ratio, and the term in the argument of g is Csiszár's [11, 12] definition of f-divergence.

The B_α(P, Q) measure, for α ∈ (1, ∞], can be written as the following f-divergence:
    f(x) = −1 + (1 − √x)/α,   g(F) = log(−F)/log(1 − 1/α),   (26)
where
    F = ∫_Ω [−1 + (1 − √(p/q))/α] q dλ = −1 + (1/α) ∫_Ω (q − √(pq)) dλ = −1 + (1 − ρ)/α,   (27)
and
    g(F) = log(1 − (1 − ρ)/α) / log(1 − 1/α) = B_α(P, Q).   (28)
3.5. Curvature and Fisher Information

In statistics, the information that an observable random variable X carries about an unknown parameter θ (on which its distribution depends) is given by the Fisher information. One of the important properties of the f-divergence between two distributions of the same parametric family is that its curvature measures the Fisher information. Following the approach pioneered by Rao [29], we relate the curvature of the BBD measures to the Fisher information and derive the differential curvature metric. The discussion below closely follows DasGupta [13].

Definition. Let {f(x|θ); θ ∈ Θ ⊆ R} be a family of densities indexed by a real parameter θ, satisfying standard regularity conditions (in particular, f(x|θ) is absolutely continuous in θ). Define
    B_α(θ, φ) = log(1 − (1 − ρ(θ, φ))/α) / log(1 − 1/α) ≡ Z_θ(φ),  where ρ(θ, φ) = ∫ √(f(x|θ) f(x|φ)) dx,   (29)
and let I_f(θ) = ∫ f(x|θ) [∂ log f(x|θ)/∂θ]² dx denote the Fisher information of the distribution f(x|θ).

Theorem 3.5. The curvature of Z_θ(φ) at φ = θ is the Fisher information of f(x|θ) up to a multiplicative constant.

Proof. Expand Z_θ(φ) around θ:
    Z_θ(φ) = Z_θ(θ) + (φ − θ) dZ_θ(φ)/dφ |_{φ=θ} + (φ − θ)²/2 · d²Z_θ(φ)/dφ² |_{φ=θ} + …   (30)
Let us observe some properties of the Bhattacharyya coefficient,
    ρ(θ, θ) = 1,   ρ(θ, φ) = ρ(φ, θ),   (31)
and of its derivatives:
    ∂ρ(θ, φ)/∂φ |_{φ=θ} = (1/2) ∂/∂θ ∫ f(x|θ) dx = 0,   (32)
    ∂²ρ(θ, φ)/∂φ² |_{φ=θ} = −(1/4) ∫ (1/f(x|θ)) (∂f(x|θ)/∂θ)² dx + (1/2) ∂²/∂θ² ∫ f(x|θ) dx
                          = −(1/4) ∫ f(x|θ) [∂ log f(x|θ)/∂θ]² dx = −(1/4) I_f(θ).   (33)
Using the above relationships, we can write down the terms in the expansion of Eq. (30):
    Z_θ(θ) = 0,   ∂Z_θ(φ)/∂φ |_{φ=θ} = 0,   (34)
    ∂²Z_θ(φ)/∂φ² |_{φ=θ} = C(α) I_f(θ) > 0,  where C(α) = −1 / [4α log(1 − 1/α)].   (35)
The leading term of B_α(θ, φ) is therefore given by
    B_α(θ, φ) ∼ (φ − θ)²/2 · C(α) I_f(θ).   (36)
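As a numerical illustration (ours, not part of the original paper), the following Python sketch checks the leading-order relation of Eq. (36) for a Gaussian location family N(θ, σ²), for which I_f(θ) = 1/σ² and ρ(θ, φ) = exp(−(θ − φ)²/(8σ²)) (the equal-variance case of Eq. (52)); the names are ours:

# Illustrative numerical check of Eq. (36) for a Gaussian location family.
import numpy as np

def bbd_from_rho(rho, alpha):
    return np.log(1.0 - (1.0 - rho) / alpha) / np.log(1.0 - 1.0 / alpha)

alpha, sigma = 2.0, 1.5
C = -1.0 / (4.0 * alpha * np.log(1.0 - 1.0 / alpha))   # C(alpha)
fisher = 1.0 / sigma ** 2                               # I_f(theta) = 1/sigma^2

for delta in [0.5, 0.1, 0.02]:                          # delta = phi - theta
    rho = np.exp(-delta ** 2 / (8.0 * sigma ** 2))
    exact = bbd_from_rho(rho, alpha)
    quadratic = 0.5 * C * fisher * delta ** 2
    print(f"delta={delta:5.2f}  B_alpha={exact:.6e}  leading term={quadratic:.6e}")

As delta shrinks, the exact B_α and the quadratic leading term agree to increasing relative accuracy.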
3.6. Differential Metrics

Rao [31] generalized the Fisher information to multivariate densities with vector-valued parameters to obtain a "geodesic" distance between two parametric distributions P_θ, P_φ of the same family. We derive such a metric for the BBD measure using properties of the f-divergence. Let θ, φ ∈ Θ ⊆ R^p. Using the fact that
    ∂Z(θ, φ)/∂θ_i |_{φ=θ} = 0,   (37)
we can write, for φ = θ + dθ,
    Z(θ, θ + dθ) = Σ_{i,j=1}^p [∂²Z_θ/∂θ_i ∂θ_j] dθ_i dθ_j + … = Σ_{i,j=1}^p g_ij dθ_i dθ_j + … .   (38)
The curvature metric g_ij can be used to find the geodesic distance along a curve η(t), t ∈ [0, 1], with
    C = { η(t) : η(0) = θ, η(1) = φ }.   (39)
Details of the geodesic equation are given in many standard differential geometry texts; in the context of probability distance measures the reader is referred to Sec. 15.4.2 of DasGupta [13]. The curvature metric of any Csiszár f-divergence is a scalar multiple of the KLD metric [2, 13]:
    g^f_ij(θ) = f″(1) g_ij(θ).   (40)
For our BBD measure,
    f″(x) = d²/dx² [−1 + (1 − √x)/α] = 1/(4α x^{3/2}),   f″(1) = 1/(4α).   (41)
Apart from the factor −1/log(1 − 1/α), this is the same as C(α) in Eq. (36). It follows that the geodesic distance for our metric is the same as the KLD geodesic distance up to a multiplicative factor; KLD geodesic distances for many common families are tabulated in DasGupta [13].
3.7. Relation to Other Measures

Here we focus on the special case α = 2, i.e.
    ζ(P, Q) ≡ B_2(ρ(P, Q)) = −log_2[(1 + ρ(P, Q))/2],   (42)
and relate it to the squared Hellinger distance H² = 1 − ρ and to the Jensen-Shannon divergence.

Hellinger distance.

Theorem 3.6.
    ζ ≤ H² ≤ (log 4) ζ,   (43)
where the constants 1 and log 4 are sharp.

Proof. The sharpest upper bound is obtained by evaluating sup_{ρ∈[0,1)} H²(ρ)/ζ(ρ). Define
    g(ρ) ≡ (1 − ρ) / [−log_2((1 + ρ)/2)].   (44)
We note that g(ρ) is continuous and has no singularities for ρ ∈ [0, 1). Its derivative is
    g′(ρ) = [ (1 − ρ)/(1 + ρ) + log((1 + ρ)/2) ] / [ log 2 · [log_2((1 + ρ)/2)]² ] ≥ 0,   (45)
since the numerator decreases in ρ and vanishes at ρ = 1. It follows that g(ρ) is non-decreasing, and hence sup_{ρ∈[0,1)} g(ρ) = lim_{ρ→1} g(ρ) = log 4, so that H²/ζ ≤ log 4. Combining this with the convexity of B_α(ρ) for α > 1, which gives ζ = B_2(ρ) ≤ 1 − ρ = H², we get
    ζ ≤ H² ≤ (log 4) ζ.
Using the same procedure one can prove a generic version of this inequality for α ∈ (1, ∞]:
    B_α(ρ) ≤ H² ≤ −α log(1 − 1/α) B_α(ρ).   (46)

Jensen-Shannon divergence. The Jensen difference between two distributions P_1, P_2, with densities p_1(x), p_2(x) and weights (λ_1, λ_2), λ_1 + λ_2 = 1, is defined as
    J_{λ1,λ2}(P_1, P_2) = H(λ_1 p_1 + λ_2 p_2) − λ_1 H(p_1) − λ_2 H(p_2),   (47)
where H(·) denotes the Shannon entropy. The Jensen-Shannon divergence (JSD) [8, 25, 30] is based on the Jensen difference and is given by
    JS(P, Q) = J_{1/2,1/2}(P, Q) = (1/2) ∫ [ p(x) log(2p(x)/(p(x) + q(x))) + q(x) log(2q(x)/(p(x) + q(x))) ] dx.   (48)
The structure and goals of the JSD and BBD measures are similar. The following theorem compares the two using Jensen's inequality.

Lemma 3.7 (Jensen's inequality). For a convex function ψ, E[ψ(X)] ≥ ψ(E[X]).

Theorem 3.8 (Relation to the Jensen-Shannon measure).
    JS(P, Q) ≥ 2 log 2 · ζ(P, Q) − log 2.

Proof. We bound the un-symmetrized term ∫ p log(2p/(p + q)) dx; by symmetry the same bound holds for the companion term with p and q interchanged, and hence for their average, JS(P, Q) in Eq. (48). We have
    ∫ p(x) log(2p(x)/(p(x) + q(x))) dx
      = −2 ∫ p(x) log( √(p(x) + q(x)) / √(2p(x)) ) dx
      ≥ −2 ∫ p(x) log( (√p(x) + √q(x)) / √(2p(x)) ) dx     (since √(p + q) ≤ √p + √q)
      = E_P[ −2 log( (√p(X) + √q(X)) / √(2p(X)) ) ].
By Jensen's inequality, E[−log f(X)] ≥ −log E[f(X)], so
    E_P[ −2 log( (√p(X) + √q(X)) / √(2p(X)) ) ] ≥ −2 log E_P[ (√p(X) + √q(X)) / √(2p(X)) ].
Hence
    JS(P, Q) ≥ −2 log ∫ p(x) (√p(x) + √q(x)) / √(2p(x)) dx
             = −2 log[ (1 + ∫ √(p(x) q(x)) dx) / 2 ] − log 2
             = 2 log 2 · ζ(P, Q) − log 2.   (49)
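The inequalities in Theorems 3.6 and 3.8 are easy to check numerically. The following Python sketch (ours, not part of the paper; natural logarithms are used for H², JS and the bound, and log base 2 for ζ) verifies them on random discrete distributions:

# Illustrative numerical check of Theorems 3.6 and 3.8.
import numpy as np

rng = np.random.default_rng(0)

def zeta(p, q):
    rho = np.sum(np.sqrt(p * q))
    return -np.log2((1.0 + rho) / 2.0)

def hellinger_sq(p, q):
    return 1.0 - np.sum(np.sqrt(p * q))

def jensen_shannon(p, q):
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * (kl(p, m) + kl(q, m))

for _ in range(5):
    p = rng.dirichlet(np.ones(10))
    q = rng.dirichlet(np.ones(10))
    z, h2, js = zeta(p, q), hellinger_sq(p, q), jensen_shannon(p, q)
    assert z <= h2 + 1e-12 and h2 <= np.log(4.0) * z + 1e-12          # Theorem 3.6
    assert js >= 2.0 * np.log(2.0) * z - np.log(2.0) - 1e-12          # Theorem 3.8
    print(f"zeta={z:.4f}  H^2={h2:.4f}  log4*zeta={np.log(4)*z:.4f}  JS={js:.4f}")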
4. Summary and Outlook

In this work we have introduced a new family of bounded divergence measures based on the Bhattacharyya coefficient, the bounded Bhattacharyya distance (BBD) measures. We have shown that the family belongs to the class of generalized f-divergences and shares their properties, such as the relation between curvature, Fisher information and the curvature metric. We have discussed several special cases of our measure, in particular the squared Hellinger distance, and studied its relation to other measures such as the Jensen-Shannon divergence. We have also shown the applicability of the Bradt-Karlin theorem on error probabilities. Our measure is based on the Bhattacharyya coefficient, which is useful in computing tight bounds on Bayes error probabilities. Although many bounded divergence measures have been studied and used in various applications, no single 'metric' is useful in all types of problems. Our measure, with its tunable parameter α, can be useful in many practical applications where extremum values are desired, such as minimizing the error or the false acceptance/rejection ratio. We leave such problems to future studies.
5. Acknowledgements

One of us (S.J.) thanks Rahul Kulkarni for insightful discussions, and acknowledges financial support in part by grants DMR-0705152 and DMR-1005417 from the US National Science Foundation. M.S. would like to thank the Penn State Electrical Engineering Department for support.
References

[1] Ali, S. M., Silvey, S. D., 1966. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society, Series B (Methodological) 28 (1), 131-142.
[2] Basseville, M., 1989. Distance measures for signal processing and pattern recognition. Signal Processing 18, 349-369.
[3] Ben-Bassat, M., 1978. f-entropies, probability of error, and feature selection. Information and Control 39 (3), 227-242.
[4] Bhattacharyya, A., 1943. On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 35, 99-109.
[5] Bhattacharyya, A., 1946. On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics (1933-1960) 7 (4), 401-406.
[6] Blackwell, D., 1951. Comparison of experiments. In: Second Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 93-102.
[7] Bradt, R., Karlin, S., 1956. On the design and comparison of certain dichotomous experiments. The Annals of Mathematical Statistics, 390-409.
[8] Burbea, J., Rao, C. R., May 1982. On the convexity of some divergence measures based on entropy functions. IEEE Transactions on Information Theory 28 (3), 489-495.
[9] Chernoff, H., 1952. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics 23 (4), 493-507.
[10] Choi, E., Lee, C., Aug. 2003. Feature extraction based on the Bhattacharyya distance. Pattern Recognition 36 (8), 1703-1709.
[11] Csiszár, I., 1967. Information-type distance measures and indirect observations. Stud. Sci. Math. Hungar. 2, 299-318.
[12] Csiszár, I., 1975. I-divergence geometry of probability distributions and minimization problems. The Annals of Probability 3 (1), 146-158.
[13] DasGupta, A., 2011. Probability for Statistics and Machine Learning. Springer Texts in Statistics. Springer, New York.
[14] Gibbs, A., Su, F., 2002. On choosing and bounding probability metrics. International Statistical Review 70 (3), 419-435.
[15] Hellinger, E., 1909. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik (Crelle's Journal) 1909 (136), 210-271.
[16] Hellman, M. E., Raviv, J., 1970. Probability of error, equivocation, and the Chernoff bound. IEEE Transactions on Information Theory 16 (4), 368-372.
[17] Kadota, T., Shepp, L., 1967. On the best finite set of linear observables for discriminating two Gaussian signals. IEEE Transactions on Information Theory 13 (2), 278-284.
[18] Kailath, T., Feb. 1967. The divergence and Bhattacharyya distance measures in signal selection. IEEE Transactions on Communications 15 (1), 52-60.
[19] Kakutani, S., 1948. On equivalence of infinite product measures. The Annals of Mathematics 49 (1), 214-224.
[20] Kapur, J., 1984. A comparative assessment of various measures of directed divergence. Advances in Management Studies 3 (1), 1-16.
[21] Kullback, S., 1968. Information Theory and Statistics, 2nd ed. Dover, New York.
[22] Kullback, S., Leibler, R. A., 1951. On information and sufficiency. The Annals of Mathematical Statistics 22 (1), 79-86.
[23] Kumar, U., Kumar, V., Kapur, J. N., 1986. Some normalized measures of directed divergence. International Journal of General Systems 13 (1), 5-16.
[24] Lamberti, P. W., Majtey, A. P., Borras, A., Casas, M., Plastino, A., 2008. Metric character of the quantum Jensen-Shannon divergence. Physical Review A 77, 052311.
[25] Lin, J., Jan. 1991. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37 (1), 145-151.
[26] Nielsen, F., Boltz, S., 2011. The Burbea-Rao and Bhattacharyya centroids. Statistics XX (X), 1-12.
[27] Nielsen, M., Chuang, I., 2000. Quantum Computation and Quantum Information. Cambridge University Press, Cambridge, UK.
[28] Rao, C., 1982. Diversity: its measurement, decomposition, apportionment and analysis. Sankhyā: The Indian Journal of Statistics, Series A, 1-22.
[29] Rao, C. R., 1945. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37, 81-91.
[30] Rao, C. R., 1982. Diversity and dissimilarity coefficients: a unified approach. Theoretical Population Biology 21 (1), 24-43.
[31] Rao, C. R., 1987. Differential metrics in probability spaces. Differential Geometry in Statistical Inference 10, 217-240.
[32] Royden, H., 1986. Real Analysis. Macmillan Publishing Company, New York.
[33] Tumer, K., Ghosh, J., 1996. Estimating the Bayes error rate through classifier combining. In: Proceedings of the 13th International Conference on Pattern Recognition, pp. 695-699.
6. Appendix

6.1. ζ measures of some common distributions

• Binomial:
    P(k) = (n choose k) p^k (1 − p)^{n−k},   Q(k) = (n choose k) q^k (1 − q)^{n−k}.
    ζ_bin(P, Q) = −log_2[ (1 + [√(pq) + √((1 − p)(1 − q))]^n) / 2 ].   (50)

• Poisson:
    P(k) = λ_p^k e^{−λ_p}/k!,   Q(k) = λ_q^k e^{−λ_q}/k!.
    ζ_Poisson(P, Q) = −log_2[ (1 + e^{−(√λ_p − √λ_q)²/2}) / 2 ].   (51)

• Gaussian:
    P(x) = (1/(√(2π) σ_p)) exp(−(x − x_p)²/(2σ_p²)),   Q(x) = (1/(√(2π) σ_q)) exp(−(x − x_q)²/(2σ_q²)).
    ζ_Gauss(P, Q) = −log_2[ (1/2)(1 + √(2σ_p σ_q/(σ_p² + σ_q²)) exp(−(x_p − x_q)²/(4(σ_p² + σ_q²)))) ].   (52)

• Exponential:
    P(x) = λ_p e^{−λ_p x},   Q(x) = λ_q e^{−λ_q x}.
    ζ_exp(P, Q) = −log_2[ (√λ_p + √λ_q)² / (2(λ_p + λ_q)) ].   (53)

• Pareto (assuming the same cut-off x_m):
    P(x) = α_p x_m^{α_p} / x^{α_p + 1} for x ≥ x_m, and 0 if x < x_m;   (54)
    Q(x) = α_q x_m^{α_q} / x^{α_q + 1} for x ≥ x_m, and 0 if x < x_m.   (55)
    ζ_Pareto(P, Q) = −log_2[ (√α_p + √α_q)² / (2(α_p + α_q)) ].   (56)