Uncertain hypothesis testing for two experts' empirical data


Mathematical and Computer Modelling 55 (2012) 1478–1482


Xiaosheng Wang ∗, Zhichao Gao, Haiying Guo
College of Science, Hebei University of Engineering, Handan 056038, China

Article history: Received 26 March 2011; received in revised form 9 August 2011; accepted 24 October 2011

Keywords: Uncertainty theory; Uncertainty distribution; Uncertain statistics; Hypothesis test

Abstract

Uncertain statistics is a methodology for collecting and interpreting experts’ experimental data by using uncertainty theory. Based on empirical uncertainty distributions, this paper presents a statistical method of uncertain hypothesis testing to detect whether two uncertainty distributions are equal. © 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Uncertainty theory, a branch of mathematics based on the normality, self-duality, countable subadditivity, and product measure axioms, was founded by Liu [1] and later refined by him [2] to model imprecise human quantities such as “about 1000 km”, “roughly 60 kg”, “high speed”, and “small size”, which are neither random nor fuzzy in nature. Building on Liu’s uncertainty theory, basic theoretical topics such as uncertain process [3], uncertain calculus [4], uncertain differential equation [4], uncertain logic [5], uncertain inference [6], and uncertain risk analysis [7] have been studied. Meanwhile, as an application of uncertainty theory, Liu [8] introduced uncertain programming and applied it to system reliability design, facility location problems, vehicle routing problems, project scheduling problems, and so on. Other references related to uncertainty theory are [9–14]. To explore the recent developments of uncertainty theory, readers may consult Liu’s book [2].

One important issue in uncertainty theory is how to determine the uncertainty distribution of an uncertain variable. To answer this question, Liu [2] presented uncertain statistics, a methodology for collecting and interpreting an expert’s experimental data by uncertainty theory. Within uncertain statistics, Liu [2] suggested an empirical uncertainty distribution and proposed a principle of least squares for estimating the unknown parameters based on the expert’s experimental data. Later, Wang and Peng [15] proposed a method of moments for estimating the unknown parameters. For multiple domain experts’ data, Wang et al. [16] and Gao [17] independently recast the Delphi method to determine the uncertainty distribution. Recently, Guo et al. [18] presented an uncertainty linear regression model based on multivariate uncertainty distribution theory.

The goal of this paper is to build a method of uncertain hypothesis testing to detect whether two uncertainty distributions are equal. This method depends on the empirical uncertainty distribution and the rank of the experts’ experimental data. The remainder of this paper is organized as follows. Section 2 introduces the concepts of uncertainty theory that are needed. Some basic concepts of uncertain statistics are introduced in Section 3. The uncertain hypothesis testing method and some examples are presented in Section 4. Finally, a conclusion is drawn in Section 5.



Corresponding author. E-mail addresses: [email protected] (X. Wang), [email protected] (Z. Gao).

0895-7177/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.mcm.2011.10.039


2. Preliminaries

In this section, we introduce some useful definitions concerning uncertain measures, uncertain variables and uncertainty distributions, which are three fundamental concepts in uncertainty theory.

The uncertain measure $\mathcal{M}$ was defined by Liu [1] as a set function satisfying the normality, self-duality, countable subadditivity and product measure axioms. The concept of an uncertain variable $\xi$ was introduced by Liu [1] as a measurable function from an uncertainty space $(\Gamma, \mathcal{L}, \mathcal{M})$ to the set of real numbers, where $\Gamma$ is a nonempty set, $\mathcal{L}$ is a $\sigma$-algebra over $\Gamma$, and $\mathcal{M}\{\Lambda\}$ is a number indicating the belief degree that the event $\Lambda$ will occur (for any $\Lambda \in \mathcal{L}$). The expected value operator of an uncertain variable $\xi$ was then defined by Liu [1] as

\[ E[\xi] = \int_{0}^{+\infty} \mathcal{M}\{\xi \ge x\}\,dx - \int_{-\infty}^{0} \mathcal{M}\{\xi \le x\}\,dx, \]

provided that at least one of the two integrals is finite. The $k$th moment of an uncertain variable $\xi$ is defined by $E[\xi^k]$, where $k$ is a positive integer.

For any $x \in \Re$, the function $\Phi(x) = \mathcal{M}\{\xi \le x\}$ is called the uncertainty distribution of an uncertain variable $\xi$. Peng and Iwamura [13] presented a sufficient and necessary condition for an uncertainty distribution: a function $\Phi : \Re \to [0, 1]$ is an uncertainty distribution if and only if it is an increasing function other than $\Phi(x) \equiv 0$ and $\Phi(x) \equiv 1$. Moreover, if the inverse function $\Phi^{-1}(\alpha)$ exists and is unique for each $\alpha \in (0, 1)$, then $\Phi(x)$ is called regular, and the inverse function $\Phi^{-1}(\alpha)$ is called the inverse uncertainty distribution of $\xi$.

3. Development of uncertain statistics

In order to determine a probability distribution, classical mathematical statistics was proposed as a methodology for collecting and interpreting test data, together with related information about a system, using probability theory. In addition, in order to determine a membership function, fuzzy statistics was presented, including fuzzy point estimation by a fuzzy decision making approach [19], fuzzy point estimation based on the notion of a fuzzy information system [20,21], fuzzy interval estimation [22], fuzzy hypothesis testing based on a model represented by fuzzy events [23], fuzzy regression [24,25], fuzzy Bayesian statistics by a fuzzy probability measure [26], the fuzzy-Bayes decision rule [27], and so on.

In order to determine an uncertainty distribution, uncertain statistics was defined by Liu [2]. Uncertain statistics is based on an expert’s experimental data rather than historical data. One question is how to obtain the expert’s experimental data. Liu [2] designed a questionnaire survey for this purpose: we invite one or more domain experts and ask them to complete a questionnaire about the meaning of an uncertain variable $\xi$ (for example, “about 1000 km”).
We first ask the domain expert to choose a possible value $x$ that the uncertain variable $\xi$ may take. We then ask, “How likely is it that $\xi$ is less than $x$?” We denote the belief degree by $\alpha$. Thus we obtain one item of experimental data $(x, \alpha)$ from the domain expert. Repeating the above process, we obtain the expert’s experimental data. Let $(x_1, \alpha_1), (x_2, \alpha_2), \ldots, (x_n, \alpha_n)$ be the expert’s experimental data that satisfy the following condition:

\[ x_1 < x_2 < \cdots < x_n, \qquad 0 \le \alpha_1 \le \alpha_2 \le \cdots \le \alpha_n \le 1. \tag{1} \]

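Based on the above data, Liu [2] presented the following empirical uncertainty distribution:

\[ \Phi(x) = \begin{cases} 0, & \text{if } x < x_1 \\[4pt] \alpha_i + \dfrac{(\alpha_{i+1} - \alpha_i)(x - x_i)}{x_{i+1} - x_i}, & \text{if } x_i \le x \le x_{i+1},\ 1 \le i < n \\[4pt] 1, & \text{if } x > x_n. \end{cases} \tag{2} \]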
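The piecewise-linear form of Eq. (2) can be illustrated with a minimal Python sketch. The function name `empirical_distribution` and the validation of condition (1) are assumptions made for illustration and are not part of the original paper; the data used are those of Teacher 1 from the example in Section 4.

```python
from bisect import bisect_right
from typing import Callable, Sequence, Tuple


def empirical_distribution(
    data: Sequence[Tuple[float, float]],
) -> Callable[[float], float]:
    """Build Phi(x) from expert data (x_i, alpha_i), following Eq. (2)."""
    xs = [x for x, _ in data]
    alphas = [a for _, a in data]
    # Condition (1): x_i strictly increasing, alpha_i nondecreasing in [0, 1].
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("x values must be strictly increasing")
    if any(b < a for a, b in zip(alphas, alphas[1:])):
        raise ValueError("belief degrees must be nondecreasing")
    if alphas[0] < 0 or alphas[-1] > 1:
        raise ValueError("belief degrees must lie in [0, 1]")

    def phi(x: float) -> float:
        if x < xs[0]:
            return 0.0
        if x > xs[-1]:
            return 1.0
        if x == xs[-1]:
            return alphas[-1]
        # Locate i such that xs[i] <= x < xs[i + 1], then interpolate linearly.
        i = bisect_right(xs, x) - 1
        slope = (alphas[i + 1] - alphas[i]) / (xs[i + 1] - xs[i])
        return alphas[i] + slope * (x - xs[i])

    return phi


# Teacher 1's experimental data from the example in Section 4.
teacher1 = [(60, 0.05), (70, 0.15), (80, 0.55), (85, 0.85), (90, 0.95)]
phi1 = empirical_distribution(teacher1)
print(phi1(61.5))  # close to 0.065, the value listed for Teacher 1 in the example
```

Note that Eq. (2) jumps from 0 to $\alpha_1$ at $x_1$ and from $\alpha_n$ to 1 just above $x_n$ whenever $\alpha_1 > 0$ or $\alpha_n < 1$; the sketch reproduces that behaviour rather than smoothing it.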

Assume that there are $m$ domain experts and that each produces an uncertainty distribution. Liu [2] then proposed a comprehensive uncertainty distribution,

\[ \Phi(x) = w_1 \Phi_1(x) + w_2 \Phi_2(x) + \cdots + w_m \Phi_m(x), \]

where $w_1, w_2, \ldots, w_m$ are convex combination coefficients representing the weights of the domain experts. In addition, Wang et al. [16] and Gao [17] each proposed a Delphi method to determine a comprehensive uncertainty distribution.

Assume that the uncertainty distribution to be determined has a known functional form with one or more unknown parameters, such as $\Phi(x; \theta_1, \theta_2, \ldots, \theta_p)$, where $\theta_1, \theta_2, \ldots, \theta_p$ are unknown parameters. How do we estimate those unknown parameters? Liu [2] presented the principle of least squares, which says that the estimates of the unknown parameters $\theta_i$, $i = 1, 2, \ldots, p$, are the solution of the following minimization problem:

\[ \min_{\theta_1, \ldots, \theta_p} \sum_{i=1}^{n} \bigl(\Phi(x_i; \theta_1, \theta_2, \ldots, \theta_p) - \alpha_i\bigr)^2. \]
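To make the least-squares objective concrete, the following sketch evaluates it for one assumed functional form and minimizes it by brute force. The choice of a linear uncertainty distribution with parameters $(a, b)$, the helper names, and the integer search grid are all illustrative assumptions; the paper does not prescribe a functional form or a numerical solver.

```python
from itertools import product
from typing import Sequence, Tuple

Data = Sequence[Tuple[float, float]]


def linear_distribution(x: float, a: float, b: float) -> float:
    """Linear uncertainty distribution on [a, b] (assumed functional form)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def sum_of_squares(data: Data, a: float, b: float) -> float:
    """Least-squares objective: sum over i of (Phi(x_i; a, b) - alpha_i)^2."""
    return sum((linear_distribution(x, a, b) - alpha) ** 2 for x, alpha in data)


def grid_search(data: Data, candidates: Sequence[float]) -> Tuple[float, float]:
    """Return the pair (a, b) with a < b on the grid that minimizes the objective."""
    pairs = [(a, b) for a, b in product(candidates, repeat=2) if a < b]
    return min(pairs, key=lambda p: sum_of_squares(data, p[0], p[1]))


# Teacher 1's experimental data from the example in Section 4.
teacher1 = [(60, 0.05), (70, 0.15), (80, 0.55), (85, 0.85), (90, 0.95)]
grid = [float(v) for v in range(40, 111)]  # hypothetical search grid, step 1
a_hat, b_hat = grid_search(teacher1, grid)
print(a_hat, b_hat, sum_of_squares(teacher1, a_hat, b_hat))
```

A grid search is used here only because it needs no external library; in practice any numerical optimizer could replace it.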


Recently, Wang and Peng [15] proposed a method of moments to estimate the unknown parameters of an uncertainty distribution. In this method, the estimates of the unknown parameters are the solution of the system of equations

\[ k \int_{0}^{+\infty} x^{k-1} \bigl(1 - \Phi(x; \theta_1, \theta_2, \ldots, \theta_p)\bigr)\,dx = \overline{\xi}^{k}, \qquad k = 1, 2, \ldots, p, \]

where $\overline{\xi}^{k}$, $k = 1, 2, \ldots, p$, are empirical moments determined by

\[ \overline{\xi}^{k} = \alpha_1 x_1^{k} + \frac{1}{k+1} \sum_{i=1}^{n-1} \sum_{j=0}^{k} (\alpha_{i+1} - \alpha_i)\, x_i^{j} x_{i+1}^{k-j} + (1 - \alpha_n) x_n^{k}. \]
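The empirical-moment formula above translates directly into code. The sketch below is an illustration only; the function name `empirical_moment` is an assumption, and the data are those of Teacher 1 from the example in Section 4.

```python
from typing import Sequence, Tuple


def empirical_moment(data: Sequence[Tuple[float, float]], k: int) -> float:
    """k-th empirical moment of expert data (x_i, alpha_i), per the formula above."""
    xs = [x for x, _ in data]
    alphas = [a for _, a in data]
    n = len(data)
    head = alphas[0] * xs[0] ** k          # alpha_1 * x_1^k
    tail = (1 - alphas[-1]) * xs[-1] ** k  # (1 - alpha_n) * x_n^k
    # Double sum over segments i and powers j, scaled by 1 / (k + 1).
    middle = sum(
        (alphas[i + 1] - alphas[i]) * xs[i] ** j * xs[i + 1] ** (k - j)
        for i in range(n - 1)
        for j in range(k + 1)
    ) / (k + 1)
    return head + middle + tail


# Teacher 1's experimental data from the example in Section 4.
teacher1 = [(60, 0.05), (70, 0.15), (80, 0.55), (85, 0.85), (90, 0.95)]
print(empirical_moment(teacher1, 1))
```

For $k = 1$ the formula reduces to $\alpha_1 x_1 + \tfrac{1}{2}\sum_i (\alpha_{i+1} - \alpha_i)(x_i + x_{i+1}) + (1 - \alpha_n) x_n$, which for Teacher 1's data works out to $3 + 70 + 4.5 = 77.5$.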

4. Hypothesis testing

In classical statistics, hypothesis testing is the process of inferring from a sample whether or not to accept a certain statement about the problem. The statement itself is called a hypothesis. In each case, the hypothesis can be tested on the basis of the evidence contained in the sample. The hypothesis is either rejected, meaning that the evidence from the sample casts sufficient doubt on the hypothesis for us to say with some degree of confidence that the hypothesis is false, or accepted, meaning that it cannot be rejected.

In the real world, two experts may give their respective views on the same uncertain variable, and we need to determine whether their views differ. How do we model this? In this section, a method of uncertain hypothesis testing based on uncertainty theory is proposed to detect whether two uncertainty distributions are equal or not.

4.1. Experts’ data

Let $\xi$ be an uncertain variable. Assume that

\[ (x_{11}, \alpha_1^1), (x_{12}, \alpha_2^1), \ldots, (x_{1m}, \alpha_m^1), \tag{3} \]

\[ (x_{21}, \alpha_1^2), (x_{22}, \alpha_2^2), \ldots, (x_{2n}, \alpha_n^2) \tag{4} \]

are two sets of experimental data from experts that meet the following conditions: $x_{11} < x_{12} < \cdots < x_{1m}$, $x_{21}$


n for obtaining more information). We rank the above numbers $x_{1i}$, $x_{2j}$ and $\alpha_i^1$, $\alpha_j^2$, $i = 1, 2, \ldots, s$, $j = 1, 2, \ldots, t$, in increasing order. For example, we obtain two finite sequences as follows:

\[ AABAB \cdots B \tag{7} \]

\[ ABBAA \cdots B, \tag{8} \]

where (7) consists of $x_{1i}$, $x_{2j}$, and (8) consists of $\alpha_i^1$, $\alpha_j^2$, $i = 1, 2, \ldots, s$, $j = 1, 2, \ldots, t$, and the letters A and B represent the numbers coming from the empirical uncertainty distributions $\Phi_1(x)$ and $\Phi_2(x)$, respectively. Next, we compare the letters in (7) and (8) position by position. If the two letters are identical, we record the digit 0; otherwise, we record the digit 1. This generates a sequence of 0s and 1s whose total length is $s + t$. For the example above, we obtain the 0–1 sequence

\[ 01001 \cdots 0. \]

If the null hypothesis $H_0$ is true, then the number of 1s should not be too large. So, we present the following decision rule.
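The ranking and comparison steps just described can be sketched as follows. The function names are hypothetical, and the tie rule (a value from A is placed before an equal value from B) is an assumption, since the text does not say how ties are ranked. The sample points are the ones listed for Teacher 1 and Teacher 2 in the example of Section 4.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, alpha) taken from an empirical distribution


def label_sequence(values_a: Sequence[float], values_b: Sequence[float]) -> str:
    """Rank all values in increasing order and record each one's origin (A or B)."""
    tagged = [(v, "A") for v in values_a] + [(v, "B") for v in values_b]
    # sorted() is stable, so for equal values A precedes B (assumed tie rule).
    return "".join(label for _, label in sorted(tagged, key=lambda p: p[0]))


def zero_one_sequence(sample_a: Sequence[Point], sample_b: Sequence[Point]) -> str:
    """Compare the x-ranking and the alpha-ranking letter by letter."""
    by_x = label_sequence([x for x, _ in sample_a], [x for x, _ in sample_b])
    by_alpha = label_sequence([a for _, a in sample_a], [a for _, a in sample_b])
    return "".join("0" if p == q else "1" for p, q in zip(by_x, by_alpha))


# Sample points listed for Teacher 1 (A) and Teacher 2 (B) in the example.
teacher1_sample = [(61.5, 0.065), (69.0, 0.14), (70.5, 0.17), (79.5, 0.53),
                   (81.0, 0.61), (84.75, 0.835), (86.25, 0.875), (88.5, 0.92)]
teacher2_sample = [(61.5, 0.0935), (62.25, 0.1003), (63.0, 0.107), (66.0, 0.134),
                   (66.75, 0.1408), (75.75, 0.393), (78.75, 0.525), (79.5, 0.558),
                   (85.5, 0.86), (87.0, 0.89)]

print(zero_one_sequence(teacher1_sample, teacher2_sample))  # 000001100000000000
```

Under the assumed tie rule, the output coincides with the first 0–1 sequence reported in the example.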


4.4. Decision rule

Assume that $F_1(x)$ and $F_2(x)$ are two theoretical uncertainty distributions and that $s$, $t$ are defined as in Section 4.3. Let $\alpha > 0$ be a given level and $T$ be the number of 1s in the 0–1 sequence. We want to test

\[ H_0: F_1(x) = F_2(x) \ \text{for any } x \in \Re \qquad \text{versus} \qquad H_1: F_1(x) \ne F_2(x) \ \text{for some } x \in \Re. \]

Our decision rule is then as follows. If $T \ge \alpha(s + t)$, we reject $H_0$; otherwise, we accept $H_0$.

4.5. Definition of the level α

In classical statistics, there is no way to argue that a rejection probability $\alpha$ should be 0.1, or 0.18, or 0.05, or anything else. In many situations, researchers define the beginning of reasonable doubt as the value of the test statistic that is equalled or exceeded only 0.05 of the time (when $H_0$ is true). Here, since the number of 1s in the 0–1 sequence should not be too large when the null hypothesis $H_0$ is true, we define the level $\alpha$ as follows:

\[ \alpha = \frac{\text{number of 1s in the 0--1 sequence}}{s + t}, \]

where $s$, $t$ are defined as in Section 4.3.

Example (Forecasting the Average Scores of Higher Mathematics Examination). In this example, we invited three teachers to analyze the degree of difficulty of a higher mathematics examination in 2010. Each teacher estimated the average score and his belief degrees on the basis of his knowledge and experience. We will test whether their views on the examination are different or not. Let the level $\alpha = 0.2$. The three teachers’ experimental data are as follows.

Teacher 1: (60, 0.05), (70, 0.15), (80, 0.55), (85, 0.85), (90, 0.95).
Teacher 2: (60, 0.08), (70, 0.17), (75, 0.36), (80, 0.58), (85, 0.85), (90, 0.95).
Teacher 3: (50, 0.2), (60, 0.3), (70, 0.4), (80, 0.8), (85, 1).

Let $F_1(x)$, $F_2(x)$ and $F_3(x)$ be the real uncertainty distributions according to Teacher 1, Teacher 2, and Teacher 3. From those data we can generate three empirical uncertainty distributions $\Phi_1(x)$, $\Phi_2(x)$, and $\Phi_3(x)$, respectively.

First, we compare the views of Teacher 1 and Teacher 2. We present the hypotheses as follows.

\[ H_0: F_1(x) = F_2(x) \ \text{for any } x \in \Re; \qquad H_1: F_1(x) \ne F_2(x) \ \text{for some } x \in \Re. \]

Arbitrarily take 8 points from $\Phi_1(x)$ and 10 points from $\Phi_2(x)$.

Teacher 1: (61.5, 0.065), (69.0, 0.14), (70.5, 0.17), (79.5, 0.53), (81.0, 0.61), (84.75, 0.835), (86.25, 0.875), (88.5, 0.92).
Teacher 2: (61.5, 0.0935), (62.25, 0.1003), (63.0, 0.107), (66.0, 0.134), (66.75, 0.1408), (75.75, 0.393), (78.75, 0.525), (79.5, 0.558), (85.5, 0.86), (87.0, 0.89).

Then we obtain the first 0–1 sequence as follows: 000001100000000000. Let $T$ denote the number of 1s in this sequence. Since $T = 2 < 3.6 = \alpha(s + t)$, we accept the null hypothesis $H_0$ by our decision rule. That is, we believe that the views of Teacher 1 and Teacher 2 are consistent with each other under the level $\alpha = 0.2$.

Second, we compare the views of Teacher 1 and Teacher 3. We present the hypotheses as follows.

\[ H_0: F_1(x) = F_3(x) \ \text{for any } x \in \Re; \qquad H_1: F_1(x) \ne F_3(x) \ \text{for some } x \in \Re. \]

Arbitrarily take 8 points from $\Phi_1(x)$ and 10 points from $\Phi_3(x)$.

Teacher 1: (63.75, 0.0875), (66.0, 0.11), (68.25, 0.1325), (69.75, 0.1475), (73.5, 0.29), (81.0, 0.61), (85.5, 0.86), (87.0, 0.89).
Teacher 3: (50.875, 0.2088), (53.5, 0.235), (55.25, 0.2525), (60.5, 0.305), (64.0, 0.34), (67.5, 0.375), (73.625, 0.545), (75.375, 0.615), (77.125, 0.685), (81.5, 0.86).

Then we obtain the second 0–1 sequence as follows: 111110111110101110. Let $T$ denote the number of 1s in this sequence. Since $T = 14 > 3.6 = \alpha(s + t)$, we reject the null hypothesis $H_0$ by our decision rule. That is, we believe that the views of Teacher 1 and Teacher 3 are not consistent with each other under the level $\alpha = 0.2$.
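The arithmetic of the two comparisons can be checked with a short sketch of the decision rule of Section 4.4, applied to the two 0–1 sequences reported above. The function name `decide` and its return convention are illustrative assumptions.

```python
from typing import Tuple


def decide(zero_one: str, level: float) -> Tuple[int, float, bool]:
    """Return (T, threshold, reject_H0) for a 0-1 sequence of length s + t."""
    if not zero_one or set(zero_one) - {"0", "1"}:
        raise ValueError("expected a nonempty string of 0s and 1s")
    t_stat = zero_one.count("1")         # T: number of 1s
    threshold = level * len(zero_one)    # alpha * (s + t)
    return t_stat, threshold, t_stat >= threshold  # reject H0 when T >= threshold


LEVEL = 0.2  # level alpha used in the example

# 0-1 sequences as reported in the example (s = 8, t = 10).
first = decide("000001100000000000", LEVEL)   # Teacher 1 vs Teacher 2
second = decide("111110111110101110", LEVEL)  # Teacher 1 vs Teacher 3

print(first[0], first[2])    # 2 False  -> accept H0
print(second[0], second[2])  # 14 True  -> reject H0
```

In both cases the threshold is $0.2 \times 18 = 3.6$, so the sketch agrees with the decisions stated in the example.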


5. Conclusions

Uncertain statistics is a methodology for collecting and interpreting experts’ experimental data by uncertainty theory. The method of uncertain hypothesis testing proposed in this paper is used to detect whether two uncertainty distributions are equal or not. This method depends on each expert’s empirical uncertainty distribution, and the example shows that this method is effective.

Acknowledgments

This work was supported by the National Natural Science Foundation of China No. 61073121 and Hebei Natural Science Foundation No. F2010001044.

References

[1] B. Liu, Uncertainty Theory, 2nd ed., Springer-Verlag, Berlin, 2007.
[2] B. Liu, Uncertainty Theory: A Branch of Mathematics for Modeling Human Uncertainty, Springer-Verlag, Berlin, 2010.
[3] B. Liu, Fuzzy process, hybrid process and uncertain process, Journal of Uncertain Systems 2 (1) (2008) 3–16.
[4] B. Liu, Some research problems in uncertainty theory, Journal of Uncertain Systems 3 (1) (2009) 3–10.
[5] X. Li, B. Liu, Hybrid logic and uncertain logic, Journal of Uncertain Systems 3 (2) (2009) 83–94.
[6] B. Liu, Uncertain set theory and uncertain inference rule with application to uncertain control, Journal of Uncertain Systems 4 (2) (2010) 83–98.
[7] B. Liu, Uncertain risk analysis and uncertain reliability analysis, Journal of Uncertain Systems 4 (3) (2010) 163–170.
[8] B. Liu, Theory and Practice of Uncertain Programming, 2nd ed., Springer-Verlag, Berlin, 2009.
[9] R. Bhattacharyya, et al., Uncertainty theory based novel multi-objective optimization technique using embedding theorem with application to R and D project portfolio selection, Applied Mathematics 1 (2010) 189–199.
[10] X. Gao, Some properties of continuous uncertain measure, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 17 (3) (2009) 419–426.
[11] X. Gao, Y. Gao, D. Ralescu, On Liu’s inference rule for uncertain systems, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 18 (1) (2010) 1–11.
[12] W. Liu, J. Xu, Some properties on expected value operator for uncertain variables, Information: An International Interdisciplinary Journal 13 (5) (2010) 1693–1699.
[13] Z. Peng, K. Iwamura, A sufficient and necessary condition of uncertainty distribution, Journal of Interdisciplinary Mathematics 13 (3) (2010) 277–285.
[14] Y. Liu, M. Ha, Expected value of function of uncertain variables, Journal of Uncertain Systems 4 (3) (2010) 181–186.
[15] X. Wang, Z. Peng, Method of moments for estimating uncertainty distribution, http://orsc.edu.cn/online/100408.pdf.
[16] X. Wang, Z. Gao, H. Guo, Delphi method for estimating uncertainty distributions, Information: An International Interdisciplinary Journal (in press).
[17] J. Gao, Determine uncertainty distribution via Delphi method, in: Proceeding of the First Conference on Uncertainty Theory, Urumqi, China, 2010, pp. 291–297.
[18] R. Guo, Y. Cui, D. Guo, Uncertainty linear regression models, http://orsc.edu.cn/online/101013.pdf.
[19] J. Buckley, Fuzzy decision making with data: applications to statistics, Fuzzy Sets and Systems 16 (2) (1985) 139–147.
[20] M. Gil, Fuzziness and loss of information in statistical problems, IEEE Transactions on Systems, Man, and Cybernetics 17 (1987) 1012–1025.
[21] M. Gil, N. Corral, P. Gil, The fuzzy decision problem: an approach to the point estimation problem with fuzzy information, European Journal of Operational Research 22 (1985) 26–34.
[22] N. Corral, M. Gil, A note on interval estimation with fuzzy data, Fuzzy Sets and Systems 28 (2) (1988) 209–215.
[23] M. Casals, M. Gil, P. Gil, The fuzzy decision problem: an approach to the problem of testing statistical hypotheses with fuzzy information, European Journal of Operational Research 27 (1986) 371–382.
[24] H. Tanaka, S. Uejima, K. Asai, Fuzzy linear regression model, IEEE Transactions on Systems, Man and Cybernetics 10 (1980) 2933–2938.
[25] H. Tanaka, S. Uejima, K. Asai, Linear regression analysis with fuzzy model, IEEE Transactions on Systems, Man and Cybernetics 12 (1982) 903–907.
[26] K. Piasecki, On the Bayes formula for fuzzy probability measures, Fuzzy Sets and Systems 18 (2) (1986) 183–185.
[27] Y. Uemura, A decision rule on fuzzy events, Japanese Journal Fuzzy Theory and Systems 3 (1991) 291–300.