Statistical methods for non-precise data


Institut f. Statistik u. Wahrscheinlichkeitstheorie 1040 Wien, Wiedner Hauptstr. 8-10/107 AUSTRIA http://www.statistik.tuwien.ac.at

Statistical methods for non-precise data R. Viertl

Research Report SM-2009-6, December 2009

Contact: [email protected]

STATISTICAL METHODS FOR NON-PRECISE DATA Reinhard Viertl Department of Statistics and Probability Theory Vienna University of Technology 1040 Wien, Austria

1 Non-Precise Data

Real data obtained from measurement processes are not precise numbers or vectors, but more or less non-precise, also called fuzzy. This uncertainty is different from measurement error and has to be described formally in order to obtain realistic results from data analysis. A real-life example is the water level of a river at a fixed time. It is typically not a precise multiple of the scale unit for height measurements. In the past this kind of uncertainty was mostly neglected in describing such data. The reason is the idea that a “true” water level exists, identified with a real number times the measurement unit. But this is not realistic. A formal description of such non-precise water levels can be given using the intensity of the wetness of the gauge to obtain the so-called characterizing functions introduced in the next section. Further examples of non-precise data are readings on digital measurement equipment, readings of pointers on scales, color intensity pictures, and light points on screens.

Remark 1: Non-precise data are different from measurement errors because in error models the observed values $y_i$ are considered to be numbers, i.e. $y_i = x_i + \epsilon_i$, where $\epsilon_i$ denotes the error of the $i$-th observation.

Historically, non-precise data were not studied sufficiently. Some earlier work was done in interval arithmetic. General non-precise data in the form of so-called fuzzy numbers were considered in the 1980s, when the first publications combining fuzzy imprecision and stochastic uncertainty appeared, see Kacprzyk and Fedrizzi [5]. Some of these approaches are more theoretically oriented. An applicable approach for statistical analysis of non-precise data is given in Viertl [0].

Characterizing functions of non-precise data

In the case of measurements of one-dimensional quantities, non-precise observations can be reasonably described by so-called fuzzy numbers $x^\star$. Fuzzy numbers are generalizations of real numbers in the following sense. Each real number $x \in \mathbb{R}$ is characterized by its indicator function $I_{\{x\}}(\cdot)$. A fuzzy number is characterized by its so-called characterizing function $\xi(\cdot)$, which is a generalization of an indicator function. A characterizing function is a real function of a real variable obeying the following:

1. $\xi : \mathbb{R} \to [0, 1]$
2. $\forall\, \delta \in (0, 1]$ the so-called $\delta$-cut $C_\delta(x^\star) := \{x \in \mathbb{R} : \xi(x) \geq \delta\}$ is a non-empty, closed and bounded interval

Remark 2: A characterizing function describes the imprecision of one observation. It should not be confused with a probability density, which describes the stochastic variation of a random quantity $X$.

A fundamental problem is how to obtain the characterizing function of a non-precise observation. This depends on the area of application. Some examples can be given.

Example 1: For data in the form of gray intensities in one dimension, as at the boundaries of regions, the gray intensity $g(x)$ as an increasing function of one real variable $x$ can be used to obtain the characterizing function $\xi(\cdot)$ in the following way. Take the derivative $\frac{d}{dx} g(x)$ and divide it by its maximum; the resulting function, or its convex hull, can then be used as the characterizing function of the non-precise observation.
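The construction in Example 1 can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the logistic gray-intensity profile `gray_intensity`, its parameters, and the grid are hypothetical choices made for illustration and do not come from the report. Because the derivative of a logistic curve has a single peak, no convex hull step is needed here.

```python
import math

def gray_intensity(x, center=2.0, width=0.5):
    """Hypothetical increasing gray-intensity profile g(x) across a boundary."""
    return 1.0 / (1.0 + math.exp(-(x - center) / width))

def characterizing_function(g, grid):
    """Approximate xi(x) = g'(x) / max g'(x) on an equally spaced grid,
    using central differences for the derivative."""
    h = grid[1] - grid[0]
    deriv = [(g(x + h) - g(x - h)) / (2.0 * h) for x in grid]
    peak = max(deriv)
    return [d / peak for d in deriv]

def delta_cut(grid, xi, delta):
    """Grid approximation of the delta-cut {x : xi(x) >= delta},
    returned as (lower, upper); assumes xi has a single peak."""
    members = [x for x, v in zip(grid, xi) if v >= delta]
    return (members[0], members[-1])

grid = [i * 0.01 for i in range(401)]           # points 0.00, 0.01, ..., 4.00
xi = characterizing_function(gray_intensity, grid)
cut_half = delta_cut(grid, xi, 0.5)             # delta-cut for delta = 0.5
cut_09 = delta_cut(grid, xi, 0.9)               # delta-cut for delta = 0.9
print(cut_half, cut_09)
```

The two printed intervals are nested: the cut for the larger value of delta lies inside the cut for the smaller one, as required of a characterizing function.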

2 Non-Precise Samples

When observations of a one-dimensional continuous quantity $X$ are taken in order to estimate the distribution of $X$, usually a finite sequence $x_1^\star, \dots, x_n^\star$ of non-precise numbers is obtained. These non-precise data are given in the form of $n$ characterizing functions $\xi_1(\cdot), \dots, \xi_n(\cdot)$ corresponding to $x_1^\star, \dots, x_n^\star$.

For this kind of sample, even the simplest concepts, such as histograms, have to be modified. The reason is the following: if, for a given class $K_j$ of a histogram, a non-precise observation $x_i^\star$ has a characterizing function $\xi_i(\cdot)$ with $\xi_i(x) > 0$ for some $x \in K_j$ and $\xi_i(y) > 0$ for some $y \in K_j^c$, it is not possible to decide whether $x_i^\star$ is an element of $K_j$ or not. A generalization of the concept of histograms is possible by so-called fuzzy histograms. For those histograms the height of the histogram over a fixed class $K_j$ is a fuzzy number $h_j^\star$. For the definition of the characterizing function of $h_j^\star$ compare Viertl [0]. For other concepts of statistics in the case of non-precise data compare Viertl [0].
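The report defers the definition of $h_j^\star$ to the cited literature. As a hedged illustration only, one plausible construction bounds the relative frequency of a class for each $\delta$: the lower bound counts observations whose $\delta$-cut lies entirely inside $K_j$, the upper bound counts those whose $\delta$-cut intersects $K_j$. This is an assumption of the sketch, not necessarily the definition used in the cited work. Trapezoidal fuzzy numbers `(a, b, c, d)`, the sample values, and the function names are likewise hypothetical.

```python
def trapezoid_cut(t, delta):
    """delta-cut [lower, upper] of a trapezoidal fuzzy number t = (a, b, c, d)
    with support [a, d] and core [b, c]."""
    a, b, c, d = t
    return (a + delta * (b - a), d - delta * (d - c))

def fuzzy_height_cut(sample, cls, delta):
    """Lower and upper relative frequency of class cls = [lo, hi) at level delta.

    Assumed construction: lower bound = share of delta-cuts contained in the
    class, upper bound = share of delta-cuts that intersect the class.
    """
    lo, hi = cls
    n = len(sample)
    inside = 0
    touching = 0
    for t in sample:
        left, right = trapezoid_cut(t, delta)
        if lo <= left and right < hi:
            inside += 1
        if left < hi and right >= lo:
            touching += 1
    return (inside / n, touching / n)

# Hypothetical non-precise sample of four trapezoidal fuzzy numbers.
sample = [
    (0.8, 1.0, 1.0, 1.2),
    (1.7, 1.9, 2.1, 2.3),
    (1.9, 2.6, 2.6, 2.8),
    (0.2, 0.4, 0.5, 0.6),
]
k_j = (0.0, 2.0)                                # class K_j = [0, 2)
h_core = fuzzy_height_cut(sample, k_j, 1.0)
h_wide = fuzzy_height_cut(sample, k_j, 0.1)
print(h_core, h_wide)
```

The interval for the smaller $\delta$ contains the interval for the larger $\delta$, so the family of intervals behaves like the $\delta$-cuts of a fuzzy number. The sketch works with relative frequencies; dividing by the class width would give heights on a density scale.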

Fuzzy vectors

In the case of multivariate continuous data $\boldsymbol{x} = (x_1, \dots, x_n)$, for example the position of an object on a radar screen, the observations are non-precise vectors $\boldsymbol{x}^\star$. Such non-precise vectors are characterized by so-called vector-characterizing functions $\zeta_{\boldsymbol{x}^\star}(\cdot, \dots, \cdot)$. These vector-characterizing functions are real functions of $n$ real variables $x_1, \dots, x_n$ obeying the following:

(1) $\zeta_{\boldsymbol{x}^\star} : \mathbb{R}^n \to [0, 1]$
(2) $\forall\, \delta \in (0, 1]$ the $\delta$-cut $C_\delta(\boldsymbol{x}^\star) := \{\boldsymbol{x} \in \mathbb{R}^n : \zeta_{\boldsymbol{x}^\star}(\boldsymbol{x}) \geq \delta\}$ is a non-empty, closed and star-shaped subset of $\mathbb{R}^n$ with finite $n$-dimensional content

In order to generalize statistics $t(x_1, \dots, x_n)$ to the situation of fuzzy data, the fuzzy sample has to be combined into a fuzzy vector, called the fuzzy combined sample.
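The text does not state how the individual characterizing functions are combined. A frequently used choice, assumed here purely for illustration, is the minimum rule $\zeta(x_1, \dots, x_n) = \min_i \xi_i(x_i)$. The sketch below builds such a combined function from hypothetical trapezoidal characterizing functions; the helper names `trapezoid` and `combine_min` are not from the report.

```python
def trapezoid(a, b, c, d):
    """Characterizing function of a trapezoidal fuzzy number
    with support [a, d] and core [b, c] (hypothetical helper)."""
    def xi(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)   # rising edge, here a <= x < b
        return (d - x) / (d - c)       # falling edge, here c < x <= d
    return xi

def combine_min(xis):
    """Vector-characterizing function zeta(x_1, ..., x_n) = min_i xi_i(x_i).
    The minimum rule is an assumption of this sketch."""
    def zeta(point):
        return min(xi(x) for xi, x in zip(xis, point))
    return zeta

# Two hypothetical non-precise observations.
xis = [trapezoid(0.0, 1.0, 2.0, 3.0), trapezoid(1.0, 2.0, 2.0, 4.0)]
zeta = combine_min(xis)

print(zeta((1.5, 2.0)))   # both coordinates lie in the cores
print(zeta((0.5, 3.0)))   # both coordinates lie on sloping edges
print(zeta((5.0, 2.0)))   # first coordinate lies outside its support
```

Under the minimum rule the $\delta$-cut of the combined sample is the Cartesian product of the individual $\delta$-cuts, which keeps later computations simple.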

3 Generalized Classical Inference

Based on combined fuzzy samples, point estimators for parameters can be generalized using the so-called extension principle from fuzzy set theory. If $\vartheta(x_1, \dots, x_n)$ is a classical point estimator for $\theta$, then $\vartheta(x_1^\star, \dots, x_n^\star) = \vartheta(\boldsymbol{x}^\star)$ yields a fuzzy element $\hat{\theta}^\star$ of the parameter space $\Theta$.

Generalized confidence regions for $\theta$ can be constructed in the following way. Let $\kappa(x_1, \dots, x_n)$ be a classical confidence function for $\theta$ with coverage probability $1 - \alpha$, i.e. $\Theta_{1-\alpha}$ is the corresponding confidence set. For fuzzy data $x_1^\star, \dots, x_n^\star$ a generalized confidence set $\Theta_{1-\alpha}^\star$ is defined as the fuzzy subset of $\Theta$ whose membership function $\varphi(\cdot)$ is given by its values

$$\varphi(\theta) = \begin{cases} \sup\{\zeta(\boldsymbol{x}) : \boldsymbol{x} \in M_X^n,\ \theta \in \kappa(\boldsymbol{x})\} & \text{if } \exists\, \boldsymbol{x} : \theta \in \kappa(\boldsymbol{x}) \\ 0 & \text{if } \nexists\, \boldsymbol{x} : \theta \in \kappa(\boldsymbol{x}) \end{cases} \qquad \forall\, \theta \in \Theta,$$

where $\zeta(\cdot)$ denotes the vector-characterizing function of the fuzzy combined sample.

Statistical tests are mostly based on so-called test statistics $t(x_1, \dots, x_n)$. For non-precise data the values $t(x_1^\star, \dots, x_n^\star)$ become non-precise numbers. Therefore test decisions are not as simple as in the classical (frequently artificial) situation. Different generalizations are possible. Even when the value of the test statistic is non-precise, it is possible to find $p$-values, and the test decision can then be made as in the classical case. Another possibility is to define fuzzy $p$-values, which seem better suited to the problem. For details see Viertl [0].

There are other approaches to generalizing classical inference procedures to the situation of fuzzy data. References for these are Gil et al. [4] and Näther [6].
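A small numerical sketch may help to make the extension principle and the generalized confidence set concrete. It rests on several assumptions that are not stated in the report: the fuzzy sample is combined with the minimum rule, the observations are trapezoidal fuzzy numbers, the estimator is the sample mean, and the classical confidence function is the normal-theory interval for the mean with known standard deviation `sigma`. Under these assumptions the $\delta$-cut of the fuzzy mean runs from the mean of the lower cut endpoints to the mean of the upper cut endpoints, and the $\delta$-cut of the generalized confidence set is the union of the classical intervals over all precise samples in the $\delta$-cut of the combined sample.

```python
from statistics import NormalDist

def trapezoid_cut(t, delta):
    """delta-cut [lower, upper] of a trapezoidal fuzzy number t = (a, b, c, d)."""
    a, b, c, d = t
    return (a + delta * (b - a), d - delta * (d - c))

def fuzzy_mean_cut(sample, delta):
    """delta-cut of the fuzzy sample mean (extension principle, minimum rule).
    The mean is increasing in every argument, so the cut endpoints are the
    means of the lower and of the upper endpoints of the individual cuts."""
    cuts = [trapezoid_cut(t, delta) for t in sample]
    n = len(cuts)
    return (sum(lo for lo, _ in cuts) / n, sum(hi for _, hi in cuts) / n)

def fuzzy_confidence_cut(sample, delta, sigma, alpha=0.05):
    """delta-cut of the generalized confidence set for the mean, assuming a
    normal model with known sigma (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / len(sample) ** 0.5
    lo, hi = fuzzy_mean_cut(sample, delta)
    return (lo - half_width, hi + half_width)

# Hypothetical non-precise sample of three trapezoidal fuzzy numbers.
sample = [
    (0.8, 1.0, 1.0, 1.2),
    (1.7, 1.9, 2.1, 2.3),
    (2.4, 2.6, 2.6, 2.8),
]
mean_core = fuzzy_mean_cut(sample, 1.0)
mean_half = fuzzy_mean_cut(sample, 0.5)
conf_core = fuzzy_confidence_cut(sample, 1.0, sigma=0.3)
print(mean_core, mean_half, conf_core)
```

For precise data every trapezoid collapses to a point, the fuzzy mean collapses to the ordinary mean, and the generalized confidence set reduces to the classical interval.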

4 Generalized Bayesian Inference

In Bayesian inference for non-precise data, besides the imprecision of the data there is also imprecision of the a-priori distribution. Bayes' theorem is therefore generalized to take care of this. The result of this generalized Bayes' theorem is a so-called fuzzy a-posteriori distribution $\pi^\star(\cdot \mid x_1^\star, \dots, x_n^\star)$, which is given by its so-called lower and upper $\delta$-level functions $\underline{\pi}_\delta(\cdot \mid \boldsymbol{x}^\star)$ and $\overline{\pi}_\delta(\cdot \mid \boldsymbol{x}^\star)$, respectively. From the fuzzy a-posteriori distribution, generalized Bayesian confidence regions, highest a-posteriori density regions, and fuzzy predictive distributions can be constructed. Moreover, decision analysis can also be generalized to the situation of fuzzy utilities and non-precise data.

5 Applications

Whenever measurements of continuous quantities have to be modeled, non-precise data appear. This is the case with initial conditions for differential equations, time-dependent descriptions of quantities, and the statistical analysis of environmental data.


Bibliography

[1] H. Bandemer: Modelling Uncertain Data, Akademie Verlag, Berlin, 1993
[2] H. Bandemer: Mathematics of Uncertainty, Springer-Verlag, Berlin, 2006
[3] D. Dubois, M. Lubiano, H. Prade, M. Gil, P. Grzegorzewski, O. Hryniewicz (Eds.): Soft Methods for Handling Variability and Imprecision, Springer-Verlag, Berlin, 2008
[4] M. Gil, N. Corral, P. Gil: The minimum inaccuracy estimates in $\chi^2$-tests for goodness of fit with fuzzy observations, Journal of Statistical Planning and Inference 19 (1988)
[5] J. Kacprzyk, M. Fedrizzi (Eds.): Combining Fuzzy Imprecision with Probabilistic Uncertainty in Decision Making, Lecture Notes in Economics and Mathematical Systems, Vol. 310, Springer-Verlag, Berlin, 1988
[6] W. Näther: Linear statistical inference for random fuzzy data, Statistics, 29, No. 3 (1997)
[7] T. Ross, J. Booker, W. Parkinson (Eds.): Fuzzy Logic and Probability Applications – Bridging the Gap, SIAM, Philadelphia, 2002
[8] R. Viertl: Statistical Methods for Non-Precise Data, CRC-Press, Boca Raton, Florida, 1996
[9] R. Viertl: Univariate statistical analysis with fuzzy data, Computational Statistics & Data Analysis 51 (2006)
[10] R. Viertl: Foundations of Fuzzy Bayesian Inference, Journal of Uncertain Systems, Vol. 2, No. 3 (2008)
[11] R. Viertl, D. Hareter: Beschreibung und Analyse unscharfer Information – Statistische Methoden für unscharfe Daten, Springer-Verlag, Wien, 2006

