
This paper is intended for the SAMO special issue

Local parametric sensitivity for mixture models of lifetime distributions

M. J. Rufo ∗, C. J. Pérez, J. Martín

Departamento de Matemáticas, Universidad de Extremadura, Avda. de la Universidad s/n, 10071 Cáceres, Spain.

Abstract

Mixture models have been receiving considerable attention in recent years. Practical situations in reliability and survival analysis can be addressed by using mixture models. When making inferences with them, a sensitivity analysis is necessary in addition to the parameter estimates. In this paper, a general technique to estimate local prior sensitivities in finite mixtures of distributions from Natural Exponential Families having Quadratic Variance Function (NEF-QVF) is proposed. These families include some distributions of wide use in reliability theory. An advantage of this method is that it allows a direct implementation of the sensitivity measure estimates and their errors. In addition, the samples that are drawn to estimate the parameters of the mixture model are re-used to estimate the sensitivity measures and their errors. An illustrative application based on insulating fluid failure data is presented.

Key words: Bayesian inference, Finite mixture, MCMC, Natural exponential family, Parametric sensitivity

∗ Corresponding author. Departamento de Matemáticas, Escuela Politécnica, Universidad de Extremadura, Avda. de la Universidad s/n, 10071 Cáceres, Spain. Fax: +34 927257203. Email addresses: [email protected] (M. J. Rufo), [email protected] (C. J. Pérez), [email protected] (J. Martín).

Preprint submitted to Reliability Engineering & System Safety

19 March 2008

1 Introduction

The analysis of finite mixture models has recently been playing an important role in theoretical as well as in applied statistics (see, for example, McLachlan and Peel (2000), Böhning and Seidel (2003), and Böhning et al. (2007)). Mixture models are excellent tools that can be used to describe complex systems over a wide range of applications in many fields of knowledge (see, for example, Lindsay and Lesperance (1995)). Since the advent of Markov chain Monte Carlo (MCMC) methods, many advances have been achieved in the Bayesian analysis of finite mixture models.

Inference on the parameters of lifetime probability distributions is necessary in any reliability analysis. These inferences are often hindered by the presence of multiple failure modes and by variations in manufacturing that may create a heterogeneous component population. In these cases, standard distributions are inadequate. Mixture models provide additional modeling flexibility and have the physical interpretation of representing failures from a heterogeneous population. Titterington et al. (1985) give a detailed account of references on mixtures for lifetime data.

In spite of the modeling advantages, the practical implementation of finite mixture models sometimes introduces additional complexity, because they often have many parameters and the calculations become difficult. Rufo et al. (2006) provided a general approach to the Bayesian analysis of finite mixture models of distributions from Natural Exponential Families with Quadratic Variance Function (NEF-QVF). The families in this class are the normal, gamma, hyperbolic-secant, Poisson, binomial and negative-binomial. This general approach solves the prior distribution choice and the unidentifiability problems in this kind of mixture. However, a sensitivity analysis for the choice of the parameters in the prior distribution is needed.

Most sensitivity analyses are informal and simply consist in changing the values of the prior parameters and observing how the output changes. This requires re-running the sampling algorithm for several parameter values, with the corresponding computational cost. In this context, formal sensitivity analysis is a difficult task demanded by several authors. Pérez et al. (2006) proposed a computationally low-cost method to estimate local sensitivities in Bayesian models. This method can be applied to complex models that need to be solved by MCMC methods, and it makes it possible to estimate the sensitivity measures and their errors with no additional random sampling.

Notation (main symbols used in the paper)

K(m, µ0): normalizing constant
l(φ): likelihood function
π(φ): prior distribution of φ
p(φ|x): posterior distribution of φ
Iλ0: expectation with respect to pλ0(φ|x)
Îλ0: estimate of Iλ0
∂λj Iλ0: partial derivative of Iλ0 with respect to λj
∂̂λj Iλ0: estimate of ∂λj Iλ0
SE(∂̂λj Iλ0): error estimate for ∂̂λj Iλ0
∂µ0 πλ(µj): partial derivative of πλ(µj) with respect to µ0
∂m πλ(µj): partial derivative of πλ(µj) with respect to m

In this work, a formal sensitivity analysis for the parameter values of the prior distributions corresponding to the components in finite mixtures of distributions from NEF-QVF is described and applied. The derivatives that are needed to apply the method are obtained for all distributions in this family class. This allows a direct implementation and, therefore, applicability to mixtures of lifetime distributions. The sensitivities are evaluated and provide information about a proper choice of the prior parameter values.

The outline of the paper is as follows. Section 2 presents the basic concepts. Section 3 focuses on local sensitivity estimation and its application to NEF-QVF mixtures. Section 4 shows an application in a reliability context. The conclusions are presented in Section 5. Finally, an appendix contains the partial derivatives of the conjugate prior distributions for each family.

2 Background

First, a brief review of some basic concepts concerning exponential families and their conjugate prior distributions is presented.


2.1 Exponential families

Let η be a σ-finite positive measure on the Borel sets of R that is not concentrated at a single point. The parametric family of probability measures {Pθ} is a Natural Exponential Family (NEF) if its density with respect to η satisfies:

\[
dP_\theta(x) = \exp\left\{x\theta - M(\theta)\right\} d\eta(x), \qquad \theta \in \Theta, \tag{1}
\]

where M(θ) = log ∫ e^{xθ} dη(x) and Θ = {θ ∈ R : M(θ) < ∞} is assumed to be nonempty. Θ is called the natural parameter space and M(θ) the cumulant generating function. See, for example, Barndorff-Nielsen (1978), Brown (1986) and Letac (1992) for a review of these families.

If X is a random variable distributed according to (1), then

\[
E(X|\theta) = M'(\theta) = \mu
\qquad \text{and} \qquad
\mathrm{Var}(X|\theta) = E\left((X - \mu)^2 \,|\, \theta\right) = M''(\theta).
\]

The mapping µ = µ(θ) = M'(θ) is differentiable, with inverse θ = θ(µ). It provides an alternative parametrization of {Pθ}, called the mean parametrization. The set Ω = µ(Θ) is called the mean parameter space. The function V(µ) = M''(θ(µ)), µ ∈ Ω, is the variance function of {Pθ}.

The variance function of a NEF-QVF is given by V(µ) = v0 + v1µ + v2µ², where µ ∈ Ω and v0, v1 and v2 are real constants. Morris (1982) proved that there exist only six different types of one-parameter NEF-QVF: Poisson, binomial, negative-binomial, gamma, normal and generalized hyperbolic-secant.

2.2 Conjugate prior distributions

Conjugate prior distributions, as in Morris (1983), are considered. Let µ0 ∈ Ω and m > 0. The conjugate prior distribution on θ with respect to the Lebesgue measure is:

\[
\pi(\theta) = K(m, \mu_0) \exp\left\{m\mu_0\theta - mM(\theta)\right\},
\]

where K(m, µ0) is chosen to make ∫_Θ π(θ) dθ = 1. The relationships Eπ(µ) = µ0 and Varπ(µ) = V(µ0)/(m − v2) are satisfied.

If the mean parametrization is used, the conjugate prior on µ, π(µ), can be expressed as:

\[
\pi(\mu) = K(m, \mu_0) \exp\left\{m\mu_0\theta(\mu) - mM(\theta(\mu))\right\} V^{-1}(\mu),
\]

where θ = θ(µ) denotes the inverse function of µ = µ(θ) = M'(θ). For further information on conjugate prior distributions for exponential families, see Consonni and Veronese (1992) and Gutiérrez-Peña and Smith (1997).

Note that, by using the mean parametrization, all conjugate prior distributions can be expressed in terms of µ0 and m. This makes it possible to choose conjugate prior distributions carrying little information based on the data, as in Rufo et al. (2006).
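As a quick check of these relationships, consider the Poisson family (worked out here for illustration; it agrees with the appendix). Its variance function is V(µ) = µ, so v2 = 0, and the conjugate prior on the mean is a gamma distribution G(mµ0, m) with shape mµ0 and rate m. Indeed,

\[
E_\pi(\mu) = \frac{m\mu_0}{m} = \mu_0
\qquad \text{and} \qquad
\mathrm{Var}_\pi(\mu) = \frac{m\mu_0}{m^2} = \frac{\mu_0}{m} = \frac{V(\mu_0)}{m - v_2}.
\]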

3 Local parametric sensitivity

Sensitivity analysis in MCMC methods is a difficult task demanded by several authors. Pérez et al. (2006) proposed a computationally low-cost method to estimate local sensitivities in Bayesian models. This method can be applied to complex models that need to be solved by MCMC methods, and it makes it possible to estimate the sensitivity measures and their errors with no additional random sampling.

Here, this method is developed for mixture models of distributions from NEF-QVF. These families have been considered because: (i) NEF-QVF contain continuous and discrete distributions of wide use in reliability analysis (see, for example, Pal and Sengupta (2000) and references therein); (ii) the framework presented by Rufo et al. (2006) for NEF-QVF mixtures allows a unified Bayesian analysis; and (iii) the partial derivatives necessary to apply the method are easy to calculate. The problem of studying local parametric sensitivity with respect to the prior distributions of the component parameters is addressed.

3.1 The method

Suppose the interest is focused on the estimation of the posterior expectation of f(φ), that is:

\[
I_\lambda = \int_\Phi f(\phi)\, p_\lambda(\phi|x)\, d\phi,
\]

where f is a given measurable function, Φ is a multidimensional domain, and λ represents a possibly multidimensional parameter in the space Λ. Local parametric changes in the prior distribution πλ(φ) are considered. Then, Iλ can be expressed as:

\[
I_\lambda = \int_\Phi f(\phi)\, p_\lambda(\phi|x)\, d\phi
= \frac{\int_\Phi f(\phi)\, l(\phi)\, \pi_\lambda(\phi)\, d\phi}{\int_\Phi l(\phi)\, \pi_\lambda(\phi)\, d\phi}.
\]

Suppose that sampling directly from pλ(φ|x) is complex and, therefore, it is necessary to use MCMC methods. Let φ^(1), φ^(2), ..., φ^(N) be a sample generated from pλ0(φ|x), where λ0 is a fixed quantity interior to Λ. Then, an estimate of Iλ0 is given by:

\[
\widehat{I}_{\lambda^0} = \frac{1}{N} \sum_{t=1}^{N} f\left(\phi^{(t)}\right).
\]

The interest is focused on evaluating the impact of changes in Iλ when λ varies in an infinitesimal neighborhood of λ0, that is, it is necessary to perform a local parametric sensitivity analysis. As a local sensitivity measure, the gradient vector evaluated at λ0 is considered (see Turányi and Rabitz (2000)):

\[
\nabla I_{\lambda^0} = \left(\partial_{\lambda_1} I_{\lambda^0}, \partial_{\lambda_2} I_{\lambda^0}, \ldots, \partial_{\lambda_m} I_{\lambda^0}\right). \tag{2}
\]

The components in (2), i.e., the partial derivatives with respect to each λj evaluated at λ0, indicate how rapidly Iλ is changing in an infinitesimal neighborhood of λ0 along that axis. Therefore, the main problem is to calculate the gradient vector. Each component of the gradient vector ∇Iλ0 can be expressed (see Pérez et al. (2006)) as:

\[
\partial_{\lambda_j} I_{\lambda^0}
= E_{p_{\lambda^0}}\left[\left(f(\phi) - I_{\lambda^0}\right)
\partial_{\lambda_j} \log \pi_\lambda(\phi)\Big|_{\lambda=\lambda^0}\right].
\]

Then, the estimate of ∂λj Iλ0 is given by:

\[
\widehat{\partial_{\lambda_j} I_{\lambda^0}}
= \frac{1}{N} \sum_{t=1}^{N} \left(f(\phi^{(t)}) - \widehat{I}_{\lambda^0}\right)
\partial_{\lambda_j} \log \pi_\lambda(\phi^{(t)})\Big|_{\lambda=\lambda^0}, \tag{3}
\]

where φ^(1), φ^(2), ..., φ^(N) is the sample generated from the posterior distribution pλ0(φ|x) and Îλ0 = (1/N) Σ_{t=1}^{N} f(φ^(t)) is the estimate of Iλ0. For each j, the estimate (3) is unbiased, so its error can be measured by its standard error (see Tanner (1996)). Therefore, the Monte Carlo error estimate is given by:

\[
\widehat{SE}\left(\widehat{\partial_{\lambda_j} I_{\lambda^0}}\right)
= \sqrt{\frac{1}{N(N-1)} \sum_{t=1}^{N}
\left(\left(f(\phi^{(t)}) - \widehat{I}_{\lambda^0}\right)
\partial_{\lambda_j} \log \pi_\lambda(\phi^{(t)})\Big|_{\lambda=\lambda^0}
- \widehat{\partial_{\lambda_j} I_{\lambda^0}}\right)^2}.
\]
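For illustration, the estimate (3) and its standard error can be computed directly from the posterior draws. The following minimal Python sketch (not from the original paper; all names are ours) assumes NumPy and takes as inputs the values f(φ^(t)) and the log-prior derivatives evaluated at λ0:

```python
import numpy as np

def local_sensitivity(f_vals, dlogprior_vals):
    """Estimate one gradient component (3) and its Monte Carlo
    standard error from draws of p_{lambda0}(phi | x).

    f_vals         : f(phi^(t)) for t = 1, ..., N
    dlogprior_vals : d/d lambda_j log pi_lambda(phi^(t)) at lambda0
    """
    f_vals = np.asarray(f_vals, dtype=float)
    g_vals = np.asarray(dlogprior_vals, dtype=float)
    N = f_vals.size
    I_hat = f_vals.mean()                 # estimate of I_{lambda0}
    terms = (f_vals - I_hat) * g_vals     # summands of (3)
    dI_hat = terms.mean()                 # gradient component estimate
    se = np.sqrt(((terms - dI_hat) ** 2).sum() / (N * (N - 1)))
    return dI_hat, se
```

The same draws φ^(1), ..., φ^(N) are used for Îλ0 and for every gradient component, which is what makes the method computationally cheap.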

3.2 Application to mixture models of NEF-QVF distributions

Models in which data x1, x2, ..., xn are assumed to be independent observations from a finite mixture with k components are considered:

\[
p(x|\omega, \theta) = \sum_{j=1}^{k} \omega_j\, p(x|\theta_j), \tag{4}
\]

where ωj, j = 1, 2, ..., k, are the mixture weights (which are restricted to be non-negative and to sum to unity) and p(x|θj) are distributions from a NEF-QVF, θj being the parameter of component j.

The mean parametrization is used, since general expressions for the estimates of the components of the gradient vector will be obtained. This allows a direct implementation of the method proposed by Pérez et al. (2006) for all families. Under this parametrization, the model (4) can be expressed as:

\[
p(x|\omega, \mu) = \sum_{j=1}^{k} \omega_j\, p(x|\mu_j),
\]

where p(x|µj) = exp{xθj(µj) − M(θj(µj))} is a NEF-QVF distribution. The conjugate prior distribution for each µj is:

\[
\pi(\mu_j) = K(m, \mu_0) \exp\left\{m\mu_0\theta_j(\mu_j) - mM(\theta_j(\mu_j))\right\} V^{-1}(\mu_j), \tag{5}
\]

where m > 0, µ0 ∈ Ω and

\[
K(m, \mu_0) = \left(\int \exp\left\{m\mu_0\theta_j(\mu_j) - mM(\theta_j(\mu_j))\right\} V^{-1}(\mu_j)\, d\mu_j\right)^{-1}.
\]

The prior distributions π(µj), j = 1, 2, ..., k, depend on the parameters m and µ0. Let λ0 = (m0, µ0^0) be a fixed quantity. Local sensitivity with respect to the parameter λ = (m, µ0) in a neighborhood of λ0 = (m0, µ0^0) is considered.

In order to estimate the components of the gradient vector (2), first a sample from the posterior distribution of interest is generated. A useful MCMC method in this context is Gibbs sampling using latent allocation variables. These random variables identify the components the observations belong to. Each of them follows a multinomial distribution with parameters (1, ω1, ω2, ..., ωk).

An additional problem in mixture models is that the component parameters are not identifiable, since the likelihood function is symmetric in the components 1, 2, ..., k. Besides, if no real prior information is available that makes it possible to differentiate among components of the mixture, then the posterior distribution will be invariant under permutations of the parameters. This symmetry in the posterior distribution causes problems when making inferences regarding the individual components, since the information available is the same for each component. This is known as the label-switching problem. A solution to this problem is to impose an identifiability constraint on the parameters (see Richardson and Green (1997)). Alternative approaches are the re-parametrization of the component distributions (see Robert (1994)) and the re-labeling of the sample points (see Celeux et al. (2000) and Stephens (2000)). For more information about the label-switching problem and possible solutions, see Jasra et al. (2005).

In this paper, this problem is solved by using the algorithm proposed in Rufo et al. (2006). This approach is based on permuted sample points and is an extension to NEF-QVF of the study performed by Stephens (2000). Suppose that φ^(1) = (ω^(1), µ^(1)), φ^(2) = (ω^(2), µ^(2)), ..., φ^(N) = (ω^(N), µ^(N)) is a sample obtained by Gibbs sampling. The algorithm in Rufo et al. (2006) seeks permutations of k elements, ν1, ν2, ..., νN, such that the relabeled sample points ν1(φ^(1)), ν2(φ^(2)), ..., νN(φ^(N)) agree well on the scaled component distributions. Therefore, the main idea is to cluster those sample points giving similar estimates of the scaled components.

From now on, in order to simplify notation, φ^(t) = (ω^(t), µ^(t)), t = 1, 2, ..., N, will denote the vectors composed of the permuted samples. Then, the estimates of the partial derivatives given in (3) are:

\[
\widehat{\partial_{\mu_0} I_{\lambda^0}}
= \frac{1}{N} \sum_{t=1}^{N} \left(f(\phi^{(t)}) - \widehat{I}_{\lambda^0}\right)
\left(\sum_{j=1}^{k} \partial_{\mu_0} \log \pi_\lambda(\mu_j^{(t)})\Big|_{\lambda=\lambda^0}\right), \tag{6}
\]

\[
\widehat{\partial_{m} I_{\lambda^0}}
= \frac{1}{N} \sum_{t=1}^{N} \left(f(\phi^{(t)}) - \widehat{I}_{\lambda^0}\right)
\left(\sum_{j=1}^{k} \partial_{m} \log \pi_\lambda(\mu_j^{(t)})\Big|_{\lambda=\lambda^0}\right). \tag{7}
\]

Next, the partial derivatives ∂µ0 πλ(µj) and ∂m πλ(µj) are calculated. From expression (5), one obtains:

\[
\partial_{\mu_0} \pi_\lambda(\mu_j)
= \left(\partial_{\mu_0} K(m, \mu_0)\right) \exp\left\{m\mu_0\theta_j(\mu_j) - mM(\theta_j(\mu_j))\right\} V^{-1}(\mu_j)
+ K(m, \mu_0) \exp\left\{m\mu_0\theta_j(\mu_j) - mM(\theta_j(\mu_j))\right\} V^{-1}(\mu_j)\, m\theta_j(\mu_j),
\]

\[
\partial_{m} \pi_\lambda(\mu_j)
= \left(\partial_{m} K(m, \mu_0)\right) \exp\left\{m\mu_0\theta_j(\mu_j) - mM(\theta_j(\mu_j))\right\} V^{-1}(\mu_j)
+ K(m, \mu_0) \exp\left\{m\mu_0\theta_j(\mu_j) - mM(\theta_j(\mu_j))\right\} V^{-1}(\mu_j)
\left(\mu_0\theta_j(\mu_j) - M(\theta_j(\mu_j))\right).
\]

Therefore,

\[
\partial_{\mu_0} \log \pi_\lambda(\mu_j)
= \frac{\partial_{\mu_0} K(m, \mu_0)}{K(m, \mu_0)} + m\theta_j(\mu_j),
\qquad
\partial_{m} \log \pi_\lambda(\mu_j)
= \frac{\partial_{m} K(m, \mu_0)}{K(m, \mu_0)} + \mu_0\theta_j(\mu_j) - M(\theta_j(\mu_j)),
\]

and the expressions (6) and (7) become:

\[
\widehat{\partial_{\mu_0} I_{\lambda^0}}
= \frac{1}{N} \sum_{t=1}^{N} \left(f(\phi^{(t)}) - \widehat{I}_{\lambda^0}\right)
\sum_{j=1}^{k} \left(\frac{\partial_{\mu_0} K(m, \mu_0)}{K(m, \mu_0)} + m\theta_j(\mu_j^{(t)})\right)\Bigg|_{\lambda=\lambda^0},
\]

\[
\widehat{\partial_{m} I_{\lambda^0}}
= \frac{1}{N} \sum_{t=1}^{N} \left(f(\phi^{(t)}) - \widehat{I}_{\lambda^0}\right)
\sum_{j=1}^{k} \left(\frac{\partial_{m} K(m, \mu_0)}{K(m, \mu_0)} + \mu_0\theta_j(\mu_j^{(t)}) - M(\theta_j(\mu_j^{(t)}))\right)\Bigg|_{\lambda=\lambda^0}.
\]

Thus, for NEF-QVF mixtures under the mean parametrization, it is necessary to know the expressions of θj(µj) and M(θj(µj)) for each family, and to calculate the partial derivatives of the functions K(m, µ0). A detailed study, showing the partial derivatives with respect to the parameters of the conjugate prior distributions for each case, is presented in the appendix.
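For illustration, a minimal sketch of the estimates (6) and (7) might look as follows, with the family-specific functions θj(·) and M(θj(·)) and the ratios ∂µ0 K/K and ∂m K/K supplied by the user; the function and argument names here are hypothetical, not from the paper:

```python
import numpy as np

def mixture_gradient(f_vals, mu_draws, theta, M_of_theta,
                     dmu0_K_over_K, dm_K_over_K, m0, mu00):
    """Sketch of the estimates (6)-(7) for a k-component NEF-QVF mixture.

    f_vals        : f(phi^(t)) for each of the N relabeled draws
    mu_draws      : (N, k) array with the component means mu_j^(t)
    theta         : vectorized family function theta_j(mu)
    M_of_theta    : vectorized family function M(theta_j(mu))
    dmu0_K_over_K : scalar  (dK/dmu0) / K  evaluated at (m0, mu00)
    dm_K_over_K   : scalar  (dK/dm)   / K  evaluated at (m0, mu00)
    """
    f_vals = np.asarray(f_vals, dtype=float)
    mu = np.asarray(mu_draws, dtype=float)
    resid = f_vals - f_vals.mean()
    # inner sums over the k components, evaluated at lambda0 = (m0, mu00)
    s_mu0 = (dmu0_K_over_K + m0 * theta(mu)).sum(axis=1)
    s_m = (dm_K_over_K + mu00 * theta(mu) - M_of_theta(mu)).sum(axis=1)
    return (resid * s_mu0).mean(), (resid * s_m).mean()
```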

4 Application to reliability

In this section, the application of the proposed method is illustrated by using a real data set. The data set presented in Table 1 contains the times (in minutes) to breakdown of an insulating fluid under two levels of voltage stress and comprises 34 observations. The data belong to a study performed by Ghosh and Ebrahimi (2001).

0.19    0.59    0.18    0.35    1.47    0.38    0.99
0.52    0.18    1.65    0.85    0.66    0.26    4.33
19.15   0.77    1.39    2.8     36.18   0.35    0.24
0.37    0.03    0.7     0.28    0.1     0.52    0.12
0.19    0.77    0.1     1.58    8.42    11.73

Table 1. Breakdown failure times (in minutes) of insulating fluid under two levels of voltage stress.

The observations come from two populations with different voltage stress, but it is not known which observation comes from which population. Therefore, a mixture of two exponential distributions is fitted to the data. By using the notation presented in this paper, it is assumed that the failure times come from a density of the form:

\[
p(x|\omega, \mu) = \omega_1 \frac{\exp(-x/\mu_1)}{\mu_1} + \omega_2 \frac{\exp(-x/\mu_2)}{\mu_2},
\]

where ω1 + ω2 = 1 and µ1, µ2 > 0. A noninformative prior distribution is assumed for the weight vector, i.e., ω = (ω1, ω2) ∼ Dirichlet(δ, δ), with δ = 1. In this context, this is the usual choice when no prior information is available for the weights (see, for example, Richardson and Green (1997)). On the other hand, a conjugate prior distribution is chosen for µj, j = 1, 2, i.e., µj ∼ Inverted Gamma(m + 1, mµ0), with µ0 = 1 and m = 0.05. With these parameter values, relatively noninformative prior distributions are obtained for µ1 and µ2. Figure 1 shows the prior probability density function.

Fig. 1. Prior distribution for µj, j = 1, 2.

First, the interest is focused on studying how the posterior means of the parameters µj vary when the parameters µ0 and m are changed in a neighborhood of µ0 = 1 and m = 0.05, i.e., a local sensitivity analysis on these posterior means is performed. Note that these quantities represent the means of the failure times after the Bayes update has been carried out. Observe that, in a Bayesian analysis, a formal sensitivity analysis is required to study whether these posterior means could be sensitive to the prior inputs.

With these specifications, a random sample of size 150,000 from the posterior distributions of the weight and mean vectors is generated by using the Gibbs sampling algorithm. Concretely:

Algorithm 1. By using the vector (z^(t), ω^(t), µ^(t)) generated at iteration t, generate (z^(t+1), ω^(t+1), µ^(t+1)) in the following steps:

1. Generate Z_i^(t+1), for i = 1, 2, ..., 34, with P(Z_i^(t+1) = j) ∝ ω_j^(t) · exp(-x_i/µ_j^(t))/µ_j^(t), j = 1, 2.
2. Generate ω^(t+1) ∼ Dirichlet(n_1^(t+1) + 1, n_2^(t+1) + 1).
3. Generate µ_j^(t+1) ∼ Inverted Gamma(m + n_j^(t+1) + 1, n_j^(t+1) x̄_j^(t+1) + mµ0), for j = 1, 2,

where z is the latent allocation vector, n_j, j = 1, 2, is the number of observations allocated to component j, and x̄_j = Σ_{i: Z_i = j} x_i / n_j.
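A compact Python sketch of Algorithm 1 is given below (an illustrative implementation under the stated model, with our own names; inverted gamma draws are obtained by inverting gamma draws):

```python
import numpy as np

def gibbs_exp_mixture(x, n_iter=150_000, m=0.05, mu0=1.0, seed=0):
    """Sketch of Algorithm 1: Gibbs sampler for a two-component
    exponential mixture with Dirichlet(1, 1) weights and
    Inverted Gamma(m + 1, m * mu0) priors on the means."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    w = np.array([0.5, 0.5])
    mu = np.array([x.mean(), 2 * x.mean()])   # crude starting values
    draws = np.empty((n_iter, 4))             # columns: w1, w2, mu1, mu2
    for t in range(n_iter):
        # 1. allocations: P(Z_i = j) proportional to w_j exp(-x_i/mu_j)/mu_j
        p = w * np.exp(-x[:, None] / mu) / mu
        p /= p.sum(axis=1, keepdims=True)
        z = (rng.random(n) < p[:, 1]).astype(int)
        # 2. weights: Dirichlet(n_1 + 1, n_2 + 1)
        n_j = np.array([(z == 0).sum(), (z == 1).sum()])
        w = rng.dirichlet(n_j + 1)
        # 3. means: Inverted Gamma(m + n_j + 1, n_j*xbar_j + m*mu0),
        #    drawn as scale / Gamma(shape)
        s_j = np.array([x[z == 0].sum(), x[z == 1].sum()])  # n_j * xbar_j
        mu = (s_j + m * mu0) / rng.gamma(m + n_j + 1)
        draws[t] = np.concatenate([w, mu])
    return draws
```

With x set to the 34 breakdown times, the columns of the returned array provide the posterior draws of (ω1, ω2, µ1, µ2) that feed the sensitivity estimates of Section 3.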

This sample has been obtained after convergence was considered to have been achieved. Convergence diagnostics have been assessed by using CODA (see Best et al. (1997)). Since no constraint has been imposed on the parameters, the results could be influenced by the parameter unidentifiability problem. A post-processing study of the generated samples has been performed by using the following algorithm:

Algorithm 2. Starting with some values for ν1, ν2, ..., νN (e.g., the identity permutation), iterate the following steps until a fixed point is reached:

1. Take φ̂ = (ω̂, µ̂), where

\[
\widehat{\omega}_j = \frac{1}{N} \sum_{t=1}^{N} \omega_{\nu_t(j)}^{(t)}
\qquad \text{and} \qquad
\widehat{\mu}_j = \frac{\sum_{t=1}^{N} \omega_{\nu_t(j)}^{(t)} \mu_{\nu_t(j)}^{(t)}}{\sum_{t=1}^{N} \omega_{\nu_t(j)}^{(t)}},
\qquad j = 1, 2.
\]

2. For t = 1, 2, ..., N, choose νt to minimize:

\[
\sum_{j=1}^{2} \left[
-\omega_{\nu_t(j)}^{(t)} \log \widehat{\omega}_j
- \left(1 - \omega_{\nu_t(j)}^{(t)}\right) \log\left(1 - \widehat{\omega}_j\right)
+ \omega_{\nu_t(j)}^{(t)} \left(\log \widehat{\mu}_j + \frac{\mu_{\nu_t(j)}^{(t)}}{\widehat{\mu}_j}\right)
\right].
\]
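For k = 2 there are only two permutations per draw, so the fixed-point iteration of Algorithm 2 reduces to choosing, for each draw, between keeping and swapping the labels. A sketch under that reading of the algorithm (names are ours, and the loss is the criterion as reconstructed above):

```python
import numpy as np

def relabel_two_components(w, mu, max_iter=100):
    """Sketch of Algorithm 2 for k = 2. `w` and `mu` are (N, 2) arrays of
    weight and mean draws; returns a boolean vector marking draws to swap."""
    w, mu = np.asarray(w, float), np.asarray(mu, float)
    swap = np.zeros(len(w), dtype=bool)
    for _ in range(max_iter):
        # step 1: pooled estimates from the currently permuted draws
        W = np.where(swap[:, None], w[:, ::-1], w)
        Mu = np.where(swap[:, None], mu[:, ::-1], mu)
        w_hat = W.mean(axis=0)
        mu_hat = (W * Mu).sum(axis=0) / W.sum(axis=0)
        # step 2: per-draw choice between the two permutations
        def loss(Wp, Mp):
            return (-Wp * np.log(w_hat) - (1 - Wp) * np.log(1 - w_hat)
                    + Wp * (np.log(mu_hat) + Mp / mu_hat)).sum(axis=1)
        new_swap = loss(w[:, ::-1], mu[:, ::-1]) < loss(w, mu)
        if np.array_equal(new_swap, swap):
            break                          # fixed point reached
        swap = new_swap
    return swap
```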

In this case, the permuted samples have remained invariant. The reason is that two well-differentiated populations have been detected (see Figure 2). For more details on the implementation of the algorithm to solve the label-switching problem, see Rufo et al. (2006). The data and the fitted mixture density are presented in Figure 3.

The posterior expectation of µj is denoted by Eµ0,m(µj) = E[µj|x, µ0, m] for j = 1, 2. Table 2 shows the estimates of the posterior means and of the partial derivatives with respect to µ0 and m for j = 1, 2, together with the standard errors of the partial derivative estimates.

j    posterior mean   ∂/∂µ0 estimate   ∂/∂m estimate   se(∂/∂µ0)   se(∂/∂m)
1    0.581545         0.003007         0.014017        0.000736    0.012172
2    12.107760        0.037110         -2.011459       0.032049    0.520248

Table 2. Estimated posterior means, partial derivatives and standard errors.

Fig. 2. Histograms of the generated samples.

Fig. 3. Fitted mixture density.

The partial derivatives indicate how rapidly the posterior mean is changing in an infinitesimal neighborhood of the parameter values of the prior distribution. For the first component, the partial derivatives with respect to µ0 and m are 0.003007 and 0.014017, which can be considered small relative to the value of the posterior mean, 0.581545. Therefore, the rate of change in both directions can be considered small, leading to a locally robust estimate. With respect to the second component, note the increase in magnitude of the posterior mean estimate, 12.107760. The partial derivative with respect to µ0 is very small relative to the posterior mean. The partial derivative with respect to m is -2.011459, which represents less than 17% of the posterior mean value. Although this is a larger rate of change than the one corresponding to µ0, it is still acceptable. Therefore, the estimate of this posterior mean can also be considered locally robust with respect to the parameter values of the prior distribution.

The gradient vector gives the direction of maximum increase of the posterior mean at the parameter values of the prior distribution. Note that the components of the gradient vectors evaluated at (µ0, m) = (1, 0.05) satisfy:

\[
\left|\widehat{\partial_{\mu_0} E_{\mu_0,m}(\mu_j)}\right|
< \left|\widehat{\partial_{m} E_{\mu_0,m}(\mu_j)}\right|, \qquad j = 1, 2.
\]

Assuming that the parameters µ0 and m have the same uncertainty, these relationships indicate that the parameter m has more influence on the output than the parameter µ0 for both components. The parameter choice for the prior distribution has therefore turned out to be suitable to provide locally robust estimates of the posterior means for each component. Then, the posterior means of the failure times of each component can be used in any procedure with the guarantee of being locally robust for the chosen parameter values.

Similarly, local sensitivity measures for some posterior reliability probabilities are obtained. Now, the quantities of interest are:

\[
E_{\mu_0,m}(f_y(\phi)) = \int f_y(\phi)\, \pi(\phi|x)\, d\phi
\]

for different values of y, with

\[
f_y(\phi) = f_y(\omega, \mu) = \omega_1 e^{-y/\mu_1} + \omega_2 e^{-y/\mu_2}.
\]

Table 3 shows the results for some values of y. The second column gives the estimate of the posterior reliability probability Eµ0,m(fy(φ)), the third and fourth columns the estimates of its partial derivatives with respect to µ0 and m, and Rµ0 and Rm are the ratios of these partial derivatives to the estimated probability.

y    probability estimate   ∂/∂µ0 estimate   ∂/∂m estimate   Rµ0         Rm
1    0.344027               0.000748         0.006247        0.002174    0.018158
2    0.214534               0.000037         0.000807        0.000172    0.003765
5    0.138877               -0.000171        -0.005718       -0.001229   -0.041174
10   0.087143               -0.000001        -0.008832       -0.000013   -0.101348
20   0.037938               0.000083         -0.008063       0.002195    -0.212554
30   0.018271               0.000072         -0.005735       0.003983    -0.313896

Table 3. Estimates of the posterior reliability probabilities, their partial derivatives with respect to µ0 and m, and the corresponding ratios Rµ0 and Rm.

Robustness should be analyzed taking into account the location of the breakdown failure times. It can be observed that robustness with respect to changes in m decreases for large failure times, leading to sensitive estimates. If the interest is focused on large failure times, it is necessary to make sure that the value of m is appropriate. With respect to µ0, however, the model is robust for all y.
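To reproduce this kind of computation, the reliability probability fy is simply plugged into the generic estimator of Section 3.1, together with the gamma-family derivative from the appendix (r = 1 for the exponential case). A hedged sketch, reusing the hypothetical draws array and the local_sensitivity function from the earlier sketches:

```python
import numpy as np
from scipy.special import digamma

m, mu0, r, y = 0.05, 1.0, 1.0, 10.0
w1, w2, mu1, mu2 = draws.T          # relabeled posterior draws (Section 4)
f_vals = w1 * np.exp(-y / mu1) + w2 * np.exp(-y / mu2)

def dm_log_prior(mu_j):
    # gamma family (appendix): d/dm log pi_lambda(mu_j), here with r = 1
    return (r * (np.log(m * mu0) + 1 - digamma(m * r + 1)
                 - mu0 / mu_j + np.log(r / mu_j)) + 1 / m)

# sum the per-component derivatives as in (7), then estimate and its error
dI_m, se_m = local_sensitivity(f_vals, dm_log_prior(mu1) + dm_log_prior(mu2))
ratio_m = dI_m / f_vals.mean()      # the ratio R_m reported in Table 3
```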

5 Conclusions

The aim of this paper is to develop a general framework that makes it possible to study local prior sensitivities in finite mixtures of distributions from NEF-QVF. By considering infinitesimal perturbations in the parameters of the prior distributions corresponding to each component, the general method proposed by Pérez et al. (2006) is applied. The partial derivatives that are needed to estimate the components of the gradient vector are calculated, and general expressions are obtained. The main advantage of the applied method is that it allows a direct implementation of the sensitivity measure estimates and their errors. The method has a low computational cost, since the same sample generated to estimate the parameters of the mixture model is used to estimate the sensitivity measures and their errors. Finally, it is remarkable that the method is easy to implement and can be readily applied in practice to study local parametric sensitivity in reliability contexts.

Acknowledgements

We thank the reviewers for comments and suggestions which have greatly improved not only the readability of the paper but also its content. This research has been partially supported by the Ministerio de Educación y Ciencia, Spain (Project TSI2004-06801-C04-03).

Appendix. Theoretical results

Poisson family: P(µ), µ > 0. The conjugate prior distribution for µj, j = 1, 2, ..., k, is a gamma distribution, G(mµ0, m). Therefore:

\[
K(m, \mu_0) = \frac{m^{m\mu_0}}{\Gamma(m\mu_0)}, \qquad
\theta_j(\mu_j) = \log \mu_j, \qquad
M(\theta_j(\mu_j)) = \mu_j.
\]

In order to obtain the components of the vector (2), the partial derivatives have been calculated. The following expressions are obtained:

\[
\partial_{\mu_0} \log \pi_\lambda(\mu_j) = m\left(\log m - \Psi(m\mu_0) + \log \mu_j\right),
\]
\[
\partial_{m} \log \pi_\lambda(\mu_j) = \mu_0\left(\log m + 1 - \Psi(m\mu_0) + \log \mu_j\right) - \mu_j,
\]

where Ψ(·) = Γ'(·)/Γ(·) denotes the digamma function.

Binomial family: B(r, p), r = 0, 1, ..., 0 < p < 1. The conjugate prior distribution for µj = rpj, j = 1, 2, ..., k, is a scaled beta distribution, i.e., a generalized beta distribution, GBe(mµ0, mr − mµ0) on (0, r). Therefore:

\[
K(m, \mu_0) = \frac{\Gamma(mr)}{\Gamma(m\mu_0)\,\Gamma(mr - m\mu_0)}, \qquad
\theta_j(\mu_j) = \log\left(\frac{\mu_j}{r - \mu_j}\right), \qquad
M(\theta_j(\mu_j)) = r \log\left(\frac{r}{r - \mu_j}\right).
\]

The partial derivatives are:

\[
\partial_{\mu_0} \log \pi_\lambda(\mu_j)
= m\left(\Psi(mr - m\mu_0) - \Psi(m\mu_0) + \log\left(\frac{\mu_j}{r - \mu_j}\right)\right),
\]
\[
\partial_{m} \log \pi_\lambda(\mu_j)
= r\left(\Psi(mr) - \Psi(mr - m\mu_0) - \log\left(\frac{r}{r - \mu_j}\right)\right)
+ \mu_0\left(\Psi(mr - m\mu_0) - \Psi(m\mu_0) + \log\left(\frac{\mu_j}{r - \mu_j}\right)\right).
\]

Negative-binomial family: BN(r, p), r > 0, 0 < p < 1. The conjugate prior distribution for µj = rpj/(1 − pj), j = 1, 2, ..., k, is a scaled beta distribution of the second type, SBe(mµ0, mr + 1) (see Johnson et al. (1995)). Therefore:

\[
K(m, \mu_0) = \frac{\Gamma(m\mu_0 + mr + 1)}{\Gamma(m\mu_0)\,\Gamma(mr + 1)}, \qquad
\theta_j(\mu_j) = \log\left(\frac{\mu_j}{r + \mu_j}\right), \qquad
M(\theta_j(\mu_j)) = -r \log\left(\frac{r}{r + \mu_j}\right).
\]

The partial derivatives are:

\[
\partial_{\mu_0} \log \pi_\lambda(\mu_j)
= m\left(\Psi(m\mu_0 + mr + 1) - \Psi(m\mu_0) + \log\left(\frac{\mu_j}{r + \mu_j}\right)\right),
\]
\[
\partial_{m} \log \pi_\lambda(\mu_j)
= \mu_0\left(\Psi(m\mu_0 + mr + 1) - \Psi(m\mu_0) + \log\left(\frac{\mu_j}{r + \mu_j}\right)\right)
+ r\left(\Psi(m\mu_0 + mr + 1) - \Psi(mr + 1) + \log\left(\frac{r}{r + \mu_j}\right)\right).
\]

Gamma family: G(r, λ), r, λ > 0. The conjugate prior distribution for µj = r/λj, j = 1, 2, ..., k, is an inverse gamma distribution, IG(mr + 1, mrµ0). Therefore:

\[
K(m, \mu_0) = \frac{(m\mu_0)^{mr+1}}{\Gamma(mr + 1)}, \qquad
\theta_j(\mu_j) = -\frac{r}{\mu_j}, \qquad
M(\theta_j(\mu_j)) = -r \log\left(\frac{r}{\mu_j}\right).
\]

The partial derivatives are:

\[
\partial_{\mu_0} \log \pi_\lambda(\mu_j)
= mr\left(\frac{1}{\mu_0} - \frac{1}{\mu_j}\right) + \frac{1}{\mu_0},
\]
\[
\partial_{m} \log \pi_\lambda(\mu_j)
= r\left(\log(m\mu_0) + 1 - \Psi(mr + 1) - \frac{\mu_0}{\mu_j} + \log\left(\frac{r}{\mu_j}\right)\right) + \frac{1}{m}.
\]

Normal family: N(µ, σ²), −∞ < µ < ∞, σ² > 0. The conjugate prior distribution for µj, j = 1, 2, ..., k, is a normal distribution, N(µ0, σ²/m). Therefore:

\[
K(m, \mu_0) = \sqrt{\frac{m}{2\pi\sigma^2}} \exp\left\{-\frac{m\mu_0^2}{2\sigma^2}\right\}, \qquad
\theta_j(\mu_j) = \frac{\mu_j}{\sigma^2}, \qquad
M(\theta_j(\mu_j)) = \frac{\mu_j^2}{2\sigma^2}.
\]

The partial derivatives are:

\[
\partial_{\mu_0} \log \pi_\lambda(\mu_j) = \frac{m(\mu_j - \mu_0)}{\sigma^2},
\]
\[
\partial_{m} \log \pi_\lambda(\mu_j)
= \frac{1}{2}\left(\frac{1}{m} - \frac{(\mu_j - \mu_0)^2}{\sigma^2}\right).
\]

Hyperbolic-secant family: GHS(r, λ), r, λ > 0. The general expression of the conjugate prior distribution, when r = 1, is:

\[
\pi_\lambda(\mu_j) = K(m, \mu_0) \exp\left\{m\mu_0 \tan^{-1}(\mu_j)\right\}
\left(1 + \mu_j^2\right)^{-\left(\frac{m}{2} + 1\right)},
\]

where

\[
\theta_j(\mu_j) = \tan^{-1}(\mu_j)
\qquad \text{and} \qquad
M(\theta_j(\mu_j)) = \frac{1}{2} \log\left(1 + \mu_j^2\right).
\]

This is a scaled Student's t density with m + 1 degrees of freedom when µ0 = 0, and the Cauchy distribution when m → 0 (see Morris (1983)). The prior density, when µ0 = 0, is:

\[
\pi_\lambda(\mu_j) = K(m, 0) \left(1 + \mu_j^2\right)^{-\left(\frac{m}{2} + 1\right)}
= K(m, 0) \frac{1}{\left(1 + \mu_j^2\right)^{\frac{m+2}{2}}}.
\]

Therefore:

\[
K(m, 0) = \frac{\Gamma\left(\frac{m+2}{2}\right)}{\Gamma\left(\frac{m+1}{2}\right)\,\Gamma\left(\frac{1}{2}\right)}.
\]

Since µ0 = 0, only local sensitivity analysis with respect to m is performed. The partial derivative is:

\[
\partial_{m} \log \pi_\lambda(\mu_j)
= \frac{1}{2}\left(\Psi\left(\frac{m+2}{2}\right) - \Psi\left(\frac{m+1}{2}\right)\right)
+ \mu_0 \tan^{-1}(\mu_j) - \frac{1}{2} \log\left(1 + \mu_j^2\right).
\]

When m → 0, the conjugate prior distribution is:

\[
\pi_\lambda(\mu_j) = K_0 \frac{1}{1 + \mu_j^2}.
\]

Then K0 = 1/π and the method is not applicable.
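These closed-form derivatives are straightforward to validate numerically. For instance, for the Poisson family, the analytic expression for ∂µ0 log πλ(µj) can be compared with a finite-difference derivative of the G(mµ0, m) log-density; the following sketch (with arbitrary test values of ours) uses SciPy:

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import digamma

# Arbitrary test values, not from the paper.
m, mu0, mu_j, eps = 0.5, 2.0, 1.3, 1e-6

def log_prior(mu0_):
    # Gamma(shape m*mu0, rate m) log-density, i.e. scale = 1/m in SciPy.
    return gamma.logpdf(mu_j, a=m * mu0_, scale=1 / m)

analytic = m * (np.log(m) - digamma(m * mu0) + np.log(mu_j))
numeric = (log_prior(mu0 + eps) - log_prior(mu0 - eps)) / (2 * eps)
print(analytic, numeric)  # the two values should agree closely
```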

References

Barndorff-Nielsen, O., 1978. Information and Exponential Families in Statistical Theory. Wiley.
Best, N. G., Cowles, M. K., Vines, S. K., 1997. CODA: Convergence diagnosis and output analysis software for Gibbs sampling output. Tech. rep., MRC Biostatistics Unit, Cambridge.
Böhning, D., Seidel, W., 2003. Editorial: recent developments in mixture models. Computational Statistics and Data Analysis 41, 349–357.
Böhning, D., Seidel, W., Alfó, M., Garel, B., Patilea, V., Walter, G., 2007. Advances in mixture models. Computational Statistics and Data Analysis 51 (11), 5205–5210.
Brown, L. D., 1986. Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Lecture Notes, 9. Hayward: Institute of Mathematical Statistics.
Celeux, G., Hurn, M., Robert, C. P., 2000. Computational and inferential difficulties with mixture posterior distributions. Journal of the American Statistical Association 95, 957–970.
Consonni, G., Veronese, P., 1992. Conjugate priors for exponential families having quadratic variance functions. Journal of the American Statistical Association 87 (420), 1123–1127.
Ghosh, S. K., Ebrahimi, N., 2001. Bayesian analysis of the mixing function in a mixture of two exponential distributions. Tech. Rep. 2531, Institute of Statistics Mimeographs, North Carolina State University.
Gutiérrez-Peña, E., Smith, A. F. M., 1997. Exponential and Bayesian conjugate families: Review and extensions (with discussion). Test 6, 1–90.
Jasra, A., Holmes, C. C., Stephens, D. A., 2005. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modelling. Statistical Science 20, 50–67.
Johnson, N. L., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions, Volume 2. Wiley.
Letac, G., 1992. Lectures on Natural Exponential Families and Their Variance Functions. Monografías de Matemática, 50. Instituto de Matemática Pura e Aplicada, Rio de Janeiro.
Lindsay, B. G., Lesperance, M. L., 1995. A review of semiparametric mixture models. Journal of Statistical Planning and Inference 47, 29–39.
McLachlan, G., Peel, D., 2000. Finite Mixture Models. John Wiley and Sons.
Morris, C. N., 1982. Natural exponential families with quadratic variance functions. The Annals of Statistics 10, 65–80.
Morris, C. N., 1983. Natural exponential families with quadratic variance functions: Statistical theory. The Annals of Statistics 11, 515–529.
Pal, C., Sengupta, A., 2000. Optimal tests for no contamination in reliability models. Lifetime Data Analysis 6, 281–290.
Pérez, C. J., Martín, J., Rufo, M. J., 2006. Sensitivity estimations for Bayesian inference models solved by MCMC methods. Reliability Engineering and System Safety 91, 1310–1314.
Richardson, S., Green, P. J., 1997. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B 59 (4), 731–792.
Robert, C. P., 1994. The Bayesian Choice. Springer-Verlag.
Rufo, M. J., Martín, J., Pérez, C. J., 2006. Bayesian analysis of finite mixture models of distributions from exponential families. Computational Statistics 21, 621–637.
Stephens, M., 2000. Dealing with label switching in mixture models. Journal of the Royal Statistical Society, Series B 62, 795–809.
Tanner, M. A., 1996. Tools for Statistical Inference. Springer-Verlag.
Titterington, D. M., Smith, A. F. M., Makov, U. E., 1985. Statistical Analysis of Finite Mixture Distributions. Wiley.
Turányi, T., Rabitz, H., 2000. Local methods. In: Saltelli, A., Chan, K., Scott, M. (Eds.), Sensitivity Analysis. John Wiley and Sons, pp. 81–89.