Computing 61, 359-369 (1998)

© Springer-Verlag 1998 Printed in Austria

Bootstrap Based Tests for Generalized Negative Binomial Distribution

F. Famoye¹, Mt. Pleasant

Received April 25, 1998

Abstract

Goodness of fit test statistics based on the empirical distribution function (EDF) are considered for the generalized negative binomial distribution. The small sample levels of the tests are found to be very close to the nominal significance levels. For small sample sizes, the tests are compared with respect to their simulated power of detecting some alternative hypotheses against a null hypothesis of generalized negative binomial distribution. The discrete Anderson-Darling test is the most powerful among the EDF tests. Two numerical examples are used to illustrate the application of the goodness of fit tests.

AMS Subject Classifications: 62G09, 62G10, 62E25.

Key words: Goodness of fit, Monte Carlo simulation, empirical distribution function, power.

1. Introduction

A discrete random variable X is said to have a generalized negative binomial distribution (GNBD) if its probability mass function is given by

    P(x; θ, β, m) = [m/(m + βx)] · C(m + βx, x) · θ^x (1 − θ)^(m + βx − x),   x = 0, 1, 2, ...,   (1.1)

and P(x; θ, β, m) = 0 otherwise, where 0 < θ < 1, m > 0, and β = 0 or 1 ≤ β ≤ θ⁻¹. When β = 0 and m is an integer, the probability function in (1.1) reduces to the binomial distribution, and when β = 1 the GNBD reduces to the negative binomial distribution. The probability model in (1.1) was defined and studied by Jain and Consul [14]. All the moments of the GNBD exist for β = 0 and β ≥ 1. Consul and Famoye [3] showed that the model is unimodal for all values of θ, β, and m.¹

¹ The support received from the Research Professorship Program at Central Michigan University under grant #22159 is gratefully acknowledged.

Under mild conditions of successive differentiability of the functions f(t) and g(t), Whittaker and Watson ([17], p. 133) gave Lagrange's expansion as

    f(u) = f(0) + Σ_{x=1}^{∞} (u^x / x!) [d^{x−1}/dt^{x−1} {(g(t))^x f′(t)}]_{t=0},   (1.2)

where t and u are related by t = u g(t). The probability generating function (pgf) of the GNBD is given by

    f(u) = (1 − θ)^m (1 − θt)^{−m},   (1.3)

where t = u g(t) = u (1 − θ)^{β−1} (1 − θt)^{1−β}. Another form of the GNBD pgf is given by

    f(u) = (1 − θ + θt)^m,   (1.4)

where t = u g(t) = u (1 − θ + θt)^β. By using Lagrange's expansion in (1.2), both forms of pgf lead to the GNBD model in (1.1). We remark here that the two forms of pgf leading to the GNBD model are different.

Suppose X₁, X₂, ..., Xₙ is a random sample and the question of interest is to test whether this sample comes from the GNBD. In this case, the parameters θ, β, and m are unknown and have to be estimated from the sample data in order to test whether the data follow the GNBD model. This goodness of fit problem leads to the following hypotheses.

    H₀: X₁, ..., Xₙ constitute a random sample from the GNBD
    Hₐ: H₀ is not true   (1.5)
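As a concrete check on the two pgf forms, one can, for a fixed u, solve t = u g(t) by fixed-point iteration, evaluate f(u), and compare the result with the series Σ P(x; θ, β, m) u^x. The following Python sketch uses the form in (1.4); the parameter values and iteration count are illustrative assumptions, not taken from the paper.

```python
from math import gamma

def gnbd_pmf(x, theta, beta, m):
    """GNBD probability P(x; theta, beta, m) of equation (1.1)."""
    a = m + beta * x
    # generalized binomial coefficient C(m + beta*x, x) via gamma functions
    binom = gamma(a + 1) / (gamma(x + 1) * gamma(a - x + 1))
    return (m / a) * binom * theta**x * (1 - theta)**(a - x)

def gnbd_pgf(u, theta, beta, m, iters=200):
    """pgf f(u) = (1 - theta + theta*t)^m of equation (1.4), where t solves
    t = u*(1 - theta + theta*t)**beta, found here by fixed-point iteration."""
    t = 0.0
    for _ in range(iters):
        t = u * (1 - theta + theta * t)**beta
    return (1 - theta + theta * t)**m

# illustrative values satisfying 0 < theta < 1 and 1 <= beta <= 1/theta
theta, beta, m, u = 0.2, 1.5, 3.0, 0.5
series = sum(gnbd_pmf(x, theta, beta, m) * u**x for x in range(80))
print(series, gnbd_pgf(u, theta, beta, m))   # the two values should agree
```

The agreement of the truncated series with the fixed-point evaluation of (1.4) is a numerical illustration of the Lagrange expansion argument above.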

This type of problem has been considered by many researchers for many other probability distributions; see the book by D'Agostino and Stephens [8] and the references therein. When a random sample is drawn from a continuous distribution, several empirical distribution function (EDF) test statistics have been used. Stephens [15] examined goodness of fit tests based on the EDF statistics. Among the EDF test statistics examined for some continuous distributions are the Kolmogorov-Smirnov (D), the Cramér-von Mises (W²), the Anderson-Darling (A²), and the Watson (U²) statistics. Through simulation, Stephens showed that the EDF test statistics are more powerful than the classical Pearson chi-square statistic, and found that A² and W² are the best pair of EDF test statistics.

The Pearson chi-square statistic is the most widely used goodness of fit test for discrete distributions. The Kolmogorov-Smirnov test was originally developed for continuous data. Conover [2] developed the Kolmogorov-Smirnov test procedure for a completely specified discrete distribution. The test is exact and can be used even when the sample sizes are small. Conover pointed out that the discrete Kolmogorov-Smirnov test statistic is not distribution free. For a review of goodness of fit for discrete data, see Horn [13] and the references therein. Horn concluded in her review: 'Very often in applications one has data in categories which can be ordered ... In such cases the exact discrete Kolmogorov-Smirnov test uses more of the information available in the data than does the χ² test. Thus, its use is recommended instead of the χ² test whenever ordered categories are present.'

Baringhaus and Henze [1] used the parametric bootstrap to develop an empirical generating function based goodness of fit test for the Poisson distribution. To apply the empirical generating function method, it is required that the probability generating function (pgf) defines the probability function uniquely. In the case of the GNBD, we have two forms of pgf, defined in (1.3) and (1.4). Thus, a goodness of fit test for the GNBD may not be based on the empirical generating function. The parametric bootstrap method employed by Baringhaus and Henze [1] was shown by Stute et al. [16] to provide an alternative approximation when the true quantiles of test statistics are tabulated. In situations where there are no tables, the bootstrap constitutes the only possibility.

Consul and Shenton [6, 7] showed that the distribution of the number of customers served before a single server queue first vanishes is that of the GNBD. Good [11, 12] showed the application of the class of Lagrange probability distributions, which includes the GNBD, to branching processes. Yan [18] showed that the weight distribution obtained for a condensed polymer chain in a polymerization reaction is that of the GNBD. Other properties, estimation of parameters, and applications of the GNBD can be found in Consul and Gupta [5], Famoye and Consul [9], Consul and Famoye [4], and the references contained in them.

When a GNBD model is fitted to an observed data set, the goodness of fit of the model has to be assessed. The goodness of fit problem for the GNBD model does not seem to have been investigated in the literature. In this paper, we provide different EDF statistics for testing the goodness of fit of the GNBD model to an observed data set.
In Section 2, we define the EDF test statistics for testing the goodness of fit of the GNBD model in (1.1). In Section 3, we present the results of a simulation study on the powers of the tests under the null and alternative hypotheses. The simulation study is based on the parametric bootstrap procedure, which maintains the nominal significance level very closely even when the sample size is small. Two numerical examples on the application of the EDF tests are provided in Section 4. In Section 5, we provide a recommendation for testing goodness of fit for the GNBD model.
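The parametric bootstrap scheme referred to above is generic: estimate the parameters under H₀, compute the test statistic on the data, then repeatedly simulate samples from the fitted model, re-estimating the parameters and recomputing the statistic on each resample, and take the proportion of bootstrap statistics at least as large as the observed one as the p-value. The sketch below illustrates this with a Poisson null and a discrete Kolmogorov-Smirnov statistic purely as a stand-in for brevity; the data, the bootstrap size B, and all helper names are illustrative assumptions, not the paper's implementation.

```python
import random
from math import exp

def poisson_cdf(x, lam):
    """Poisson CDF at integer x, built up term by term."""
    term = exp(-lam)
    total = term
    for i in range(1, x + 1):
        term *= lam / i
        total += term
    return total

def ks_discrete(sample, cdf):
    """Discrete Kolmogorov-Smirnov statistic: max |F_n(x) - F(x)| over the support."""
    n = len(sample)
    return max(abs(sum(v <= x for v in sample) / n - cdf(x))
               for x in range(max(sample) + 1))

def poisson_rvs(lam, n, rng):
    """n Poisson draws by inversion of the CDF on the integers."""
    out = []
    for _ in range(n):
        u, x = rng.random(), 0
        p = exp(-lam)
        c = p
        while c < u:
            x += 1
            p *= lam / x
            c += p
        out.append(x)
    return out

def bootstrap_pvalue(sample, B=500, seed=1):
    """Parametric bootstrap p-value for H0: the sample is Poisson."""
    rng = random.Random(seed)
    lam_hat = sum(sample) / len(sample)              # estimate under H0
    t_obs = ks_discrete(sample, lambda x: poisson_cdf(x, lam_hat))
    exceed = 0
    for _ in range(B):
        boot = poisson_rvs(lam_hat, len(sample), rng)
        lam_b = sum(boot) / len(boot)                # re-estimate on each resample
        if ks_discrete(boot, lambda x, l=lam_b: poisson_cdf(x, l)) >= t_obs:
            exceed += 1
    return exceed / B

data = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1, 1, 0, 2, 4, 1]
print(bootstrap_pvalue(data, B=200))
```

Re-estimating the parameters inside each bootstrap replication is what lets the procedure mimic the null distribution of a statistic whose distribution depends on estimated parameters, which is the situation for the GNBD tests in this paper.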

2. Goodness of Fit Tests

Let a random sample of size n be taken from the GNBD model (1.1) and let n_x, x = 0, 1, 2, ..., k, be the observed frequencies for the different classes, where k is the largest of the observations. Thus

    n = Σ_{x=0}^{k} n_x.

An empirical distribution function (EDF) for the sample is defined as

    F_n(x) = (1/n) Σ_{i=0}^{x} n_i,   x = 0, 1, 2, 3, ..., k.   (2.1)

Denote the cumulative distribution function for the GNBD model by

    F(x; θ, β, m) = Σ_{i=0}^{x} P(i; θ, β, m),   x ≥ 0,   (2.2)

where P(i; θ, β, m) is given by (1.1).
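From a table of observed class frequencies, the quantities in (2.1) and (2.2) are direct to compute. A small sketch follows; the frequencies and parameter values are illustrative assumptions, not data from the paper.

```python
from math import gamma

def gnbd_pmf(x, theta, beta, m):
    """GNBD probability P(x; theta, beta, m) from (1.1), with the generalized
    binomial coefficient written via gamma functions."""
    a = m + beta * x
    return ((m / a) * gamma(a + 1) / (gamma(x + 1) * gamma(a - x + 1))
            * theta**x * (1 - theta)**(a - x))

def edf(freqs):
    """F_n(x) of (2.1) from the observed class frequencies n_0, ..., n_k."""
    n = sum(freqs)
    out, cum = [], 0
    for nx in freqs:
        cum += nx
        out.append(cum / n)
    return out

def gnbd_cdf(x, theta, beta, m):
    """F(x; theta, beta, m) of (2.2)."""
    return sum(gnbd_pmf(i, theta, beta, m) for i in range(x + 1))

freqs = [60, 25, 9, 4, 2]          # illustrative frequencies n_0, ..., n_4
theta, beta, m = 0.25, 1.2, 1.5    # illustrative parameter values
Fn = edf(freqs)
# Kolmogorov-Smirnov-type discrepancy between (2.1) and (2.2)
print(max(abs(Fn[x] - gnbd_cdf(x, theta, beta, m)) for x in range(len(freqs))))
```

A large value of this discrepancy is what the EDF statistics defined next are designed to detect.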

The EDF statistics are those that measure the discrepancy between F_n(x) in (2.1) and F(x; θ, β, m) in (2.2). The sample empirical distribution function F_n(x) in (2.1) is expected to be close to the actual distribution function F(x; θ, β, m) in (2.2); if this is not the case, one suspects that the hypothesized distribution function F(x; θ, β, m) is not the correct model. The unknown parameters θ, β, and m will be estimated by the method of moments. Jain and Consul [14] gave the moment estimates θ̂, β̂, and m̂ of θ, β, and m as

    θ̂ = 1 − A/2 + (A²/4 − 1)^{1/2},   (2.3)

    β̂ = {1 − [x̄(1 − θ̂)/s₂]^{1/2}} / θ̂,   (2.4)

and

    m̂ = x̄(1 − θ̂β̂)/θ̂,   (2.5)

where A = −2 + [x̄s₃ − 3(s₂)²]/[x̄(s₂)^{3/2}], and x̄, s₂, and s₃ are respectively the sample mean, the sample variance, and the third sample moment about the mean.

To test the hypotheses in (1.5), we define some measures of discrepancy analogous to the statistics defined for the continuous distributions.

a. The Kolmogorov-Smirnov statistic K_d: