Likelihood-ratio tests for hidden Markov models

Paolo Giudici (1), Tobias Rydén (2) and Pierre Vandekerkhove (1)

(1) Department of Economics and Quantitative Methods, University of Pavia, Via San Felice 5, 27100 Pavia, Italy
(2) Department of Mathematical Statistics, Lund University, Box 118, 221 00 Lund, Sweden
5 November 1998

Abstract

We consider hidden Markov models as a versatile class of models for weakly dependent random phenomena. The topic of the present paper is likelihood-ratio testing for hidden Markov models, and we show that under appropriate conditions the standard asymptotic theory of likelihood-ratio tests is valid. Such tests are crucial in the specification of hidden Markov graphical Gaussian models, which we use to illustrate the applicability of our general results. Finally, our methodology is illustrated by means of a real data-set.
Running title. LR tests for HMMs.
AMS 1991 classification. Primary 62M07, 62M09.
Key words and phrases. Hidden multivariate Markov models, likelihood-ratio tests, temporal graphical models.
Address for correspondence: Paolo Giudici, Dipartimento di Economia Politica e Metodi Quantitativi, Università di Pavia, Via San Felice 5, I-27100 Pavia. E-mail: pgiudici@eco.unipv.it
1 Introduction

Hidden Markov models (HMMs) form a versatile class of models for weakly dependent random phenomena. An HMM consists of two parts, a non-observable finite-state Markov chain $\{X_k\}$ and an observable stochastic process $\{Y_k\}$. Given $\{X_k\}$, the $Y$'s are conditionally independent, with the conditional distribution of $Y_n$ depending on $X_n$ only. Hence $X_n$ governs the distribution of $Y_n$, and for this reason $\{X_k\}$ is sometimes called the regime. The word `hidden' is motivated by the non-observability of $\{X_k\}$; inferences, predictions etc. must be carried out solely in terms of $\{Y_k\}$. HMMs have during the last decade become widespread for modelling sequences of weakly dependent random variables, with applications in areas like speech processing (Rabiner, 1989), neurophysiology (Fredkin and Rice, 1992), biology (Leroux and Puterman, 1992) and finance (Rydén, Teräsvirta and Åsbrink, 1998). See also the recent monograph by MacDonald and Zucchini (1997).

Commonly, the conditional distributions of $Y_n$ given $X_n$ all belong to a single parametric family, such as the normal or Poisson families, so that $X_n$ selects the parameter used to generate $Y_n$. The distribution of $Y_n$, i.e. the marginal distribution of $\{Y_k\}$, is then a finite mixture from the parametric family. Mixtures are frequently used in i.i.d. settings to increase the dispersion allowed by a specific parametric family, and this effect is obviously found in the marginal distribution of an HMM as well. In addition, $\{Y_k\}$ is dependent. HMMs can thus be viewed as an extension of Markov chains, but also as an extension of mixture models.

The topic of the present paper is likelihood-ratio (LR) testing for HMMs. Drawing on results of Bickel, Ritov and Rydén (1998), we show that under certain conditions the standard asymptotic theory for such tests is valid, that is, we arrive at a $\chi^2$ distributional limit. This problem is particularly crucial in the specification of hidden Markov graphical Gaussian models, which will be considered to illustrate the wide applicability of our general results. In these highly multivariate models the natural problem, when comparing different models, is to test for zeros in the precision matrices of the mixture densities.

The paper is organised as follows. In the next section we set the notation and present some necessary background material. In Sections 3 and 4 we present our results on the analysis of LR tests for HMMs. We then present, in Section 5, a brief review of graphical Gaussian models, so as to introduce hidden Markov graphical Gaussian models in Section 6. Finally, Section 7 is dedicated to the illustration of our proposed methodology by means of a real data-set.
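As an informal illustration of this set-up, the following minimal sketch in Python (not part of the formal development below; all names and parameter values are our own, made-up choices) simulates a two-state Gaussian HMM and evaluates its log-likelihood by the standard scaled forward recursion, which computes the matrix-product form of the likelihood given as equation (1) in Section 2 in time linear in the sample size.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical two-state Gaussian HMM (all parameter values are made up)
A = np.array([[0.9, 0.1],          # transition probabilities alpha(a, b)
              [0.2, 0.8]])
mu = np.array([0.0, 3.0])          # state-dependent means
sigma = np.array([1.0, 1.5])       # state-dependent standard deviations

# Stationary distribution pi of the chain: left eigenvector of A for eigenvalue 1
evals, evecs = np.linalg.eig(A.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Simulate the hidden regime {X_k} and the observations {Y_k}
n = 500
X = np.empty(n, dtype=int)
Y = np.empty(n)
X[0] = rng.choice(2, p=pi)
for k in range(n):
    if k > 0:
        X[k] = rng.choice(2, p=A[X[k - 1]])      # X_k drawn given X_{k-1}
    Y[k] = rng.normal(mu[X[k]], sigma[X[k]])     # Y_k depends on X_k only

# Log-likelihood of Y by the scaled forward recursion: this evaluates
# pi { prod_k G(y_k) A } 1 with per-step normalisation, at O(m^2) cost
# per observation, i.e. linear in n.
def hmm_loglik(Y, A, pi, mu, sigma):
    loglik = 0.0
    phi = pi.copy()                       # current forward (row) vector
    for y in Y:
        g = norm.pdf(y, mu, sigma)        # diagonal of G(y): g(y | a) for each a
        phi = (phi * g) @ A
        c = phi.sum()
        loglik += np.log(c)
        phi = phi / c                     # renormalise to avoid underflow
    return loglik

print("log-likelihood at the true parameters:", hmm_loglik(Y, A, pi, mu, sigma))

Maximising such a log-likelihood over the parameters (numerically, or via the EM algorithm) yields the maximum likelihood estimator discussed in Section 2; the sketch only evaluates the likelihood at the parameter values used for the simulation.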
2 Preliminaries

Before proceeding, we need to introduce some notation. We let $\{X_k\}_{k=1}^{\infty}$ be a stationary Markov chain on $\{1,\ldots,m\}$ with transition probabilities $\alpha(a,b) = P(X_{k+1}=b \mid X_k=a)$. We also let $\{Y_k\}$ be a $\mathcal{Y}$-valued sequence such that, given $\{X_k\}$, $\{Y_k\}$ is a sequence of conditionally independent random variables, $Y_n$ having (conditional) density $g(y \mid X_n)$ with respect to some $\sigma$-finite measure on $\mathcal{Y}$. Usually $\mathcal{Y}$ is a subset of $\mathbf{R}^q$ for some $q$, but it may also be a higher-dimensional space. Moreover, both $\{\alpha(a,b)\}$ and $\{g(\cdot \mid a)\}$ depend on a parameter $\vartheta$, that is, $\alpha(a,b) = \alpha_\vartheta(a,b)$ and $g(\cdot \mid a) = g_\vartheta(\cdot \mid a)$, where $\vartheta$ is to be estimated from a realisation of $\{Y_k\}$. The set to which $\vartheta$ belongs is denoted by $\Theta$, and we assume $\Theta \subseteq \mathbf{R}^d$. Note that the stationary distribution of $\{X_k\}$, denoted by $\{\pi(a)\}_{a=1}^m$, does also depend on $\vartheta$. The most common set-up is that where $\vartheta$ contains the transition probabilities themselves, together with some parameters characterising the $g$'s. In particular, it is often the case that $g_\vartheta(y \mid a) = f(y; \theta(a))$ for some parametric family $f(y; \theta)$. We refer to this case as the `usual parametrisation'. The joint density of $(Y_1,\ldots,Y_n)$ may be compactly written as
$$
p_\vartheta(y_1,\ldots,y_n) \;=\; \pi_\vartheta \left\{ \prod_{k=1}^{n} G_\vartheta(y_k)\, A_\vartheta \right\} \mathbf{1}, \qquad\qquad (1)
$$
where $A_\vartheta = \{\alpha_\vartheta(a,b)\}$, $G_\vartheta(y) = \mathrm{diag}(g_\vartheta(y \mid a))$ and $\mathbf{1}$ is an $m \times 1$ vector of ones; here $\pi_\vartheta$ denotes the row vector of stationary probabilities $(\pi_\vartheta(1),\ldots,\pi_\vartheta(m))$. The computational complexity of (1) is linear in $n$. The maximum likelihood estimator (MLE), denoted by $\hat\vartheta_n$, maximizes $p_\vartheta(Y_1,\ldots,Y_n)$ over the parameter set $\Theta$. In many cases we may renumber the state space of $\{X_k\}$ and the $g$'s, leaving the likelihood unchanged, and the MLE is then not unique. In particular we may do so if the usual parametrisation is employed. This ambiguity is obviously not a big concern, though. The true parameter is denoted by $\vartheta_0$. We deliberately replace the subindex $\vartheta_0$ by `0' in notation like $P_{\vartheta_0}$ (becoming $P_0$) etc. Differentiation with respect to $\vartheta$ is denoted by dots, with one dot forming the gradient and two dots forming the Hessian.

The following assumptions will be referred to in the sequel.

A1. The transition probability matrix $\{\alpha_0(a,b)\}$ is ergodic, i.e. irreducible and aperiodic.

A2. For all $a$ and $b$, the map $\vartheta \mapsto \alpha_\vartheta(a,b)$ has two continuous derivatives in some neighbourhood $G = \{\vartheta : |\vartheta - \vartheta_0| < \delta\}$ of $\vartheta_0$; similarly for $\vartheta \mapsto \pi_\vartheta(a)$. For all $a$ and $y \in \mathcal{Y}$, the map $\vartheta \mapsto g_\vartheta(y \mid a)$ has two continuous derivatives in the same neighbourhood.
A3. Write $\vartheta = (\vartheta_1,\ldots,\vartheta_d)$. There exists a $\delta > 0$ such that (i) for all $1 \le i \le d$ and all $a$,

$$
\mathrm{E}_0 \Bigl[ \sup_{|\vartheta - \vartheta_0| < \delta} \Bigl| \frac{\partial}{\partial \vartheta_i} \log g_\vartheta(Y_1 \mid a) \Bigr|^2 \Bigr] < \infty;
$$