Volatility Forecast Evaluation and Comparison Using Imperfect Volatility Proxies
Andrew J. Patton∗
London School of Economics
First version: March 2004. This version: 10 August, 2005. Preliminary. Comments Welcome.
Abstract
We show analytically that the use of a conditionally unbiased, but imperfect, volatility proxy can lead to undesirable outcomes in some commonly-used methods for evaluating and comparing conditional variance forecasts: the true conditional variance may be rejected as being sub-optimal, and an imperfect volatility forecast may be selected over the true conditional variance. We also consider the extent of the problem when more efficient volatility proxies, such as the intra-daily range or realised volatility, are used for forecast comparison. We derive necessary and sufficient conditions on the loss function for the ranking of competing volatility forecasts to be preserved when a volatility proxy is employed, and propose a new parametric family of loss functions appropriate for volatility forecast comparison using an imperfect volatility proxy.
Keywords: forecast evaluation, forecast comparison, loss functions, realised variance, range.
J.E.L. Codes: C53, C52, C22.
∗ The author would particularly like to thank Peter Hansen, Ivana Komunjer and Asger Lunde for helpful suggestions and comments. Thanks also go to Tony Hall, Mike McCracken, Adrian Pagan, Neil Shephard, Kevin Sheppard and seminar participants at the 2005 SNDE conference, the 2005 North American winter meetings of the Econometric Society, and the 2005 Global Finance conference. The author gratefully acknowledges financial support from the Leverhulme Trust under grant F/0004/AF. Some of the work on this paper was conducted while the author was a visiting scholar at the School of Finance and Economics, University of Technology, Sydney. Contact address: Financial Markets Group, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom. Email: [email protected]. This paper is available from http://fmg.lse.ac.uk/∼patton/research.html.
1 Introduction
Given the central role that risk has in financial decision making it is no surprise that much effort has been devoted to developing volatility models. The profusion of models that have been proposed since Engle's (1982) seminal ARCH paper leads naturally to the problem of evaluating the available volatility forecasting models.¹ Evaluating and comparing economic forecasts is a well-studied problem, dating back at least to Theil (1958). However, the evaluation and comparison of volatility forecasts, as opposed to other forecasts, is complicated by the fact that the variable of interest, the conditional variance, is not observable, even ex-post. This complication was resolved, at least partly, by recognising that the squared return on an asset at date t (assuming a zero mean return) is a conditionally unbiased estimator of the true unobserved conditional variance of the asset at date t. The high/low range and realised volatility, see Parkinson (1980) and Andersen, et al. (2003) for example, have also been used as volatility proxies.

Many of the standard methods for forecast evaluation and comparison, such as the Mincer-Zarnowitz (1969) regression and the Diebold-Mariano (1995) test, can be shown to be applicable when such a conditionally unbiased volatility proxy is used, see Andersen and Bollerslev (1998) and Hansen and Lunde (2004) for example. However, it is not true that using a conditionally unbiased volatility proxy will always lead to the same outcome as if the true conditional variance were used, as shown by Hansen and Lunde (2004). In particular, some of the modifications of standard methods employed by some authors can lead to perverse outcomes. For example, in the volatility forecasting literature numerous authors have expressed concern that a few extreme observations may have an unduly large impact on the outcomes of forecast evaluation and comparison tests, see Bollerslev and Ghysels (1994), Andersen, et al.
(1999) and Poon and Granger (2003) amongst others. One common response to this concern is to employ forecast loss functions that are "less sensitive" to large observations than the usual squared forecast error loss function, such as absolute error or proportional error loss functions. In this paper we show analytically that such an approach can lead to incorrect inferences and the selection of inferior forecasts over better forecasts.

Our research builds on work by Andersen and Bollerslev (1998) and Hansen and Lunde (2004), who were the first to analyse the problems introduced by the presence of noise in the volatility proxy. This paper is most closely related to the paper of Hansen and Lunde (2004), and we extend their work in two important directions. Firstly, we derive explicitly the undesirable outcomes that may arise when some common loss functions are employed, considering the three most commonly used volatility proxies: the daily squared return, the intra-daily range and a realised variance estimator. Secondly, we provide necessary and sufficient conditions on the loss function to ensure that the ranking of various forecasts is preserved when using a noisy volatility proxy. These conditions are related to those of Gourieroux, et al. (1984) for quasi-maximum likelihood estimation.

¹ For recent surveys of the volatility forecasting literature, see Andersen, et al. (2005b), Bauwens, et al. (2004), Poon and Granger (2003) and Shephard (2005). For recent surveys of the forecast evaluation literature see Clements (2005), Corradi and Swanson (2005) and West (2005).

The canonical problem in point forecasting is to find the forecast that minimises the expected loss, conditional on time t information. That is,

$$\hat{Y}^{*}_{t+h,t} \equiv \arg\min_{y \in \mathcal{Y}} E\left[ L\left(Y_{t+h}, y\right) \mid \mathcal{F}_t \right] \qquad (1)$$
where $Y_{t+h}$ is the variable of interest, $L$ is the forecast user's loss function, $\mathcal{Y}$ is the set of possible forecasts, and $\mathcal{F}_t$ is the time t information set. Starting with the assumption that the forecast user is interested in the conditional variance, and that some noisy volatility proxy will be used in evaluation and comparison, we thus take the solution of the optimisation problem above (the conditional variance) as given, and consider the loss functions that will generate the desired solution. This approach is unusual in the economic forecasting literature: the more common approach is to take the forecast user's loss function as given and derive the optimal forecast for that loss function; related papers here are Granger (1969), Engle (1993), Christoffersen and Diebold (1997), Christoffersen and Jacobs (2003) and Patton and Timmermann (2004), amongst others. The fact that we know the forecast user desires a variance forecast places limits on the class of loss functions that may be used for volatility comparison, ruling out some choices previously used in the literature. However, we show that the class of "appropriate" loss functions still admits a wide variety of loss functions, allowing much flexibility in representing volatility forecast users' preferences.

In essence, this paper identifies an internal inconsistency in some previous research on volatility forecasting. The stated goal of forecasting the conditional variance is not consistent with the use of some regressions and some loss functions when an imperfect volatility proxy is employed. Of course, these loss functions are not themselves inherently invalid or inappropriate: if the forecast user's preferences are indeed described by an "inappropriate" loss function, then this simply implies that the object of interest to that forecast user is not the conditional variance but rather some other quantity.² If the object of interest to the forecast user is known to be the conditional variance then this paper outlines tests for forecast evaluation and comparison that are applicable when an imperfect volatility proxy is employed.

² For example, the utility of realised returns on a portfolio formed using a volatility forecast, see West, et al. (1993), defines an economically meaningful loss function, even though the optimal forecast under that loss function will not in general be the true conditional variance.

In the presence of non-normally distributed returns the use of variance as a measure of risk has been called into question. Other risk measures, such as Value-at-Risk (VaR) and Expected Shortfall (or "conditional VaR"), have been suggested in the literature, see Duffie and Pan (1997) or McNeil and Frey (2000) for example, as more relevant measures of risk. More recently, measures of risk based on the underlying return diffusion, such as integrated variance, have gained interest. While we acknowledge the importance of the question of an appropriate risk measure, we take as given the fact that there is some interest in forecasts of conditional variance, and thus a derived demand for methods for evaluating and comparing forecasts of the conditional variance. Some of the results of this paper can be adapted for use with other measures of risk, though we do not pursue these extensions here.

In Sections 2 and 3 we derive expressions for the optimal forecast obtained by minimising the expected loss for various loss functions using the daily squared return, range, or realised volatility as a proxy for the conditional variance. The optimal forecasts are functions of conditional moments or quantiles of the volatility proxy employed. We do not consider the problem of building econometric models, such as GARCH, stochastic volatility, CAViaR, etc., for these quantities. Of course, even a forecast that is equal to the appropriate conditional moment/quantile (in our case, the conditional variance) can perform poorly if the model for the conditional moment/quantile is mis-specified, or if there is substantial estimation error.

The remainder of this paper is as follows.
In Section 2 we consider volatility forecast evaluation and comparison using the squared return as a volatility proxy, showing the problems that arise when using alternative Mincer-Zarnowitz regressions or Diebold-Mariano tests with alternative loss functions. In Section 3 we consider the same problem using the range and realised volatility as volatility proxies. In Section 4 we provide necessary and sufficient conditions that a loss function must satisfy so that perverse outcomes are avoided. We also present a parametric family of loss functions appropriate for volatility forecast comparison, which nests two of the most widely-used loss functions in the literature, namely the MSE and QLIKE loss functions. In Section 5 we present an illustration using three commonly-used volatility models, and in Section 6 we conclude and suggest extensions. All proofs and derivations are provided in appendices.
1.1 Notation
Let $r_t$ be the variable whose conditional variance is of interest, usually a daily or monthly asset return in the volatility forecasting literature. Let the information set available be denoted $\mathcal{F}_{t-1}$, which is assumed to contain all relevant information available at time $t-1$, and denote $V[r_t \mid \mathcal{F}_{t-1}] \equiv V_{t-1}[r_t] \equiv \sigma_t^2$. We will assume throughout that $E[r_t \mid \mathcal{F}_{t-1}] \equiv E_{t-1}[r_t] = 0$, and so $\sigma_t^2 = E_{t-1}[r_t^2]$. We will cover the general case that $r_t \mid \mathcal{F}_{t-1} \sim F_t(0, \sigma_t^2)$, where $F_t(0, \sigma_t^2)$ is some conditional distribution with mean zero and variance $\sigma_t^2$, and then specialise to the case that $r_t \mid \mathcal{F}_{t-1} \sim N(0, \sigma_t^2)$ to obtain specific results in certain cases. Let $\varepsilon_t \equiv r_t / \sigma_t$ denote the 'standardised return'. Let a forecast of the conditional variance of $r_t$ be denoted $h_t$, or $h_{i,t}$ if there is more than one forecast under analysis. We will take forecasts as "primitive", and not consider the specific models and estimators that may have generated the forecasts. The loss function of the forecast user is $L : \mathbb{R}_{+} \times \mathcal{H} \to \mathbb{R}_{+}$, where the first argument of $L$ is $\sigma_t^2$ or some proxy for $\sigma_t^2$, denoted $\hat{\sigma}_t^2$, and the second is $h_t$. $\mathbb{R}_{+}$ and $\mathbb{R}_{++}$ denote the non-negative and positive parts of the real line respectively, and $\mathcal{H}$ is a compact subset of $\mathbb{R}_{++}$. Commonly used volatility proxies are the squared return, $r_t^2$, realised volatility, $RV_t$, and the range, $RG_t$. Optimal forecasts will be denoted $h_t^{*}$.
2 Forecast evaluation and comparison using squared returns
In this section we will focus on the use of daily squared returns for volatility forecast evaluation. In Section 3 we will examine the use of “realised volatility” and the range.
2.1 Volatility forecast evaluation using a Mincer-Zarnowitz regression
In volatility forecast evaluation we would ideally like to test the null hypothesis

$$H_0 : h_t = \sigma_t^2 \ \ \forall\, t \qquad (2)$$
$$\text{vs. } H_a : h_t \neq \sigma_t^2 \ \text{ for some } t \qquad (3)$$
That is, we would like to test that a given series of forecasts ht is equal to the series of true conditional variances σ 2t at every point in time. Instead, a necessary condition of this hypothesis is usually tested: one common method of evaluating a forecast is the Mincer-Zarnowitz (1969), or MZ, regression (also see Theil, 1958), which involves regressing the realisation of the variable of interest on the forecast. As the conditional variance is never observed, the usual MZ regression is infeasible for volatility forecast evaluation. However, if we have a conditionally unbiased estimator
of the conditional variance then the feasible MZ regression

$$\hat{\sigma}_t^2 = \beta_0 + \beta_1 h_t + e_t$$
$$H_0' : \beta_0 = 0 \,\cap\, \beta_1 = 1 \quad \text{vs.} \quad H_a' : \beta_0 \neq 0 \,\cup\, \beta_1 \neq 1$$

yields unbiased estimates of $\beta_0$ and $\beta_1$.³ The OLS parameter estimates will be less accurate the larger the variance of $(\sigma_t^2 - \hat{\sigma}_t^2)$, leading some authors, such as Andersen and Bollerslev (1998), to suggest the use of high frequency data to construct more accurate volatility proxies. The less accurate estimates of $\beta_0$ and $\beta_1$ affect the power of the test to detect deviations from forecast optimality but do not affect the validity of the test.

The use of squared returns in MZ regressions caused some researchers concern, as the estimation problem relies on fourth powers of the returns, and thus returns that are large in absolute value have a very large impact on the parameter estimates. Some authors, see Jorion (1995) and Bollerslev and Wright (2001) for example, have proposed taking some transformation of the volatility proxy to reduce the impact of large returns. Two such examples are:

$$|r_t| = \beta_0 + \beta_1 \sqrt{h_t} + e_t, \text{ and} \qquad (4)$$
$$\log\left(r_t^2\right) = \beta_0 + \beta_1 \log\left(h_t\right) + e_t \qquad (5)$$
Take the regression in equation (4) as an example: under $H_0$ the population values of the OLS parameter estimates are easily shown to be:

$$\beta_0 = 0$$
$$\beta_1 = \begin{cases} E_{t-1}\left[|\varepsilon_t|\right], & \text{if } E_{t-1}\left[|\varepsilon_t|\right] \text{ is constant} \\ \sqrt{\dfrac{\nu-2}{\pi}}\; \Gamma\!\left(\dfrac{\nu-1}{2}\right) \Big/ \Gamma\!\left(\dfrac{\nu}{2}\right), & \text{if } r_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t\left(0, \sigma_t^2, \nu\right),\ \nu > 2 \\ \sqrt{2/\pi} \approx 0.80, & \text{if } r_t \mid \mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right) \end{cases}$$

where Student's $t(0, \sigma_t^2, \nu)$ is a Student's t distribution with mean zero, variance $\sigma_t^2$ and $\nu$ degrees of freedom. When returns have the Student's t distribution, the population value for $\beta_1$ decreases towards zero as $\nu \downarrow 2$, indicating that excess kurtosis in returns increases the distortion in this parameter.

³ Note that the residuals of this regression will usually be serially correlated and heteroskedastic, so robust standard errors, such as those of Newey and West (1987), are required.

In the regression in equation (5) the population OLS parameters are:

$$\beta_0 = \begin{cases} E\left[\log \varepsilon_t^2\right], & \\ \log\left(\nu-2\right) + \Psi\!\left(\tfrac{1}{2}\right) - \Psi\!\left(\tfrac{\nu}{2}\right), & \text{if } r_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t\left(0, \sigma_t^2, \nu\right),\ \nu > 2 \\ -\log\left(2\right) - \gamma_E \approx -1.27, & \text{if } r_t \mid \mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right) \end{cases}$$
$$\beta_1 = 1$$

where $\Psi$ is the digamma function and $\gamma_E = -\Psi(1) \approx 0.58$ is Euler's constant, see Harvey, et al. (1994). Under the Student's t distribution, the above expression shows $\beta_0 \to -\infty$ as $\nu \downarrow 2$.⁴
Thus while both of these alternative MZ regressions may initially appear reasonable, without some modification they lead to the undesirable outcome that the perfect volatility forecast, $h_t = \sigma_t^2\ \forall\, t$, will be rejected with probability approaching one as the sample size increases. In both cases the perverse outcomes result from the fact that unbiasedness is not invariant to nonlinear transformations.⁵ Given a conditionally unbiased volatility proxy, the null and alternative hypotheses in equations (2) and (3) can be equivalently written as:

$$H_0 : h_t = E_{t-1}\left[\hat{\sigma}_t^2\right] \ \ \forall\, t$$
$$\text{vs. } H_a : h_t \neq E_{t-1}\left[\hat{\sigma}_t^2\right] \ \text{ for some } t$$

The standard MZ regression tests just one simple implication of $H_0$. Of course, $H_0$ also implies

$$g_{t-1}\left(W_{t-1}\right) h_t = g_{t-1}\left(W_{t-1}\right) E_{t-1}\left[\hat{\sigma}_t^2\right] \ \ \forall\, t$$

for any $(k_{t-1} \times 1)$ vector $W_{t-1} \in \mathcal{F}_{t-1}$ and any sequence of known functions $g_{t-1} : \mathbb{R}^{k_{t-1}} \to \mathbb{R} \setminus \{0\}$. Judicious choices of $g_{t-1}$ may lead to greater power and/or better finite-sample size properties of tests. One simple choice, see Engle and Patton (2001), is $g_{t-1}(W_{t-1}) = h_t^{-1}$. This choice corresponds to estimating the standard MZ regression by weighted least squares, using the volatility forecasts as weights, and yields residuals that are serially uncorrelated under $H_0$. The standard MZ regression can detect certain deviations from $H_0$, but is not consistent against all possible deviations. A consistent MZ test could be constructed using the methods of Bierens (1990), de Jong (1996) and Bierens and Ploberger (1997).

⁴ Christodoulakis and Satchell (2004) study the regression in equation (5) under a number of different assumptions for the distribution of returns and volatility processes, and characterise the MSFE in each case.

⁵ Bollerslev and Wright (2001), Christodoulakis and Satchell (2004) and Andersen, et al. (2005) suggest adjusting the volatility proxy, either exactly or approximately, so as to make it unbiased for the quantity of interest, $\sigma_t$ or $\log \sigma_t$, and so this problem does not arise in their studies.
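The contrast between the standard MZ regression and the transformed regression in equation (4) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the variance path is drawn as i.i.d. log-normal values (a hypothetical choice, not a model from this paper), returns are conditionally normal, and the helper `ols` is a name introduced here. With the perfect forecast $h_t = \sigma_t^2$, the slope from regressing $r_t^2$ on $h_t$ should be close to one, while the slope from regressing $|r_t|$ on $\sqrt{h_t}$ should be close to $\sqrt{2/\pi}$.

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)
n = 100_000

# Assumption: a hypothetical variance path (i.i.d. log-normal), used only
# to generate time variation in sigma_t^2.
sigma2 = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(n)]
# Conditionally normal returns with variance sigma_t^2.
r = [math.sqrt(s2) * rng.gauss(0.0, 1.0) for s2 in sigma2]
h = sigma2  # the "perfect" forecast, h_t = sigma_t^2

# Standard feasible MZ regression: r_t^2 on h_t.
b0_mz, b1_mz = ols(h, [ri ** 2 for ri in r])
# Transformed regression, equation (4): |r_t| on sqrt(h_t).
b0_abs, b1_abs = ols([math.sqrt(hi) for hi in h], [abs(ri) for ri in r])

print(f"MZ on r^2:  b0={b0_mz:.3f}, b1={b1_mz:.3f}")
print(f"MZ on |r|:  b0={b0_abs:.3f}, b1={b1_abs:.3f}")
```

Under these assumptions a test of $\beta_1 = 1$ in the second regression would reject the perfect forecast in large samples, which is the distortion described above.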
2.2 Volatility forecast comparison using a Diebold-Mariano test
The most widely-used test for forecast comparison is the Diebold and Mariano (1995) test. If we define $u_{i,t} \equiv L(\sigma_t^2, h_{i,t})$, where $L$ is the forecast user's loss function, and let $d_t = u_{1,t} - u_{2,t}$, then the DM test of equal predictive accuracy can be conducted as a simple t-test that

$$H_0 : E\left[d_t\right] = 0 \qquad (6)$$
$$\text{vs. } H_a : E\left[d_t\right] \neq 0$$

Like the MZ regression, the DM test may be used in conjunction with a volatility proxy for certain loss functions. For other loss functions, using a volatility proxy in a DM test can lead to undesirable outcomes. To determine the applicability of a specific loss function in a DM test for volatility forecast comparison consider the condition of Hansen and Lunde (2004):

Condition 1 (Hansen and Lunde) The ranking of any two (possibly imperfect) volatility forecasts by expected loss using the chosen loss function is the same whether the ranking is done using the true conditional variance or some conditionally unbiased volatility proxy.

Hansen and Lunde (2004) show that a sufficient condition for their condition to hold is that $\partial^2 L(\hat{\sigma}_t^2, h_t) / \partial (\hat{\sigma}_t^2)^2$ does not depend on $h_t$. Notice that the above condition implies that the true conditional variance is the optimal forecast for the chosen loss function. Below we check whether this necessary condition holds for some common loss functions.

Under squared-error loss, also known as MSE loss, one can easily show that the optimal forecast is the conditional variance: $h_t^{*} = E_{t-1}[r_t^2] = \sigma_t^2$. Thus a Diebold-Mariano comparison of the true conditional variance with any other volatility forecast, using $r_t^2$ as a volatility proxy and MSE as the loss function, will lead to the selection of the true conditional variance, subject to sampling variability. Further, it is clear that the MSE loss function also satisfies the sufficient condition of Hansen and Lunde (2004).

One common response to the concern that a few extreme observations drive the results of volatility forecast comparison studies is to employ alternative measures of forecast accuracy, see Pagan and Schwert (1990), Bollerslev and Ghysels (1994), Diebold and Lopez (1996), Andersen, et al. (1999), Hansen and Lunde (2001) and Poon and Granger (2003), for example. The justification offered for such an approach is usually that the squared-error loss function over-emphasises the largest observations. A collection of loss functions employed in the literature on volatility forecast evaluation and comparison is presented below.
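Some of these loss functions are called different names by different authors: MSE-prop is also known as "heteroskedasticity-adjusted MSE (HMSE)"; MAE-prop is also known as "mean absolute percentage error (MAPE)" or as "heteroskedasticity-adjusted MAE (HMAE)".

$$\text{MSE}: \quad L\left(\hat{\sigma}_t^2, h_t\right) = \left(\hat{\sigma}_t^2 - h_t\right)^2 \qquad (7)$$
$$\text{QLIKE}: \quad L\left(\hat{\sigma}_t^2, h_t\right) = \log h_t + \frac{\hat{\sigma}_t^2}{h_t} \qquad (8)$$
$$\text{MSE-LOG}: \quad L\left(\hat{\sigma}_t^2, h_t\right) = \left(\log \hat{\sigma}_t^2 - \log h_t\right)^2 \qquad (9)$$
$$\text{MSE-SD}: \quad L\left(\hat{\sigma}_t^2, h_t\right) = \left(\hat{\sigma}_t - \sqrt{h_t}\right)^2 \qquad (10)$$
$$\text{MSE-prop}: \quad L\left(\hat{\sigma}_t^2, h_t\right) = \left(\frac{\hat{\sigma}_t^2}{h_t} - 1\right)^2 \qquad (11)$$
$$\text{MAE}: \quad L\left(\hat{\sigma}_t^2, h_t\right) = \left|\hat{\sigma}_t^2 - h_t\right| \qquad (12)$$
$$\text{MAE-SD}: \quad L\left(\hat{\sigma}_t^2, h_t\right) = \left|\hat{\sigma}_t - \sqrt{h_t}\right| \qquad (13)$$
$$\text{MAE-prop}: \quad L\left(\hat{\sigma}_t^2, h_t\right) = \left|\frac{\hat{\sigma}_t^2}{h_t} - 1\right| \qquad (14)$$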
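For readers who wish to experiment with these loss functions, equations (7) to (14) translate directly into code. The following is a minimal sketch; the function names and the `LOSSES` dictionary are labels introduced here for illustration, and the first argument `proxy` stands for the variance proxy $\hat{\sigma}_t^2$ (so that $\hat{\sigma}_t$ is its square root).

```python
import math

# Each function takes a variance proxy (sigma-hat squared) and a variance
# forecast h, both assumed strictly positive.


def mse(proxy, h):
    return (proxy - h) ** 2                         # equation (7)


def qlike(proxy, h):
    return math.log(h) + proxy / h                  # equation (8)


def mse_log(proxy, h):
    return (math.log(proxy) - math.log(h)) ** 2     # equation (9)


def mse_sd(proxy, h):
    return (math.sqrt(proxy) - math.sqrt(h)) ** 2   # equation (10)


def mse_prop(proxy, h):
    return (proxy / h - 1.0) ** 2                   # equation (11)


def mae(proxy, h):
    return abs(proxy - h)                           # equation (12)


def mae_sd(proxy, h):
    return abs(math.sqrt(proxy) - math.sqrt(h))     # equation (13)


def mae_prop(proxy, h):
    return abs(proxy / h - 1.0)                     # equation (14)


LOSSES = {
    "MSE": mse, "QLIKE": qlike, "MSE-LOG": mse_log, "MSE-SD": mse_sd,
    "MSE-prop": mse_prop, "MAE": mae, "MAE-SD": mae_sd, "MAE-prop": mae_prop,
}

# Usage: evaluate every loss for one proxy/forecast pair.
for name, loss in LOSSES.items():
    print(f"{name:9s} {loss(1.5, 1.0):.4f}")
```

Note that every loss except QLIKE equals zero when the proxy and the forecast coincide; QLIKE is minimised, but not zero, at that point.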
Consider the MAE loss function from above. As usual with an absolute-error loss function we obtain the median as the optimal forecast:

$$h_t^{*} = \text{Median}_{t-1}\left[r_t^2\right] = \sigma_t^2 \cdot \text{Median}_{t-1}\left[\varepsilon_t^2\right] = \begin{cases} \sigma_t^2 \cdot \dfrac{\nu-2}{\nu} \cdot \text{Median}\left[F_{1,\nu}\right], & \text{if } r_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t\left(0, \sigma_t^2, \nu\right) \\ \sigma_t^2 \cdot \text{Median}\left[\chi_1^2\right] \approx 0.45\,\sigma_t^2, & \text{if } r_t \mid \mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right) \end{cases} \qquad (15)$$

where $\text{Median}_{t-1}[r_t^2]$ is the conditional median of $r_t^2$ given $\mathcal{F}_{t-1}$. Thus if we compare a forecast which is exactly equal to $\sigma_t^2$ for all t to one that is equal to $0.45\,\sigma_t^2$ for all t, using the squared daily return as a proxy for the conditional variance, we will usually conclude that the perfect forecast is inferior to the one which is wrong by more than a factor of 2. Figure 1 shows that if returns have a Student's t distribution then the degree of distortion is even larger.

Another commonly used loss function is the MSE loss function on standard deviations rather than variances, see equation (10). The motivation for this loss function is that taking the square root of the two arguments of the squared-error loss function shrinks the larger values towards zero, reducing the impact of the most extreme values of $r_t$. However it also leads to an incorrect volatility forecast
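being selected as optimal:

$$h_t^{*} \equiv \arg\min_{h \in \mathcal{H}} E_{t-1}\left[\left(|r_t| - \sqrt{h}\right)^2\right]$$
$$\text{FOC:} \quad 0 = \left.\frac{\partial}{\partial h} E_{t-1}\left[\left(|r_t| - \sqrt{h}\right)^2\right]\right|_{h = h_t^{*}}$$
$$\text{so} \quad h_t^{*} = \left(E_{t-1}\left[|r_t|\right]\right)^2 \qquad (16)$$
$$= \sigma_t^2 \left(E_{t-1}\left[|\varepsilon_t|\right]\right)^2 = \begin{cases} \dfrac{\nu-2}{\pi} \left(\Gamma\!\left(\dfrac{\nu-1}{2}\right) \Big/ \Gamma\!\left(\dfrac{\nu}{2}\right)\right)^2 \sigma_t^2, & \text{if } r_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t\left(0, \sigma_t^2, \nu\right),\ \nu > 2 \\ \dfrac{2}{\pi}\,\sigma_t^2 \approx 0.64\,\sigma_t^2, & \text{if } r_t \mid \mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right) \end{cases} \qquad (17)$$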
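The scaling factors in equations (15) and (17) are easy to evaluate numerically. The sketch below is a minimal illustration using only the Python standard library; the function names are introduced here and are not part of the paper. For the normal case it uses the fact that the median of $\varepsilon_t^2$ equals the squared 75% quantile of $\varepsilon_t$; for the Student's t case it evaluates the Gamma-function expression in equation (17) via `math.lgamma`.

```python
import math
from statistics import NormalDist


def mae_ratio_normal():
    """h*/sigma^2 under MAE loss with normal returns: Median[chi2_1]."""
    # P(eps^2 <= q) = 0.5  <=>  P(|eps| <= sqrt(q)) = 0.5
    # <=>  sqrt(q) is the 75% quantile of a standard normal.
    return NormalDist().inv_cdf(0.75) ** 2


def mse_sd_ratio_normal():
    """h*/sigma^2 under MSE-SD loss with normal returns: (E|eps|)^2."""
    return 2.0 / math.pi


def mse_sd_ratio_student_t(nu):
    """h*/sigma^2 under MSE-SD loss with standardised Student's t returns."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 for the variance to exist")
    log_gamma_ratio = math.lgamma((nu - 1) / 2) - math.lgamma(nu / 2)
    return (nu - 2) / math.pi * math.exp(2 * log_gamma_ratio)


print(f"MAE, normal:     {mae_ratio_normal():.4f}")
print(f"MSE-SD, normal:  {mse_sd_ratio_normal():.4f}")
for nu in (3, 4, 6, 10, 30):
    print(f"MSE-SD, t({nu}): {mse_sd_ratio_student_t(nu):.4f}")
```

As the degrees of freedom fall towards 2 the MSE-SD ratio shrinks towards zero, and as they grow it approaches the normal value $2/\pi$.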
For the MSE-SD loss function it is also true that excess kurtosis in asset returns exacerbates the distortion, which we can see in Figure 2 for returns that have the Student's t distribution. In the appendix we provide the corresponding calculations for the remaining loss functions in equations (7) to (14) above, and summarise the results in Table 1.

Table 1 shows that the degree of distortion in the optimal forecast according to some of the loss functions used in the literature can be substantial. Under normality the optimal forecast under these loss functions ranges from about one-third of the true conditional variance to three times the true conditional variance. If returns exhibit conditional kurtosis then the range of optimal forecasts from these loss functions is even wider. Table 1 shows that using certain loss functions in Diebold-Mariano comparisons along with daily squared returns as a proxy for the true conditional volatility may lead to the perverse outcome that a competing variance forecast is selected rather than the true conditional variance. To illustrate and emphasise the importance of this point, consider the following example.

Example 1: Assume that $r_t \mid \mathcal{F}_{t-1} \sim N(0, \sigma_t^2)$, and that $\sigma_t^2$ follows a simple GARCH(1,1) process: $\sigma_t^2 = \omega + \beta \sigma_{t-1}^2 + \alpha r_{t-1}^2$, subject to $\omega > 0$ and $1 - \beta^2 - 2\alpha\beta - 3\alpha^2 > 0$ (which is required for $E[\sigma_t^4]$ to exist). Let $\hat{\sigma}_t^2 = r_t^2$, let $L$ be the MSE-SD loss function, and let $h_{1t} = \sigma_t^2$ and $h_{2t} = (2/\pi)\,\sigma_t^2$. Let $n$ denote the number of observations available for conducting the test. Then the Diebold-Mariano test statistic evaluated at population moments is:

$$DM_0 = \frac{\sqrt{n}\left(1 - \sqrt{2/\pi}\right)^2}{\sqrt{\left(5 - 12\sqrt{\dfrac{2}{\pi}} + \dfrac{12}{\pi} + \dfrac{8}{\pi}\sqrt{\dfrac{2}{\pi}} - \dfrac{12}{\pi^2}\right) \dfrac{1 - \left(\alpha+\beta\right)^2}{1 - \left(\alpha+\beta\right)^2 - 2\alpha^2} - \left(1 - \sqrt{\dfrac{2}{\pi}}\right)^4}}$$
$$\approx 0.1632\sqrt{n}, \quad \text{when } \beta = 0.9 \text{ and } \alpha = 0.05$$

The proof is in Appendix 1. For the specific case that $[\alpha, \beta] = [0.05, 0.9]$, which is reasonable for daily asset returns, the $DM_0$ statistic is greater than 1.96 for sample sizes larger than 145. Thus with less than a year's worth of daily data, we would expect to reject the true conditional variance in favour of a volatility forecast equal to around 0.64 times the true conditional variance. This example shows that choosing an inappropriate loss function for volatility forecast comparison can have important empirical implications in realistic situations.

The sources of the mis-matches between the optimal forecast for a given loss function and the true conditional variance are easily identified: the last three loss functions move from considering mean squared losses to considering mean absolute losses, which changes the solution of the optimisation problem from an expectation to a median. In the third and fourth cases the distortion follows from the fact that the unbiasedness property is not invariant to nonlinear transformations. This is a relatively easy problem to remedy in practice: one needs to find a conditionally unbiased estimator of the quantity of interest ($\sigma_t$, $\log \sigma_t^2$, etc.) either exactly or approximately, see Bollerslev and Wright (2001) and Andersen, et al. (2003) for example.
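The population value of the statistic in Example 1 can be reproduced with a few lines of arithmetic. The sketch below is an illustration rather than the derivation in Appendix 1: it writes $c = \sqrt{2/\pi}$ and uses the factorisation $(1-c)^2(5 - 2c - 3c^2)$ of the second moment of the loss differential, which is an algebraic rearrangement made here, together with the ratio $E[\sigma_t^4]/E[\sigma_t^2]^2 = (1-(\alpha+\beta)^2)/(1-(\alpha+\beta)^2-2\alpha^2)$ that appears in Example 1. The function name `dm0_coefficient` is introduced for this example.

```python
import math


def dm0_coefficient(alpha, beta):
    """Return k such that DM0 = k * sqrt(n) in Example 1 (MSE-SD loss,
    proxy r_t^2, forecasts sigma_t^2 and (2/pi) * sigma_t^2)."""
    c = math.sqrt(2.0 / math.pi)               # E|eps| for eps ~ N(0, 1)
    persistence_sq = (alpha + beta) ** 2
    # Ratio E[sigma^4] / E[sigma^2]^2 for the GARCH(1,1) process.
    kappa = (1.0 - persistence_sq) / (1.0 - persistence_sq - 2.0 * alpha ** 2)
    # Mean and second moment of d_t, both scaled by E[sigma^2].
    mean_d = (1.0 - c) ** 2
    second_moment_d = kappa * (1.0 - c) ** 2 * (5.0 - 2.0 * c - 3.0 * c ** 2)
    return mean_d / math.sqrt(second_moment_d - mean_d ** 2)


coef = dm0_coefficient(alpha=0.05, beta=0.9)
# Smallest sample size for which DM0 exceeds the 5% two-sided critical value.
n_min = math.ceil((1.96 / coef) ** 2)
print(f"DM0 = {coef:.4f} * sqrt(n); exceeds 1.96 once n >= {n_min}")
```

Varying `alpha` and `beta` in this sketch shows how the persistence of the volatility process affects how quickly the true conditional variance would be rejected.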
3 Using better volatility proxies
It has long been known that squared returns are a quite noisy proxy for the true conditional variance. One alternative volatility proxy that has gained much attention recently is "realised volatility", see Andersen, et al. (2001a, 2003), and Barndorff-Nielsen and Shephard (2002, 2004). Another commonly-used alternative to squared returns is the intra-daily range. It is well-known that if the log stock price follows a Brownian motion then both of these estimators are unbiased and more efficient than the squared return. In this section we obtain the rate at which the distortion in the ranking of alternative forecasts disappears when using realised volatility as the proxy, as the sampling frequency increases, for a simple data generating process (DGP). These results can be viewed as complements to those of Hansen and Lunde (2004), who showed that under certain conditions the degree of distortion in ranking alternative forecasts is increasing in the variability of the proxy error.

Assume that there are $m$ equally-spaced observations per trade day, and let $r_{i,m,t}$ denote the $i$th intra-daily return on day $t$. In order to obtain simple analytical results for problems involving the range as a volatility proxy we consider only a simple DGP: zero mean return, no jumps, and constant conditional volatility within a trade day.⁶ Of course we could obtain approximate results via simulation for more realistic DGPs but we do not attempt this here. Let

$$r_t = d \ln P_t = \sigma_t\, dW_t \qquad (18)$$
$$\sigma_\tau = \sigma_t \ \ \forall\, \tau \in (t-1, t] \qquad (19)$$
$$r_{i,m,t} \equiv \int_{(i-1)/m}^{i/m} r_\tau\, d\tau = \sigma_t \int_{(i-1)/m}^{i/m} dW_\tau \qquad (20)$$
$$\text{so} \quad \left\{r_{i,m,t}\right\}_{i=1}^{m} \sim \text{iid } N\!\left(0, \frac{\sigma_t^2}{m}\right) \qquad (21)$$

⁶ Analytical and empirical results on the range and "realised range" under more flexible DGPs are presented in two recent working papers by Christensen and Podolskij (2005) and Martens and van Dijk (2005).

The "realised volatility" or "realised variance" is defined as:

$$RV_t \equiv \sum_{i=1}^{m} r_{i,m,t}^2$$
Realised variance, like the daily squared return (which is obtained in the above framework by setting $m = 1$), is a conditionally unbiased estimator of the daily conditional variance. Its main advantage is that it is a more efficient estimator of the daily conditional variance than the daily squared return: for this DGP it can be shown that $MSE_{t-1}[r_t^2] = 2\sigma_t^4$ while $MSE_{t-1}[RV_t] = 2\sigma_t^4 / m$. Intra-daily returns are known to exhibit time-varying volatility, serial correlation, diurnality and non-normality, see Bai, et al. (2001) for example. The presence of these features may attenuate some of the benefits of using high frequency data for volatility forecast evaluation, and so the improvements from using $RV_t$ presented below may over-estimate the actual improvements one may obtain when using high frequency data.

A volatility proxy that pre-dates realised volatility by many years is the range, or the high/low, estimator, see Parkinson (1980), Garman and Klass (1980) and Ball and Torous (1984). Alizadeh, et al. (2002) use the fact that the range is widely available for long series and is more efficient than squared returns to improve the estimation of stochastic volatility models. The intra-daily log range is defined as:

$$RG_t \equiv \max_{\tau} \log P_\tau - \min_{\tau} \log P_\tau, \quad t-1 < \tau \leq t \qquad (22)$$

Under the dynamics in equation (18) Feller (1951) presented the density of $RG_t$, and Parkinson (1980) presented a formula for obtaining moments of the range, which enable us to compute:

$$E_{t-1}\left[RG_t^2\right] = \log\left(16\right) \cdot \sigma_t^2 \approx 2.7726\,\sigma_t^2 \qquad (23)$$

Details on the distributional properties of the range under this DGP are presented in Appendix 2. The above expression shows that the squared range is not a conditionally unbiased estimator of $\sigma_t^2$. Most authors who employ the range as a volatility proxy, see Parkinson (1980) and Alizadeh, et al. (2002) for example, are aware of this and scale the range accordingly. We will thus focus below on the adjusted range:

$$RG_t^{*} \equiv \frac{RG_t}{\sqrt{\log\left(16\right)}} \approx 0.6006\,RG_t \qquad (24)$$

which, when squared, is an unbiased proxy for the conditional variance. Using the results of Parkinson (1980) it is simple to determine that $MSE_{t-1}[RG_t^{*2}] \approx 0.4073\,\sigma_t^4$, which is approximately one-fifth of the MSE of the daily squared return, and so using the range yields an estimator as accurate as a realised volatility estimator constructed using 5 intra-daily observations. This roughly corresponds to the comment of Andersen and Bollerslev (1998, footnote 20) that the adjusted range yields an MSE comparable to the MSE of realised volatilities constructed using 2 to 3 hour returns.

We now determine the optimal forecasts obtained using the various loss functions considered above, when $\hat{\sigma}_t^2 = RV_t$ or $\hat{\sigma}_t^2 = RG_t^{*2}$ is used as a proxy for the conditional variance rather than $r_t^2$.
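The efficiency comparison between the daily squared return and realised variance under the simple DGP in equations (18) to (21) can be checked by simulation. The sketch below is illustrative only: the function name `simulate_proxy_mse`, the number of simulated days and the seed are choices made here, and $\sigma_t^2$ is held fixed at one. The range is omitted because simulating it accurately requires a fine discretisation of the price path.

```python
import random


def simulate_proxy_mse(m, n_days=50_000, sigma2=1.0, seed=0):
    """Monte Carlo MSE of r_t^2 and RV_t as proxies for sigma2.

    Intra-daily returns are iid N(0, sigma2 / m), as in equation (21).
    Returns (mse_squared_return, mse_realised_variance).
    """
    rng = random.Random(seed)
    sd_intraday = (sigma2 / m) ** 0.5
    sum_sq_err_r2 = 0.0
    sum_sq_err_rv = 0.0
    for _ in range(n_days):
        intraday = [rng.gauss(0.0, sd_intraday) for _ in range(m)]
        daily_return = sum(intraday)           # r_t is the sum of intra-daily returns
        rv = sum(x * x for x in intraday)      # realised variance RV_t
        sum_sq_err_r2 += (daily_return ** 2 - sigma2) ** 2
        sum_sq_err_rv += (rv - sigma2) ** 2
    return sum_sq_err_r2 / n_days, sum_sq_err_rv / n_days


mse_r2, mse_rv = simulate_proxy_mse(m=12)
print(f"MSE of r_t^2: {mse_r2:.3f} (theory 2.000)")
print(f"MSE of RV_t:  {mse_rv:.3f} (theory {2 / 12:.3f})")
```

Under these assumptions the simulated values should lie close to $2\sigma_t^4$ and $2\sigma_t^4/m$ respectively, up to Monte Carlo error.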
We initially leave m unspecified for the realised volatility proxy, and then specialise to three cases: m = 1, 12 and 78, corresponding to the use of daily, half-hourly and 5-minute returns, on a stock listed on the New York Stock Exchange (NYSE). For MSE and QLIKE the optimal forecast is simply the conditional mean of σ ˆ 2t , which equals the conditional variance, as RVt and RG∗2 t are both conditionally unbiased. The MSE-SD loss σ t ])2 as an optimal forecast. Under the set-up introduced above, function yields (Et−1 [ˆ RVt ≡ so
mσ −2 t RVt
∼
m X
i=1 χ2m σ 2t ³
2 rt,i
m σ 2t X 2 = εt,i m i=1
hp i´2 E so h∗t = χ2m m hp i √ 1 E χ2m ≈ m − √ by a Taylor series approximation 4 m µ ¶ 1 1 so h∗t ≈ σ 2t 1 − + 2m 16m2 ⎧ 2 ⎪ ⎪ ⎨ 0.5625 · σ t for m = 1 ≈ 0.9588 · σ 2t for m = 12 ⎪ ⎪ ⎩ 0.9936 · σ 2t for m = 78
The results for the MSE-SD loss function using realised volatility show that reducing the variance of the volatility proxy improves the optimal forecast, consistent with Hansen and Lunde (2004).7 Using the range we find that

\[
h_t^* = \left(E_{t-1}[RG_t^*]\right)^2 = \frac{2}{\pi \log 2}\,\sigma_t^2 \approx 0.9184\,\sigma_t^2
\]

and so the distortion from using the range is approximately equal to that incurred when using a realised volatility constructed using 6 intra-daily observations.

Consider now the MAE loss function, which yields $Median_{t-1}[\hat{\sigma}_t^2]$ as the optimal forecast. For realised volatility we thus have

\[
h_t^* = \frac{1}{m}\,Median\left[\chi_m^2\right]\sigma_t^2
\]

For large $m$, $Median[\chi_m^2] \approx m - 2/3$, though most software packages have functions for the inverse cdf of a $\chi_m^2$ distribution. For small $m$ the approximation $Median[\chi_m^2] \approx m - 2/3 + 1/(9m)$ is more accurate. Thus, using $Median[\chi_m^2] \approx m - 2/3 + 1/(9m)$,

\[
h_t^* \approx \sigma_t^2\left(1 - \frac{2}{3m} + \frac{1}{9m^2}\right)
\approx \begin{cases}
0.4444\cdot\sigma_t^2 & \text{for } m = 1 \\
0.9452\cdot\sigma_t^2 & \text{for } m = 12 \\
0.9915\cdot\sigma_t^2 & \text{for } m = 78
\end{cases}
\]

For the range we have

\[
h_t^* \approx \frac{2.2938}{\log 16}\,\sigma_t^2 = 0.8273\,\sigma_t^2
\]

which is equivalent to using about 4 observations to construct the realised volatility proxy.

Calculations for the remaining loss functions are collected in Appendix 2, and the results are summarised in Table 2. These results confirm that as the proxy used to measure the true conditional variance gets more efficient the degree of distortion decreases for all loss functions. When using $RV_t$ as a volatility proxy we find that

\[
\begin{aligned}
h_t^* &= \sigma_t^2 && \text{for MSE and QLIKE} \\
h_t^* &= \sigma_t^2 + O\left(m^{-1}\right) && \text{for MSE-prop, MAE and MAE-SD} \\
h_t^* &\approx \sigma_t^2 + O\left(m^{-1}\right) && \text{for MSE-SD and MAE-prop} \\
\log h_t^* &\approx \log\sigma_t^2 + O\left(m^{-1}\right) && \text{for MSE-LOG}
\end{aligned}
\]

Across loss functions we found that the range was generally approximately as good a volatility proxy as the realised volatility estimator constructed with between 4 and 6 intra-daily observations.

7 Note that the result for $m = 1$ is different to that obtained in Section 2, which was $h_t^* = \frac{2}{\pi}\sigma_t^2 \approx 0.6366\,\sigma_t^2$. This is because for $m = 1$ we can obtain the expression exactly, using results for the normal distribution, whereas for arbitrary $m$ we relied on a second-order Taylor series approximation.
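The MAE calculation for realised volatility can be checked numerically. The following sketch uses only the Python standard library (an assumption about tooling; the paper does not say which software was used). It evaluates the $\chi_m^2$ cdf through the usual power series for the regularised lower incomplete gamma function and finds the median by bisection. The names `chi2_cdf`, `chi2_median` and `approx_factor` are illustrative.

```python
import math

def chi2_cdf(x: float, m: int) -> float:
    """CDF of a chi-square(m) variable, computed from the power series of
    the regularised lower incomplete gamma function P(m/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = m / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_median(m: int) -> float:
    """Median of chi-square(m) by bisection on [0, m] (the median lies below the mean m)."""
    lo, hi = 0.0, float(m)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, m) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def approx_factor(m: int) -> float:
    """Approximation 1 - 2/(3m) + 1/(9 m^2) used in the text."""
    return 1.0 - 2.0 / (3 * m) + 1.0 / (9 * m * m)

for m in (1, 12, 78):
    print(m, round(chi2_median(m) / m, 4), round(approx_factor(m), 4))
```

The ratio `chi2_median(m) / m` is the exact MAE-optimal forecast as a fraction of $\sigma_t^2$ under the Gaussian set-up, so the printed pairs show how close the approximation is for each $m$.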
Let us now generalise the above results to a broad class of arbitrary loss functions. To do so we will make use of Taylor series approximations, and thus will require some differentiability assumptions on the loss function, which are not satisfied for some of the loss functions considered above.

Assumption T1: The volatility proxy satisfies: $\sqrt{m}\,V_t^{-1/2}\left(\hat{\sigma}_{t,m}^2 - \sigma_t^2\right) \rightarrow^D N(0,1)$

Assumption T2: The loss function $L$ is three times differentiable.

Assumption T3: The loss function $L$ is such that $L(\sigma^2, h) = 0$ iff $h = \sigma^2$.

Assumption T4: The loss function $L$ is such that $\partial L(\sigma^2, h)/\partial h \gtrless 0$ if $\sigma^2 \lessgtr h$.

Assumption T5: The volatility proxy and the loss function are such that $L(\hat{\sigma}_{t,m}^2, h) - L(\sigma_t^2, h) \rightarrow^p 0$ as $m \rightarrow \infty$ uniformly on $H$.
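Proposition 1 (i) Let assumptions T1-T4 hold, and define

\[
h_{t,m}^* \equiv \arg\min_{h \in H} E_{t-1}\left[L\left(\hat{\sigma}_{t,m}^2, h\right)\right]
\]

Then

\[
\frac{\partial^3 L\left(\sigma^2, h\right)}{\partial(\sigma^2)^2\,\partial h} \gtrless 0 \text{ for all } \left(\sigma^2, h\right) \;\Rightarrow\; h_{t,m}^* \lessgtr \sigma_t^2
\]

(ii) Let assumptions T3-T5 hold. Then $h_{t,m}^* \rightarrow^p \sigma_t^2$ as $m \rightarrow \infty$.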
The first part of the above proposition shows that it is the sign of the third derivative of the loss function that determines whether the optimal forecast is above, below or equal to the true conditional variance. The case that this third derivative is equal to zero, and thus that the optimal forecast is the conditional variance, corresponds to a result of Hansen and Lunde (2004). The second part of the above proposition shows that under the high-level assumption of uniform convergence of $L(\hat{\sigma}_{t,m}^2, h)$ to $L(\sigma_t^2, h)$, the optimal forecast converges to the conditional variance as $m \rightarrow \infty$. Thus even loss functions that cause distortions in the presence of noise in the volatility proxy can generate optimal forecasts that are consistent for the conditional variance. To the extent that the noise in the proxy is “small” one might treat the distortion also as “small”.
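As a worked illustration of part (i), consider the MSE-SD loss. This sketch assumes that the loss takes the form $L(\hat{\sigma}^2, h) = (\hat{\sigma} - \sqrt{h})^2$, which is the form implied by the Diebold-Mariano calculation in Appendix 1, and that the Gaussian set-up of the previous section holds, so that $V_t = 2\sigma_t^4$. It uses the approximate first-order condition from the proof in Appendix 3, $0 \approx \partial L/\partial h + (V_t/2m)\,\partial^3 L/\partial(\sigma^2)^2\partial h$.

\[
\begin{aligned}
\frac{\partial L(\sigma^2, h)}{\partial h} &= 1 - (\sigma^2)^{1/2} h^{-1/2}, \qquad
\frac{\partial^3 L(\sigma^2, h)}{\partial(\sigma^2)^2\,\partial h} = \tfrac{1}{4}(\sigma^2)^{-3/2} h^{-1/2} > 0 \\
0 &\approx 1 - \sigma_t h^{-1/2} + \frac{2\sigma_t^4}{2m}\cdot\tfrac{1}{4}\sigma_t^{-3} h^{-1/2}
 = 1 - \sigma_t h^{-1/2}\left(1 - \frac{1}{4m}\right) \\
\text{so}\quad \sqrt{h_{t,m}^*} &\approx \sigma_t\left(1 - \frac{1}{4m}\right)
\quad\text{and}\quad
h_{t,m}^* \approx \sigma_t^2\left(1 - \frac{1}{2m} + \frac{1}{16m^2}\right) < \sigma_t^2
\end{aligned}
\]

The positive third derivative pushes the optimal forecast below the conditional variance, as the proposition states, and under these assumptions the expansion reproduces the MSE-SD result obtained for realised volatility in the previous section.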
4 A class of appropriate loss functions

In the previous section we showed that amongst eight loss functions commonly used to compare volatility forecasts, only the MSE and the QLIKE loss functions lead to $h_t^* = E_{t-1}[\hat{\sigma}_t^2] = \sigma_t^2$ from the first-order condition. This prompts the question of whether there exist other loss functions that yield the conditional variance as the optimal forecast. The following proposition provides a necessary and sufficient class of such loss functions, which are closely related to the class of linear-exponential densities of Gourieroux, et al. (1984), and to the work of Gourieroux, et al. (1987). We make the following assumptions:

A1: $E_{t-1}[\hat{\sigma}_t^2] = \sigma_t^2$

A2: $\hat{\sigma}_t^2 | \mathcal{F}_{t-1} \sim F_t \in \tilde{F}$, the set of all absolutely continuous distribution functions on $\mathbb{R}_+$.

A3: $L$ is twice continuously differentiable with respect to $h$ and $\hat{\sigma}^2$.

A4: There exists some $h_t^* \in int(H)$ such that $h_t^* = E_{t-1}[\hat{\sigma}_t^2]$, where $H$ is a compact subset of $\mathbb{R}_{++}$.

A5: $L$ and $F_t$ are such that: (a) $E_{t-1}[L(\hat{\sigma}_t^2, h)] < \infty$ for some $h \in H$; (b) $\left|E_{t-1}\left[\partial L(\hat{\sigma}_t^2, \sigma_t^2)/\partial h\right]\right| < \infty$; and (c) $\left|E_{t-1}\left[\partial^2 L(\hat{\sigma}_t^2, \sigma_t^2)/\partial h^2\right]\right| < \infty$ for all $t$.

Proposition 2 Let assumptions A1 to A5 hold. Then the forecast that minimises expected loss is $\sigma_t^2$, and the ranking of competing forecasts by expected loss is preserved when $\hat{\sigma}_t^2$ is used rather than $\sigma_t^2$, if and only if the loss function $L$ is of the form:

\[
L\left(\hat{\sigma}^2, h\right) = \tilde{C}(h) + B\left(\hat{\sigma}^2\right) + C(h)\left(\hat{\sigma}^2 - h\right) \tag{25}
\]
where $B$ and $C$ are twice continuously differentiable, $C$ is a strictly decreasing function on $H$, and $\tilde{C}$ is the anti-derivative of $C$.

The general representation of “appropriate” loss functions in Proposition 2 provides a simple means of determining whether a given loss function is suitable for use in volatility forecast comparison, but it does not directly provide new alternative, and “appropriate”, loss functions. To this end, we now seek to find a parametric family of loss functions that is a member of the class proposed above, and which nests MSE and QLIKE as special cases. We do this by noting that the first-order conditions from MSE and QLIKE loss functions are both of the form:

\[
\frac{\partial L\left(\hat{\sigma}^2, h\right)}{\partial h} = 0 = a h^b\left(\hat{\sigma}^2 - h\right), \quad a < 0,\; b \in \mathbb{R} \tag{26}
\]

ie, $A'(h) = -a h^{b+1}$ and $C'(h) = a h^b$. Since $a$ is just a scaling parameter we set it to $-1$ without loss of generality. Integrating the above expression with respect to $h$ yields:

\[
L\left(\hat{\sigma}^2, h\right) = \int h^{b+1}\,dh - \hat{\sigma}^2\int h^b\,dh =
\begin{cases}
B_1\left(\hat{\sigma}^2\right) - \frac{1}{b+1}\hat{\sigma}^2 h^{b+1} + \frac{1}{b+2}h^{b+2}, & b \notin \{-1, -2\} \\
B_2\left(\hat{\sigma}^2\right) - \hat{\sigma}^2\log h + h, & b = -1 \\
B_3\left(\hat{\sigma}^2\right) + \frac{\hat{\sigma}^2}{h} + \log h, & b = -2
\end{cases}
\]
where $B_i$ is some function of $\hat{\sigma}^2$, $i = 1, 2, 3$. The functions above yield the desired FOC, but loss functions are usually constrained to exhibit certain properties, such as having zero loss for a zero forecast error. We can use this property to restrict the set of possible functions $B_1$, $B_2$ and $B_3$. We do so and obtain the following family of loss functions:

Proposition 3 The following family of functions

\[
L\left(\hat{\sigma}^2, h; b\right) =
\begin{cases}
\frac{1}{(b+1)(b+2)}\left(\hat{\sigma}^{2b+4} - h^{b+2}\right) - \frac{1}{b+1}h^{b+1}\left(\hat{\sigma}^2 - h\right), & \text{for } b \notin \{-1, -2\} \\
h - \hat{\sigma}^2 + \hat{\sigma}^2\log\frac{\hat{\sigma}^2}{h}, & \text{for } b = -1 \\
\frac{\hat{\sigma}^2}{h} - \log\frac{\hat{\sigma}^2}{h} - 1, & \text{for } b = -2
\end{cases}
\tag{27}
\]

satisfy $L(h, h; b) = 0$ for all $h \in H$, and are of the form in Proposition 2.

The MSE loss function is obtained when $b = 0$ and the QLIKE loss function is obtained when $b = -2$, up to additive and multiplicative constants. In Figure 3 we present the above class of functions for various values of $b$, ranging from 1 to $-5$, and including the MSE and QLIKE cases. This figure shows that this family of loss functions can take a wide variety of shapes, ranging from symmetric ($b = 0$, corresponding to the MSE loss function) to asymmetric, with heavier penalty either on underprediction ($b < 0$) or overprediction ($b > 0$). Figure 4 plots the ratio of losses incurred for negative forecast errors to those incurred for positive forecast errors, to make clearer the form of asymmetries in these loss functions.

Having presented a new class of loss functions, it is next of interest to establish the conditions under which we can employ these loss functions in Diebold-Mariano tests for volatility forecast comparison. The main conditions to be determined are moment conditions on the volatility proxy and volatility forecasts, and these are presented in part (ii) of the following proposition.

Proposition 4 (i) For a given loss function parameter $b$, and given that
(a) $d_t(b) = d_0(b) + \varepsilon_t(b)$, $t = 1, 2, ...$; $d_0(b) \in \mathbb{R}$,
(b) $\{d_t(b)\}$ is a mixing sequence with either $\phi$ of size $-r/(2(r-1))$ for some $r \geq 2$, or $\alpha$ of size $-r/(r-2)$ for some $r > 2$,
(c) $E[d_t(b)] = d_0(b)$ for $t = 1, 2, ...$,
(d) $E[|d_t(b)|^r] < \Delta < \infty$ for all $t$, and
(e) $V_n(b) \equiv V\left[n^{-1/2}\sum_{t=1}^{n}\varepsilon_t(b)\right]$ is uniformly positive definite.
Then

\[
\frac{\sqrt{n}\left(\bar{d}_n(b) - d_0(b)\right)}{\sqrt{V_n(b)}} \rightarrow^D N(0,1), \text{ as } n \rightarrow \infty
\]

where $\bar{d}_n(b) \equiv n^{-1}\sum_{t=1}^{n} d_t(b)$. Under $H_0: E[d_t(b)] = 0$, we have:

\[
DM_n(b) \equiv \frac{\sqrt{n}\,\bar{d}_n(b)}{\sqrt{\hat{V}\left[\sqrt{n}\,\bar{d}_n(b)\right]}} \rightarrow^D N(0,1) \text{ as } n \rightarrow \infty
\]

where $\hat{V}[\sqrt{n}\,\bar{d}_n(b)]$ is any consistent estimator of $V[\sqrt{n}\,\bar{d}_n(b)]$. If $E[d_t(b)] \neq 0$ then $DM_n(b) \rightarrow \pm\infty$.

(ii) Sufficient conditions for $E\left[d_t(b)^2\right] < \infty$ are
1. $\inf_t h_{it} \equiv c_i > 0$ for $i = 1, 2$,
2. $E[h_{it}^p] < \infty$, $i = 1, 2$, and
3. $E[\hat{\sigma}_t^q] < \infty$,
where $p$ and $q$ are as follows:

\[
\begin{aligned}
&p = \max[0, 2b+4], \quad q = \max[4+\delta, 4b+8], \text{ for } \delta > 0, && \text{when } b \notin \{-1, -2\} \\
&p = 2(e+1)/e \approx 2.74, \quad q = 4(e+1)/e \approx 5.47, && \text{when } b = -1 \\
&p = 2/e + \delta \approx 0.74 + \delta, \quad q = 4 + \delta, \text{ for } \delta > 0, && \text{when } b = -2
\end{aligned}
\]

where $e$ is the exponential constant, $e \approx 2.718$.

The assumption that the volatility forecasts will never be less than some positive threshold is true for many standard volatility models, such as the GARCH(1,1). Part (ii) of the above proposition shows how greatly the moment conditions can vary depending on the choice of loss function shape parameter $b$. For MSE loss, corresponding to $b = 0$, we need $E[h_{it}^4]$ and $E[\hat{\sigma}_t^8]$ to be finite, whereas for the QLIKE loss function we only require $E[h_{it}^{2/e+\delta}]$ and $E[\hat{\sigma}_t^{4+\delta}]$, for $\delta > 0$, to be finite. Choosing $b \leq -2$ is recommended if the existence of moments of the volatility proxy or volatility forecasts is a concern.
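The loss family in equation (27) is straightforward to code. The sketch below is one possible Python rendering (the paper provides no code, and the function name `patton_loss` and its argument names are illustrative). Here `proxy` stands for $\hat{\sigma}^2$, so that $\hat{\sigma}^{2b+4}$ is computed as `proxy ** (b + 2)`.

```python
import math

def patton_loss(proxy: float, h: float, b: float) -> float:
    """Loss L(proxy, h; b) from equation (27).
    proxy is the volatility proxy (sigma-hat squared), h the forecast;
    both are assumed strictly positive."""
    if b == -1:
        return h - proxy + proxy * math.log(proxy / h)
    if b == -2:
        return proxy / h - math.log(proxy / h) - 1.0
    return ((proxy ** (b + 2) - h ** (b + 2)) / ((b + 1) * (b + 2))
            - h ** (b + 1) * (proxy - h) / (b + 1))

# Zero loss for a zero forecast error, for each b used in the empirical section.
for b in (1, 0, -1, -2, -5):
    print(b, patton_loss(2.0, 2.0, b))
```

With `b = 0` the function returns half the squared error, `0.5 * (proxy - h) ** 2`, which is MSE up to a multiplicative constant. Evaluating the general branch at a value of `b` very close to `-2` gives nearly the same number as the dedicated `b == -2` branch, which is a quick way to see that the family is continuous in `b` at that point.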
5 Empirical application to forecasting IBM return volatility

In this section we consider the problem of forecasting the conditional variance of the daily return on IBM, using data from the TAQ database over the period from January 1993 to May 1998. This sample period was used in the Andersen, et al. (2001b) study of realised variance of equity returns.

We consider three simple volatility forecasts: those obtained from a 60-day rolling window variance, the RiskMetrics volatility model based on daily returns, and the RiskMetrics volatility model based on 15-minute realised variance:

\[
\begin{aligned}
\text{Rolling window}: \quad & h_{1t} = \frac{1}{60}\sum_{j=1}^{60} r_{t-j}^2 && (28) \\
\text{“RiskMetrics1”}: \quad & h_{2t} = \lambda h_{2t-1} + (1-\lambda)\,r_{t-1}^2, \quad \lambda = 0.94 && (29) \\
\text{“RiskMetrics2”}: \quad & h_{3t} = \lambda h_{3t-1} + (1-\lambda)\,RV_{t-1}, \quad \lambda = 0.94 && (30)
\end{aligned}
\]
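The three forecasts in equations (28) to (30) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the start-up value `h0` of the RiskMetrics recursion is an assumption, since the equations themselves do not fix how the recursion is initialised.

```python
def rolling_window_forecast(returns, window=60):
    """Equation (28): h_t is the average of the previous `window` squared
    returns. Entries are None until enough history is available."""
    out = []
    for t in range(len(returns)):
        if t < window:
            out.append(None)
        else:
            out.append(sum(r * r for r in returns[t - window:t]) / window)
    return out

def riskmetrics_forecast(proxy, lam=0.94, h0=None):
    """Equations (29)-(30): h_t = lam * h_{t-1} + (1 - lam) * proxy_{t-1}.
    Pass squared daily returns for "RiskMetrics1" or realised variances for
    "RiskMetrics2". The start-up value h0 is an assumption; by default the
    first proxy observation is used."""
    h = [proxy[0] if h0 is None else h0]
    for t in range(1, len(proxy)):
        h.append(lam * h[-1] + (1.0 - lam) * proxy[t - 1])
    return h
```

Both functions only use information dated $t-1$ or earlier when producing the forecast for day $t$, which is what makes them usable as out-of-sample forecasts.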
We use the first year of observations (252 observations) to initiate the RiskMetrics forecasts, and the remaining 1112 observations to compare the forecasts. A plot of the three volatility forecasts is provided in Figure 5.

We employ a variety of volatility proxies in the comparison of these forecasts: the daily squared return, and realised variance computed using 65-minute, 15-minute and 5-minute returns.8 We also considered using the intra-daily range but the conditions required for this to be a conditionally unbiased volatility proxy did not appear to hold for this series.9

In comparing these forecasts we present the results of DM tests using the loss function presented in Section 4, for five different choices of the loss function parameter: $b \in \{1, 0, -1, -2, -5\}$. MSE loss and QLIKE loss correspond to $b = 0$ and $b = -2$ respectively. Recall from the previous section that different choices of $b$ require weaker or stronger moment conditions for the DM test to be valid. For $b = -5$ we only require $E[\hat{\sigma}_t^{4+\delta}] < \infty$ for $\delta > 0$, whereas for $b = 1$ we need $E[h_{it}^6] < \infty$ and $E[\hat{\sigma}_t^{12}] < \infty$. These assumptions should be kept in mind when interpreting the results below.
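For concreteness, a DM statistic of the kind used in this section can be sketched as follows. Proposition 4 allows any consistent variance estimator; the Bartlett-kernel (Newey-West) estimator used here, the lag choice, and the function name `dm_statistic` are assumptions made for illustration and are not taken from the paper. The input `d` is the series of loss differentials $d_t(b)$, for example the loss of one forecast minus the loss of another evaluated against the same proxy.

```python
import math

def dm_statistic(d, lags=0):
    """Diebold-Mariano t-statistic sqrt(n) * mean(d) / sqrt(long-run variance).
    The long-run variance uses Bartlett weights on `lags` autocovariances;
    lags=0 gives the naive variance that ignores serial correlation."""
    n = len(d)
    mean = sum(d) / n
    dev = [x - mean for x in d]

    def autocov(k):
        return sum(dev[t] * dev[t - k] for t in range(k, n)) / n

    lrv = autocov(0)
    for k in range(1, lags + 1):
        lrv += 2.0 * (1.0 - k / (lags + 1)) * autocov(k)
    return math.sqrt(n) * mean / math.sqrt(lrv)
```

Under the null of equal expected loss the statistic is compared with standard normal critical values such as ±1.96, subject to the moment conditions for the chosen $b$.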
Table 3 presents the results of standard Mincer-Zarnowitz tests of the three volatility forecasts. The rolling window and daily RiskMetrics forecasts are rejected using all four volatility proxies, with MZ test p-values equal to 0.00 in all cases. The RiskMetrics forecast based on 15-minute realised volatility passes the MZ test using daily squared returns as a volatility proxy (with a p-value of 0.06) but fails the MZ test when less noisy volatility proxies are employed. We can thus conclude that none of these forecasts is optimal. This conclusion leads then to the question of relative forecast performance, for which we use the Diebold-Mariano test.

8 We use 65-minute returns rather than 60-minute returns so that there are an even number of intervals within the NYSE trade day, which runs from 9.30am to 4pm.

9 The adjustment factor for the squared range, under the assumption that intra-daily prices follow a zero mean, constant volatility diffusion with no jumps is $1/\log(16) \approx 0.36$, as discussed in Section 3. The ratio of the sample average squared return to the sample average squared range was 0.52, and was significantly different from $1/\log(16)$ at the 0.05 level, indicating that the assumptions underlying the use of the range as a volatility proxy were not satisfied.
In Table 4 we present tests comparing the RiskMetrics forecasts based on daily returns with the 60-day rolling window volatility forecasts. The DM tests indicate important sensitivity to the loss function parameter: for $b \geq 0$ the test statistics are negative in all but one case, indicating that the rolling window forecasts are preferred to the RiskMetrics forecasts; for $b < 0$ the test statistics are positive in all but one case, indicating the opposite. In only a few cases are any of the test statistics in Table 4 greater than 1.96 in absolute value, and so overall we may conclude that these forecasts do not differ significantly in performance.

In Table 5 we present tests comparing the RiskMetrics forecasts based on realised variance with the 60-day rolling window volatility forecasts. Here we find strong evidence that the RiskMetrics forecasts based on realised variance significantly out-perform the rolling window forecasts: all test statistics are positive, and all are significant at the 0.05 level.

In Table 6 we also find strong evidence that RiskMetrics forecasts based on realised variance out-perform those based on daily returns. For all choices of $b > -5$ the test statistics are all positive and generally significant. For $b = -5$ the test statistics are still all positive, but none are significantly different from zero.

In Figure 6 we present these results in graphical form: each line represents a sequence of DM t-statistics for a given pair of forecasts, for $b$ ranging from $-5$ to 1. The critical values of ±1.96 are also given, though note that these are point-wise confidence intervals, and so they are only valid for a given $b$, not for a range of values of $b$.10 Variation in the loss function parameter leads to substantial variation in the DM t-statistics in some cases. For example, for the comparison of the rolling window forecasts with the RiskMetrics forecasts based on daily returns using 15-minute realised variance as the volatility proxy (“rw1 vs rm1” in the top panel of Figure 6) the test statistic is greater than 1.96 for $-4.79 \leq b \leq -2.03$, is zero when $b \approx -0.80$, and approaches $-1.96$ as $b$ gets near 1 (for $b = 1$ the t-statistic is $-1.93$). Thus depending on the preferences of the forecast user (ie, the choice of $b$) the user may find the first forecast significantly better, (almost) significantly worse, or not different from the second forecast. Contrast this outcome with the sequence of tests comparing rolling window forecasts with the RiskMetrics forecasts based on 15-minute realised variance, using 15-minute realised variance as the proxy: in this case the line “rw1 vs rm2” is uniformly above 1.96, indicating that all forecast users with preference parameter $b$ between $-5$ and 1 would find the RiskMetrics forecasts based on 15-minute realised variance significantly better than the rolling window forecast.

10 The development of tests that apply across a range of values of $b$ is currently being pursued in a separate paper.
6 Conclusion

We have analytically demonstrated some problems with volatility forecast evaluation and comparison techniques used in the literature. These techniques invariably rely on a volatility proxy, which is some imperfect estimator of the true conditional variance. The presence of noise in the volatility proxy can lead to the true conditional variance being rejected as sub-optimal, and an imperfect volatility forecast being selected over the true conditional variance. We also showed analytically that less noisy volatility proxies, such as the intra-daily range and realised volatility, lead to less distortion, though in some cases the degree of distortion is still large.

The distortions in the results of forecast comparison tests follow from the use of loss functions that are not consistent with the maintained goal of forecasting the conditional variance, when used in conjunction with an imperfect volatility proxy. We derive necessary and sufficient conditions on the loss function for it to be suitable for volatility forecast comparison in such situations. We also propose a new parametric family of loss functions appropriate for volatility forecast comparison using an imperfect volatility proxy, and derive the moment conditions necessary for the use of this loss function in forecast comparison tests. The new family of loss functions nests both the squared-error and the “QLIKE” loss functions, two of the most widely used in the volatility forecasting literature. The family also includes loss functions that require weaker moment conditions than some standard loss functions, by placing less weight on large observations.

A small empirical study of IBM equity volatility illustrated the new loss functions in forecast comparison tests and found that a RiskMetrics-style forecast based on realised volatility is preferred across a wide range of loss functions to a RiskMetrics forecast based on daily returns and a rolling window volatility forecast.
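7 Appendix 1: Supporting calculations for Section 2

Optimal forecasts under alternative loss functions:

MSE-prop:

\[
h_t^* = \frac{E_{t-1}\left[r_t^4\right]}{E_{t-1}\left[r_t^2\right]} = Kurtosis_{t-1}[r_t]\,\sigma_t^2 =
\begin{cases}
3\left(\frac{\nu-2}{\nu-4}\right)\sigma_t^2, & \text{if } r_t|\mathcal{F}_{t-1} \sim \text{Student's } t\left(0, \sigma_t^2, \nu\right),\ \nu > 4 \\
3\sigma_t^2, & \text{if } r_t|\mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right)
\end{cases}
\]

QLIKE: $h_t^* = E_{t-1}\left[r_t^2\right] = \sigma_t^2$

MAE-SD:

\[
h_t^* = \left(Median_{t-1}[|r_t|]\right)^2 = \sigma_t^2\left(Median_{t-1}[|\varepsilon_t|]\right)^2 = \sigma_t^2\,Median_{t-1}\left[\varepsilon_t^2\right]
\]

since $(Median[X])^2 = Median\left[X^2\right]$ for any non-negative random variable $X$. Thus the optimal forecast is identical to that under MAE loss, which is given in the body of the paper.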
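MAE-prop: If $r_t^2|\mathcal{F}_{t-1} \sim F_t\left(\sigma_t^2\right)$ and $\varepsilon_t^2 \equiv r_t^2/\sigma_t^2\,|\,\mathcal{F}_{t-1} \sim G_t(1)$ then the first-order condition (FOC) is

\[
\begin{aligned}
0 &= \int_0^{h_t^*}\frac{r_t^2}{h_t^*}f_t\left(r_t^2\right)dr_t^2 - \int_{h_t^*}^{\infty}\frac{r_t^2}{h_t^*}f_t\left(r_t^2\right)dr_t^2 \\
\text{so}\quad \int_0^{h_t^*}\frac{r_t^2}{h_t^*}f_t\left(r_t^2\right)dr_t^2 &= \int_{h_t^*}^{\infty}\frac{r_t^2}{h_t^*}f_t\left(r_t^2\right)dr_t^2 \\
F_t\left(h_t^*\right)E_{t-1}\left[\left.\frac{r_t^2}{h_t^*}\right| r_t^2 \leq h_t^*\right] &= \left(1 - F_t\left(h_t^*\right)\right)E_{t-1}\left[\left.\frac{r_t^2}{h_t^*}\right| r_t^2 > h_t^*\right]
\end{aligned}
\]

Without loss of generality let $h_t^* \equiv \sigma_t^2\gamma_t^*$, $\gamma_t^* > 0$, so

\[
\begin{aligned}
F_t\left(\sigma_t^2\gamma_t^*\right)E_{t-1}\left[\left.\frac{\varepsilon_t^2}{\gamma_t^*}\right|\varepsilon_t^2 \leq \gamma_t^*\right] &= \left(1 - F_t\left(\sigma_t^2\gamma_t^*\right)\right)E_{t-1}\left[\left.\frac{\varepsilon_t^2}{\gamma_t^*}\right|\varepsilon_t^2 > \gamma_t^*\right] \\
G_t\left(\gamma_t^*\right)E_{t-1}\left[\varepsilon_t^2\,\middle|\,\varepsilon_t^2 \leq \gamma_t^*\right] &= \left(1 - G_t\left(\gamma_t^*\right)\right)E_{t-1}\left[\varepsilon_t^2\,\middle|\,\varepsilon_t^2 > \gamma_t^*\right]
\end{aligned}
\]

If $\varepsilon_t^2|\mathcal{F}_{t-1} \sim G(1)$, then $\gamma_t^* = \gamma^*\ \forall\, t$. Finding an explicit expression for $h_t^*$ is difficult, and so we used 10,000 simulated draws for $\nu = \{4, 6, 10, 20, 30, 50, 100, 1000, \infty\}$ and numerically obtained $h_t^*$ for each $\nu$. We then used OLS to find the approximation given in Table 1, which yielded an $R^2$ of 0.9667.

Diebold-Mariano test using MSE-SD loss: We have

\[
d_t = \left(|r_t| - \sigma_t\right)^2 - \left(|r_t| - \sqrt{2/\pi}\,\sigma_t\right)^2
\]

and we seek to find an expression for $DM_0$ as a function of $(\omega, \alpha, \beta, n)$, where $DM_0 = V\left[\sqrt{n}\,\bar{d}_n\right]^{-1/2}\sqrt{n}\,E[d_t]$. In the interests of parsimony we present results under the incorrect assumption that $d_t$ is serially uncorrelated, which leads to the simplification $DM_0 = V[d_t]^{-1/2}\sqrt{n}\,E[d_t]$. In unreported work we also derived the variance allowing for serial correlation in $d_t$ and found that accounting for the serial correlation does not change the conclusion significantly. The serial correlation in $d_t$ turns out to be negative, and so the correct variance is slightly smaller than the naïve variance estimator used, which makes the $DM_0$ statistic diverge even faster.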
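\[
\begin{aligned}
d_t &= \left(|r_t| - \sigma_t\right)^2 - \left(|r_t| - \sqrt{2/\pi}\,\sigma_t\right)^2 \\
&= \left(1 - \frac{2}{\pi}\right)\sigma_t^2 + 2\left(\sqrt{\frac{2}{\pi}} - 1\right)|\varepsilon_t|\,\sigma_t^2 \\
\text{so}\quad E[d_t] &= \left(1 - \sqrt{\frac{2}{\pi}}\right)^2 E\left[\sigma_t^2\right] \\
\text{and}\quad E\left[d_t^2\right] &= E\left[\sigma_t^4\,E_{t-1}\left[\left(1 - \frac{2}{\pi} + 2\left(\sqrt{\frac{2}{\pi}} - 1\right)|\varepsilon_t|\right)^2\right]\right] \\
&= \left(5 - 12\sqrt{\frac{2}{\pi}} + \frac{12}{\pi} + \frac{8}{\pi}\sqrt{\frac{2}{\pi}} - \frac{12}{\pi^2}\right)E\left[\sigma_t^4\right]
\end{aligned}
\]

The quantities $E[\sigma_t^2]$ and $E[\sigma_t^4]$ depend on the DGP for the returns, and in this case they equal:

\[
\begin{aligned}
E\left[\sigma_t^2\right] &= \frac{\omega}{1 - \alpha - \beta}, \quad\text{if } \alpha + \beta < 1 \\
E\left[\sigma_t^4\right] &= \frac{\omega^2(1 + \alpha + \beta)}{\left(1 - (\alpha+\beta)^2 - 2\alpha^2\right)(1 - \alpha - \beta)}, \quad\text{if } 1 - (\alpha+\beta)^2 - 2\alpha^2 > 0
\end{aligned}
\]

so

\[
\begin{aligned}
DM_0 &= \frac{\sqrt{n}\left(1 - \sqrt{\frac{2}{\pi}}\right)^2 E\left[\sigma_t^2\right]}{\sqrt{\left(5 - 12\sqrt{\frac{2}{\pi}} + \frac{12}{\pi} + \frac{8}{\pi}\sqrt{\frac{2}{\pi}} - \frac{12}{\pi^2}\right)E\left[\sigma_t^4\right] - \left(1 - \sqrt{\frac{2}{\pi}}\right)^4 E\left[\sigma_t^2\right]^2}} \\
&= \frac{\sqrt{n}\left(1 - \sqrt{\frac{2}{\pi}}\right)^2}{\sqrt{\left(5 - 12\sqrt{\frac{2}{\pi}} + \frac{12}{\pi} + \frac{8}{\pi}\sqrt{\frac{2}{\pi}} - \frac{12}{\pi^2}\right)\frac{1 - (\alpha+\beta)^2}{1 - (\alpha+\beta)^2 - 2\alpha^2} - \left(1 - \sqrt{\frac{2}{\pi}}\right)^4}}
\end{aligned}
\]

as stated in the text. Note that the parameter $\omega$ does not affect the statistic.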
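The closed form for $DM_0$ can be evaluated directly. The Python sketch below implements the last expression above under the same simplifying assumption of serially uncorrelated $d_t$; the function name `dm0` and the GARCH(1,1) parameter values in the example are illustrative and are not taken from the paper.

```python
import math

def dm0(n: int, alpha: float, beta: float) -> float:
    """Population DM statistic for MSE-SD loss under a GARCH(1,1) DGP,
    using the naive variance (d_t treated as serially uncorrelated).
    omega does not appear because it cancels from the ratio."""
    c = math.sqrt(2.0 / math.pi)
    k = 5 - 12 * c + 12 / math.pi + (8 / math.pi) * c - 12 / math.pi ** 2
    persistence = alpha + beta
    denom = 1 - persistence ** 2 - 2 * alpha ** 2
    if persistence >= 1 or denom <= 0:
        raise ValueError("required moments of sigma_t^2 do not exist")
    ratio = (1 - persistence ** 2) / denom  # E[sigma^4] / E[sigma^2]^2
    return math.sqrt(n) * (1 - c) ** 2 / math.sqrt(k * ratio - (1 - c) ** 4)

# Illustrative parameter values only.
for n in (100, 1000, 10000):
    print(n, round(dm0(n, alpha=0.05, beta=0.90), 3))
```

The statistic grows in proportion to $\sqrt{n}$ for any admissible parameter values, so the comparison moves further from zero as the sample lengthens.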
8 Appendix 2: Calculations supporting Section 3

Wherever possible we derived solutions or approximate solutions analytically. This was not always possible and so in some cases we had to resort to simulations to obtain solutions. Feller (1951) presents the density of the range:

\[
f\left(RG_t; \sigma_t\right) = 8\sum_{k=1}^{\infty}(-1)^{k-1}\frac{k^2}{\sigma_t}\,\phi\left(\frac{k\cdot RG_t}{\sigma_t}\right)
\]

where $\phi$ is the standard normal pdf. For practical purposes the sum in the above expression needs to be truncated at some finite value; we truncate at $k = 1000$. Parkinson (1980) presented the cdf of the range, and a formula for obtaining moments:

\[
\begin{aligned}
F\left(RG_t; \sigma_t\right) &= \sum_{k=1}^{\infty}(-1)^{k-1}k\left\{\operatorname{erfc}\left(\frac{(k+1)RG_t}{\sigma_t\sqrt{2}}\right) - 2\operatorname{erfc}\left(\frac{k\cdot RG_t}{\sigma_t\sqrt{2}}\right) + \operatorname{erfc}\left(\frac{(k-1)RG_t}{\sigma_t\sqrt{2}}\right)\right\} \\
E\left[RG_t^p\right] &= \frac{4}{\sqrt{\pi}}\,\Gamma\left(\frac{p+1}{2}\right)\left(2^{p/2} - 2^{2-p/2}\right)\zeta(p-1)\,\sigma_t^p, \quad\text{for } p \geq 1
\end{aligned}
\]

where $\operatorname{erfc}(x) \equiv 1 - \operatorname{erf}(x)$, $\operatorname{erf}(x)$ is the ‘error function’: $\operatorname{erf}(x) \equiv \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}dt$, and $\zeta$ is the Riemann zeta function. From this expression we can obtain the necessary moments for computing optimal forecasts when the range is used as a volatility proxy. For the first and second moments of $RG_t$ we can obtain simple expressions, but the fourth moment involves $\zeta(3) = \sum_{k=1}^{\infty}k^{-3}$ which is an irrational number, and thus only a numerical expression is available. In addition to the moments of $RG_t$, we will need the mean of $\log RG_t$ and the median of $RG_t$. We used quadrature and OLS to obtain the expression:11

\[
E_{t-1}\left[\log RG_t\right] = 0.4257 + \log\sigma_t \tag{31}
\]
which is consistent with the expression given in Alizadeh, et al. (2002). We numerically inverted the cdf of the range, given in Parkinson (1980), and used OLS to determine the following relation:12

\[
Median_{t-1}\left[RG_t\right] = 1.5145\,\sigma_t
\]

so $Median_{t-1}\left[RG_t^2\right] = 2.2938\,\sigma_t^2$, since $RG_t$ is weakly positive.

MSE-LOG: $h_t^* = \exp\left\{E_{t-1}\left[\log\hat{\sigma}_t^2\right]\right\}$. A Taylor series approximation did not provide a good fit when considering realised variance as a proxy, and so we resorted to simulations. We simulated 50,000 “days” worth of observations, where the number of observations per day considered was $m = \{1, 3, 5, 7, 10, 13, 20, 40, 60, 79, 100\}$. The following expression yielded an $R^2$ of 0.9959: $E_{t-1}\left[\log RV_t(m)\right] \approx -1.2741/m$, so the optimal forecast under our DGP assumption is $h_t^* \approx \sigma_t^2 e^{-1.2741/m}$. For the range we find that

\[
E_{t-1}\left[\log RG_t^{*2}\right] = 2E_{t-1}\left[\log RG_t^*\right] = -0.1684 + \log\sigma_t^2
\]

so $h_t^* = e^{-0.1684}\sigma_t^2 \approx 0.8450\,\sigma_t^2$.

MAE-SD: The optimal forecast is $h_t^* = Median_{t-1}\left[\hat{\sigma}_t^2\right]$. Since $\hat{\sigma}_t^2$ is weakly positive we know that $Median_{t-1}\left[\hat{\sigma}_t^2\right] = \left(Median_{t-1}[\hat{\sigma}_t]\right)^2$, and so the results for this loss function are identical to those for the MAE loss function.

MSE-prop: $h_t^* = E_{t-1}\left[\hat{\sigma}_t^4\right]/E_{t-1}\left[\hat{\sigma}_t^2\right]$. When realised volatility is used as the proxy we find: $h_t^* = \left(\frac{1}{m}Kurtosis_{t-1}[r_{t,i}] + \frac{m-1}{m}\right)\sigma_t^2 = \left(1 + \frac{2}{m}\right)\sigma_t^2$. For the range we find that: $h_t^* = 10.8185/(\log 16)^2\,\sigma_t^2 \approx 1.4073\,\sigma_t^2$.

MAE-prop: For realised variance, like the daily squared return, obtaining an analytical, even approximate, solution to this problem is difficult and so we used simulations. In the set-up given in the text it is again possible to show that the optimal forecast is of the form $h_t^* = \gamma^*\sigma_t^2$. For realised volatility we simulated 50,000 “days” worth of observations, where the number of observations per day considered was $m = \{1, 3, 5, 7, 10, 13, 20, 40, 60, 79, 100\}$, and used numerical methods to locate the optimum forecast. The following expression yielded an $R^2$ of 0.9999: $h_t^* \approx \left(1 + \frac{1.3624}{m}\right)\sigma_t^2$. For the range we again used a numerical minimisation algorithm combined with quadrature to compute the expectation in the optimisation problem: $h_t^* \approx 0.9941\,\sigma_t^2$.

11 We used quadrature to estimate $E_{t-1}[\log RG_t]$ for $\sigma_t = 0.5, 1, 1.5, ..., 10$. We then regressed these estimates on a constant and $\log\sigma_t$ to obtain the parameter estimates. The $R^2$ from this regression was 1.0000.

12 The $R^2$ from this relation for $\sigma = 0.5, 1, 1.5, ..., 10$ was 1.0000.
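Two of the range-based constants quoted in this appendix and in Section 3 follow directly from Parkinson's moment formula and can be checked with a few lines of Python. The sketch relies on two evaluations of that formula worked out here rather than in the paper: at $p = 1$ (using $\zeta(0) = -1/2$) it gives $E[RG_t] = \sqrt{8/\pi}\,\sigma_t$, and at $p = 4$ it gives $E[RG_t^4] = 9\zeta(3)\,\sigma_t^4$. The adjusted range is taken to be $RG_t^{*2} = RG_t^2/\log 16$, consistent with the adjustment factor in footnote 9, and the variable names are illustrative.

```python
import math

def zeta3(terms: int = 100000) -> float:
    """Partial sum of zeta(3) = sum_{k>=1} k^-3; the truncation error is
    roughly 1/(2*terms^2)."""
    return sum(1.0 / k ** 3 for k in range(1, terms + 1))

log16 = math.log(16.0)
mean_range = math.sqrt(8.0 / math.pi)   # E[RG] / sigma, Parkinson formula at p = 1
fourth_moment = 9.0 * zeta3()           # E[RG^4] / sigma^4, Parkinson formula at p = 4

mse_sd_factor = mean_range ** 2 / log16        # (E[RG*])^2 / sigma^2
mse_prop_factor = fourth_moment / log16 ** 2   # E[RG*^4] / E[RG*^2] / sigma^2

print(round(fourth_moment, 4), round(mse_sd_factor, 4), round(mse_prop_factor, 4))
```

The three printed values correspond to the constants 10.8185, 0.9184 and 1.4073 reported in the text.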
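9 Appendix 3: Proofs of Propositions

Proof of Proposition 1. (i) Let us approximate the loss function $L$ with a second-order Taylor series:

\[
\begin{aligned}
L\left(\hat{\sigma}_t^2, h\right) &\approx L\left(\sigma_t^2, h\right) + \frac{\partial L\left(\sigma_t^2, h\right)}{\partial\sigma_t^2}\left(\hat{\sigma}_t^2 - \sigma_t^2\right) + \frac{1}{2}\frac{\partial^2 L\left(\sigma_t^2, h\right)}{\partial\left(\sigma_t^2\right)^2}\left(\hat{\sigma}_t^2 - \sigma_t^2\right)^2 \\
\text{so}\quad E_{t-1}\left[L\left(\hat{\sigma}_t^2, h\right)\right] &\approx L\left(\sigma_t^2, h\right) + \frac{1}{2m}\frac{\partial^2 L\left(\sigma_t^2, h\right)}{\partial\left(\sigma_t^2\right)^2}V_t
\end{aligned}
\]

by assumption T1. The first-order condition (FOC) for forecast optimality is

\[
0 = E_{t-1}\left[\frac{\partial L\left(\hat{\sigma}_t^2, h_t^*\right)}{\partial h}\right] \approx \frac{\partial L\left(\sigma_t^2, h_t^*\right)}{\partial h} + \frac{1}{2m}\frac{\partial^3 L\left(\sigma_t^2, h_t^*\right)}{\partial\left(\sigma_t^2\right)^2\partial h}V_t
\]

In the absence of noise in the volatility proxy (i.e. $V_t = 0$) the second term above would equal zero and the first-order condition would be the same as if the true conditional variance was observable. By assumption T4 this yields $h_{t,m}^* = \sigma_t^2$. One of the conditions of Hansen and Lunde (2004) was $\partial^3 L\left(\sigma_t^2, h_{t,m}^*\right)/\partial\left(\sigma_t^2\right)^2\partial h = 0$, which implies that the second term above equals zero even in the presence of a noisy volatility proxy. For loss functions that yield $\partial^3 L\left(\sigma_t^2, h_{t,m}^*\right)/\partial\left(\sigma_t^2\right)^2\partial h \neq 0$ the presence of noise in the volatility proxy distorts the first-order condition from what it would be in the absence of noise, and thus affects the optimal forecast.

If $\partial^3 L\left(\sigma_t^2, h\right)/\partial\left(\sigma_t^2\right)^2\partial h > 0\ \forall\left(\sigma_t^2, h\right)$, which is true for the MSE-LOG and MSE-SD loss functions for example, then the FOC implies that we must have $\partial L\left(\sigma_t^2, h_{t,m}^*\right)/\partial h < 0$, which implies that $h_{t,m}^* < \sigma_t^2$, by assumption T4. This implication is consistent with the results presented in Table 2 for the MSE-LOG and MSE-SD loss functions. Alternatively, consider the case that $\partial^3 L\left(\sigma_t^2, h\right)/\partial\left(\sigma_t^2\right)^2\partial h < 0\ \forall\left(\sigma_t^2, h\right)$, which is true for the MSE-prop loss function for example. In that case the first-order condition implies that we must have $\partial L\left(\sigma_t^2, h_{t,m}^*\right)/\partial h > 0$, which implies that $h_{t,m}^* > \sigma_t^2$, by assumption T4. This implication is also consistent with the results presented in Table 2 for the MSE-prop loss function.

(ii) Follows from Theorem 3.4 of White (1994), noting that assumptions T3 and T4 imply that $h^* = \sigma^2$ is the unique solution to the problem $\min_{h \in H} L\left(\sigma^2, h\right)$.

Proof of Proposition 2. We first prove the sufficiency part of the proposition. The first-order condition is

\[
\begin{aligned}
0 &= \frac{\partial}{\partial h}\left(E_{t-1}\left[L\left(\hat{\sigma}_t^2, h_t^*\right)\right]\right) \\
&= \frac{\partial}{\partial h}\left(\tilde{C}\left(h_t^*\right) + E_{t-1}\left[B\left(\hat{\sigma}_t^2\right)\right] + C\left(h_t^*\right)\left(E_{t-1}\left[\hat{\sigma}_t^2\right] - h_t^*\right)\right) \\
&= C'\left(h_t^*\right)\left(E_{t-1}\left[\hat{\sigma}_t^2\right] - h_t^*\right)
\end{aligned}
\]

which implies $h_t^* = E_{t-1}\left[\hat{\sigma}_t^2\right]$ since $C$ is a strictly decreasing function. The second-order condition is also satisfied: $\partial^2 E_{t-1}\left[L\left(\hat{\sigma}_t^2, h_t^*\right)\right]/\partial h^2 = C''\left(h_t^*\right)\left(E_{t-1}\left[\hat{\sigma}_t^2\right] - h_t^*\right) - C'\left(h_t^*\right) = -C'\left(h_t^*\right) > 0$, since $h_t^* = E_{t-1}\left[\hat{\sigma}_t^2\right]$ and $C$ is strictly decreasing. That the ranking of competing forecasts obtained using $\hat{\sigma}_t^2$ is the same as that obtained using $\sigma_t^2$ follows from Hansen and Lunde (2004): their assumption 2 is satisfied given the assumptions for the proposition and noting that $\partial^2 L\left(\hat{\sigma}^2, h\right)/\partial\left(\hat{\sigma}^2\right)^2 = B''\left(\hat{\sigma}^2\right)$ does not depend on $h$.

The necessity part of the proposition is more challenging. For this part we follow closely the proof of Theorem 1 of Komunjer and Vuong (2004), adapted to our problem. We seek to show that the functional form of the loss function given in the proposition is necessary for $h_t^* = E_{t-1}\left[\hat{\sigma}_t^2\right]$, for any $F_t \in \tilde{F}$. Notice that we can write

\[
\frac{\partial L\left(\hat{\sigma}_t^2, h_t\right)}{\partial h} = c\left(\hat{\sigma}_t^2, h_t\right)\left(\hat{\sigma}_t^2 - h_t\right)
\]

where $c\left(\hat{\sigma}_t^2, h_t\right) = \left(\hat{\sigma}_t^2 - h_t\right)^{-1}\partial L\left(\hat{\sigma}_t^2, h_t\right)/\partial h$, since $\hat{\sigma}_t^2 \neq h_t$ a.s. by assumption A2. Now decompose $c\left(\hat{\sigma}_t^2, h_t\right)$ into

\[
c\left(\hat{\sigma}_t^2, h_t\right) = E_{t-1}\left[c\left(\hat{\sigma}_t^2, h_t\right)\right] + \varepsilon_t
\]

where $E_{t-1}[\varepsilon_t] = 0$. Thus

\[
\begin{aligned}
E_{t-1}\left[\frac{\partial L\left(\hat{\sigma}_t^2, h_t^*\right)}{\partial h}\right] &= E_{t-1}\left[c\left(\hat{\sigma}_t^2, h_t^*\right)\left(\hat{\sigma}_t^2 - h_t^*\right)\right] \\
&= E_{t-1}\left[c\left(\hat{\sigma}_t^2, h_t\right)\right]E_{t-1}\left[\hat{\sigma}_t^2 - h_t^*\right] + E_{t-1}\left[\varepsilon_t\left(\hat{\sigma}_t^2 - h_t^*\right)\right]
\end{aligned}
\]

If $E_{t-1}\left[\partial L\left(\hat{\sigma}_t^2, h_t^*\right)/\partial h\right] = 0$ for $h_t^* = E_{t-1}\left[\hat{\sigma}_t^2\right]$, then it must be that $E_{t-1}\left[\hat{\sigma}_t^2 - h_t^*\right] = 0 \Rightarrow E_{t-1}\left[\varepsilon_t\left(\hat{\sigma}_t^2 - h_t^*\right)\right] = 0$ for all $F_t \in \tilde{F}$. Employing a generalised Farkas lemma, see Lemma 8.1 of Gourieroux and Monfort (1996), this implies that $\exists\,\lambda \in \mathbb{R}$ such that $\lambda\left(\hat{\sigma}_t^2 - h_t^*\right) = \varepsilon_t\left(\hat{\sigma}_t^2 - h_t^*\right)$ for every $F_t \in \tilde{F}$ and for all $t$. Since $\hat{\sigma}_t^2 - h_t^* \neq 0$ a.s. by assumption A2 this implies that $\varepsilon_t = \lambda$ a.s. for all $t$. Since $E_{t-1}[\varepsilon_t] = 0$ we then have $\lambda = 0$. Thus $c\left(\hat{\sigma}_t^2, h_t^*\right) = E_{t-1}\left[c\left(\hat{\sigma}_t^2, h_t^*\right)\right]$ for all $t$, which implies that $c\left(\hat{\sigma}_t^2, h_t^*\right) = c\left(h_t^*\right)$, and thus that $\partial L\left(\hat{\sigma}_t^2, h_t\right)/\partial h = c\left(h_t\right)\left(\hat{\sigma}_t^2 - h_t\right)$.

A necessary condition for $h_t^*$ to minimise $E_{t-1}\left[L\left(\hat{\sigma}_t^2, h\right)\right]$ is that $E_{t-1}\left[\partial^2 L\left(\hat{\sigma}_t^2, h_t^*\right)/\partial h^2\right] \geq 0$, using A5 to interchange expectation and differentiation. Using the previous result we have:

\[
E_{t-1}\left[\frac{\partial^2 L\left(\hat{\sigma}_t^2, h_t^*\right)}{\partial h^2}\right] = E_{t-1}\left[c'\left(h_t^*\right)\left(\hat{\sigma}_t^2 - h_t^*\right) - c\left(h_t^*\right)\right] = -c\left(h_t^*\right)
\]

which is non-negative iff $c\left(h_t^*\right)$ is non-positive. From assumption A4 we know that the optimum is in the interior of $H$ and so we know that $c \neq 0$, and thus $c(h) < 0\ \forall\, h \in H$. To obtain the loss function corresponding to the given first derivative we simply integrate up:

\[
\begin{aligned}
L\left(\hat{\sigma}^2, h\right) &= \hat{\sigma}^2\int c(h)\,dh - \int c(h)\,h\,dh \\
&= B\left(\hat{\sigma}^2\right) + \hat{\sigma}^2 C(h) - \left(C(h)\,h - \int C(h)\,dh\right) \\
&= \tilde{C}(h) + B\left(\hat{\sigma}^2\right) + C(h)\left(\hat{\sigma}^2 - h\right)
\end{aligned}
\]

where $C$ is a strictly decreasing function (ie $C' \equiv c$ is negative) and $\tilde{C}$ is the anti-derivative of $C$. By assumption A3 both $B$ and $C$ are twice continuously differentiable. Thus $L$ being of the conjectured form is a necessary and sufficient condition for $h_t^* = E_{t-1}\left[\hat{\sigma}_t^2\right]$.

Proof of Proposition 3. In the following we drop time subscripts where they are not needed.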
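Consider $b = -2$ first:

\[
\begin{aligned}
L(h, h) &= B(h) + 1 + \log h = 0 \\
\text{so}\quad B\left(\hat{\sigma}^2\right) &= -1 - \log\hat{\sigma}^2 \\
\text{and}\quad L\left(\hat{\sigma}^2, h\right) &= \frac{\hat{\sigma}^2}{h} - \log\frac{\hat{\sigma}^2}{h} - 1
\end{aligned}
\]

Thus the loss function for $b = -2$ takes the proposed form. The QLIKE function corresponds to this loss function, up to an additive constant $\left(-1 - \log\hat{\sigma}^2\right)$. This loss function is of the form in Proposition 2 with $B\left(\hat{\sigma}^2\right) = -\log\hat{\sigma}^2$, $C(h) = h^{-1}$ and $\tilde{C}(h) = \log h$.

Now consider $b = -1$:

\[
\begin{aligned}
L(h, h) &= B(h) - h\log h + h = 0 \\
\text{so}\quad B(h) &= h\log h - h \\
\text{and}\quad L\left(\hat{\sigma}^2, h\right) &= h - \hat{\sigma}^2 + \hat{\sigma}^2\log\frac{\hat{\sigma}^2}{h}
\end{aligned}
\]

So the loss function for $b = -1$ takes the proposed form. This loss function is of the form in Proposition 2 with $B\left(\hat{\sigma}^2\right) = \hat{\sigma}^2\log\hat{\sigma}^2 - \hat{\sigma}^2$, $C(h) = -\log h$ and $\tilde{C}(h) = h - h\log h$.

Finally, let us examine the loss functions for $b \notin \{-1, -2\}$:

\[
\begin{aligned}
L(h, h) &= B(h) - \frac{1}{b+1}h^{b+2} + \frac{1}{b+2}h^{b+2} = 0 \\
\text{so}\quad B(h) &= \frac{1}{b+1}h^{b+2} - \frac{1}{b+2}h^{b+2} = \frac{1}{(b+1)(b+2)}h^{b+2} \\
\text{and}\quad L\left(\hat{\sigma}^2, h\right) &= \frac{1}{(b+1)(b+2)}\hat{\sigma}^{2b+4} - \frac{1}{b+1}\hat{\sigma}^2 h^{b+1} + \frac{1}{b+2}h^{b+2} \\
&= \frac{1}{(b+1)(b+2)}\left(\hat{\sigma}^{2b+4} - h^{b+2}\right) - \frac{1}{b+1}h^{b+1}\left(\hat{\sigma}^2 - h\right)
\end{aligned}
\]

which is of the proposed form. This loss function is of the form in Proposition 2 with $B\left(\hat{\sigma}^2\right) = \frac{1}{(b+1)(b+2)}\hat{\sigma}^{2b+4}$, $C(h) = -\frac{1}{b+1}h^{b+1}$ and $\tilde{C}(h) = -\frac{1}{(b+1)(b+2)}h^{b+2}$.

Proof of Proposition 4. (i) Follows directly from Exercise 5.21 of White (1999).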
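(ii) For $b \notin \{-1, -2\}$ we have:

\[
\begin{aligned}
d_t(b) &= \frac{1}{b+2}\left(h_{1t}^{b+2} - h_{2t}^{b+2}\right) - \frac{1}{b+1}\hat{\sigma}_t^2\left(h_{1t}^{b+1} - h_{2t}^{b+1}\right) \\
\text{so}\quad d_t(b)^2 &= \frac{1}{(b+2)^2}\left(h_{1t}^{2b+4} + h_{2t}^{2b+4} - 2h_{1t}^{b+2}h_{2t}^{b+2}\right) \\
&\quad + \frac{1}{(b+1)^2}\hat{\sigma}_t^4\left(h_{1t}^{2b+2} + h_{2t}^{2b+2} - 2h_{1t}^{b+1}h_{2t}^{b+1}\right) \\
&\quad - \frac{2}{(b+1)(b+2)}\hat{\sigma}_t^2\left(h_{1t}^{2b+3} + h_{2t}^{2b+3} - h_{1t}^{b+2}h_{2t}^{b+1} - h_{1t}^{b+1}h_{2t}^{b+2}\right)
\end{aligned}
\]

The largest terms in this expression are: (a) $h_{it}^{2b+4}$, (b) $\hat{\sigma}_t^4 h_{it}^{2b+2}$ and (c) $\hat{\sigma}_t^2 h_{it}^{2b+3}$ for $i = 1, 2$. The expectation of the first term is finite by assumption.

(b) If $b > -1$, then by Hölder's inequality:

\[
\begin{aligned}
E\left[\hat{\sigma}_t^4 h_{it}^{2b+2}\right] &\leq E\left[\left(\hat{\sigma}_t^4\right)^{b+2}\right]^{1/(b+2)}E\left[\left(h_{it}^{2b+2}\right)^{(b+2)/(b+1)}\right]^{(b+1)/(b+2)} \\
&= E\left[\hat{\sigma}_t^{4b+8}\right]^{1/(b+2)}E\left[h_{it}^{2b+4}\right]^{(b+1)/(b+2)} < \infty \text{ by assumption.}
\end{aligned}
\]

If $b < -1$, then we will make use of the assumption that $\inf_t h_{it} \equiv c_i > 0$ for $i = 1, 2$. This assumption implies that $h_{it}^{-1}$ is bounded below by zero and above by $c_i^{-1}$, and thus all moments of $h_{it}^{-1}$ exist.

\[
\begin{aligned}
E\left[\hat{\sigma}_t^4 h_{it}^{2b+2}\right] &\leq E\left[\left(\hat{\sigma}_t^4\right)^{(4+\delta)/4}\right]^{4/(4+\delta)}E\left[\left(h_{it}^{2b+2}\right)^{(4+\delta)/\delta}\right]^{\delta/(4+\delta)}, \text{ for } \delta > 0 \\
&= E\left[\hat{\sigma}_t^{4+\delta}\right]^{4/(4+\delta)}E\left[h_{it}^{2(b+1)(4+\delta)/\delta}\right]^{\delta/(4+\delta)}
\end{aligned}
\]

which is finite as $E\left[\hat{\sigma}_t^{4+\delta}\right] < \infty$ by assumption, and $E\left[h_{it}^{-M}\right] < \infty$ for all $M < \infty$ since $h_{it}^{-1}$ is a bounded random variable.

(c) If $b > -1.5$, then

\[
\begin{aligned}
E\left[\hat{\sigma}_t^2 h_{it}^{2b+3}\right] &\leq E\left[\hat{\sigma}_t^{4b+8}\right]^{1/(2b+4)}E\left[\left(h_{it}^{2b+3}\right)^{(2b+4)/(2b+3)}\right]^{(2b+3)/(2b+4)} \\
&= E\left[\hat{\sigma}_t^{4b+8}\right]^{1/(2b+4)}E\left[h_{it}^{2b+4}\right]^{(2b+3)/(2b+4)} < \infty \text{ by assumption.}
\end{aligned}
\]

If $b < -1.5$,

\[
\begin{aligned}
E\left[\hat{\sigma}_t^2 h_{it}^{2b+3}\right] &\leq E\left[\left(\hat{\sigma}_t^2\right)^{(4+\delta)/2}\right]^{2/(4+\delta)}E\left[\left(h_{it}^{2b+3}\right)^{(4+\delta)/(2+\delta)}\right]^{(2+\delta)/(4+\delta)} \\
&= E\left[\hat{\sigma}_t^{4+\delta}\right]^{2/(4+\delta)}E\left[h_{it}^{(2b+3)(4+\delta)/(2+\delta)}\right]^{(2+\delta)/(4+\delta)}
\end{aligned}
\]

which is finite as $E\left[\hat{\sigma}_t^{4+\delta}\right] < \infty$ by assumption, and $E\left[h_{it}^{-M}\right] < \infty$ for all $M < \infty$ since $h_{it}^{-1}$ is a bounded random variable.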
Now consider $b = -1$. Here we have

$$d_t = h_{1t} - h_{2t} - \hat\sigma_t^2\log\frac{h_{1t}}{h_{2t}},$$

so

$$d_t^2 = (h_{1t} - h_{2t})^2 + \hat\sigma_t^4\left(\log h_{1t} - \log h_{2t}\right)^2 - 2\hat\sigma_t^2\left(\log h_{1t} - \log h_{2t}\right)(h_{1t} - h_{2t}).$$

The largest terms in this expression are: (a) $h_{it}^2$, (b) $\hat\sigma_t^4(\log h_{it})^2$ and (c) $\hat\sigma_t^2 h_{it}\log h_{it}$. The first term is finite by assumption. For (b) we have

$$E\left[\hat\sigma_t^4(\log h_{it})^2\right] \le E\left[\left(\hat\sigma_t^4\right)^{(e+1)/e}\right]^{e/(e+1)} E\left[(\log h_{it})^{2(e+1)}\right]^{1/(e+1)} = E\left[\hat\sigma_t^{4(e+1)/e}\right]^{e/(e+1)} E\left[(\log h_{it})^{2(e+1)}\right]^{1/(e+1)}.$$

The first term on the right-hand side is finite by assumption. For the second term note that

$$(\log h)^k \le h^{k/e} \quad \forall\, h \ge 1, \text{ for } k > 0,$$

and

$$E\left[(\log h_{it})^k\right] \equiv \int_{c_i}^{\infty}(\log h_{it})^k f_{it}(h_{it})\,dh_{it} = \int_{c_i}^{1}(\log h_{it})^k f_{it}(h_{it})\,dh_{it} + \int_{1}^{\infty}(\log h_{it})^k f_{it}(h_{it})\,dh_{it},$$

where

$$\left|\int_{c_i}^{1}(\log h_{it})^k f_{it}(h_{it})\,dh_{it}\right| < \infty \quad \text{for } k > 0 \text{ since } c_i > 0,$$

$$\int_{1}^{\infty}(\log h_{it})^k f_{it}(h_{it})\,dh_{it} \le \int_{1}^{\infty}h_{it}^{k/e} f_{it}(h_{it})\,dh_{it} \le \int_{c_i}^{\infty}h_{it}^{k/e} f_{it}(h_{it})\,dh_{it} \equiv E\left[h_{it}^{k/e}\right].$$

So $E[(\log h_{it})^{2(e+1)}] < \infty$ if $E[h_{it}^{2(e+1)/e}] < \infty$, which holds by assumption. For (c) we have

$$E\left[\hat\sigma_t^2 h_{it}\log h_{it}\right] \le E\left[\left(\hat\sigma_t^2\right)^{(2e+1)/e}\right]^{e/(2e+1)} E\left[h_{it}^{(2e+1)/e}\right]^{e/(2e+1)} E\left[(\log h_{it})^{2e+1}\right]^{1/(2e+1)} < \infty.$$

The first two terms on the right-hand side are finite by assumption, and the final term is finite given that the second term is finite and $c_i > 0$.
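The bound $(\log h)^k \le h^{k/e}$ for $h \ge 1$ rests on the scalar inequality $\log h \le h^{1/e}$, which holds with equality at $h = e^e$. The following is only an illustrative numerical check, not part of the proof; the function name `gap` and the grid are assumptions made here for illustration.

```python
import math

def gap(h: float) -> float:
    """Return h**(1/e) - log(h); the bound requires this to be >= 0 for h >= 1."""
    return h ** (1.0 / math.e) - math.log(h)

# Logarithmically spaced grid on [1, 1e12] (an arbitrary illustrative range).
grid = [10 ** (k / 100.0) for k in range(0, 1201)]
smallest_gap = min(gap(h) for h in grid)

# The two sides touch at h = e**e, so the gap there is zero up to rounding.
gap_at_tangency = gap(math.exp(math.e))
```

Raising both sides of $\log h \le h^{1/e}$ to the power $k > 0$ gives the stated bound, since both sides are non-negative for $h \ge 1$.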
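Finally consider $b = -2$. We have

$$d_t = \hat\sigma_t^2\left(\frac{1}{h_{1t}} - \frac{1}{h_{2t}}\right) + \log\frac{h_{1t}}{h_{2t}},$$

so

$$d_t^2 = \hat\sigma_t^4\left(\frac{1}{h_{1t}} - \frac{1}{h_{2t}}\right)^2 + \left(\log h_{1t} - \log h_{2t}\right)^2 + 2\hat\sigma_t^2\left(\frac{1}{h_{1t}} - \frac{1}{h_{2t}}\right)\log h_{1t} - 2\hat\sigma_t^2\left(\frac{1}{h_{1t}} - \frac{1}{h_{2t}}\right)\log h_{2t},$$

with largest terms: (a) $\hat\sigma_t^4 h_{it}^{-2}$, (b) $(\log h_{it})^2$, (c) $\hat\sigma_t^2 h_{it}^{-1}\log h_{it}$, and (d) $\hat\sigma_t^2 h_{jt}^{-1}\log h_{it}$, for $i = 1, 2$ and $j \ne i$. For term (a) we have

$$E\left[\hat\sigma_t^4\frac{1}{h_{it}^2}\right] \le E\left[\left(\hat\sigma_t^4\right)^{(4+\delta)/4}\right]^{4/(4+\delta)} E\left[\left(\frac{1}{h_{it}^2}\right)^{(4+\delta)/\delta}\right]^{\delta/(4+\delta)} = E\left[\hat\sigma_t^{4+\delta}\right]^{4/(4+\delta)} E\left[h_{it}^{-2(4+\delta)/\delta}\right]^{\delta/(4+\delta)}, \quad \text{for } \delta > 0.$$

The first term on the right-hand side is finite by assumption and the second term is finite since $c_i > 0$. For term (b) recall from above that if $c_i > 0$ then $E[(\log h_{it})^k] < \infty$ if $E[h_{it}^{k/e}] < \infty$ for $k > 0$. So $E[(\log h_{it})^2]$ is finite if $E[h_{it}^{2/e}] < \infty$, which holds by assumption. For term (c) we use

$$E\left[\hat\sigma_t^2 h_{it}^{-1}\log h_{it}\right] \le E\left[\left(\hat\sigma_t^2\right)^{(4+\delta)/2}\right]^{2/(4+\delta)} E\left[\left(h_{it}^{-1}\log h_{it}\right)^{(4+\delta)/(2+\delta)}\right]^{(2+\delta)/(4+\delta)} = E\left[\hat\sigma_t^{4+\delta}\right]^{2/(4+\delta)} E\left[\left(h_{it}^{-1}\log h_{it}\right)^{(4+\delta)/(2+\delta)}\right]^{(2+\delta)/(4+\delta)}, \quad \text{for } \delta > 0,$$

which is finite as the first term is finite by assumption and the second term is finite since $h_{it}^{-1}\log h_{it}$ is a bounded random variable if $c_i > 0$. For the term in (d) we have

$$E\left[\hat\sigma_t^2 h_{jt}^{-1}\log h_{it}\right] \le E\left[\left(\hat\sigma_t^2\log h_{it}\right)^{1+\delta/4}\right]^{1/(1+\delta/4)} E\left[h_{jt}^{-(4+\delta)/\delta}\right]^{\delta/(4+\delta)}, \quad \text{for } \delta > 0,$$

$$\le \left(E\left[\left(\hat\sigma_t^2\right)^{2(1+\delta/4)}\right]^{1/2} E\left[\left((\log h_{it})^{1+\delta/4}\right)^2\right]^{1/2}\right)^{1/(1+\delta/4)} E\left[h_{jt}^{-(4+\delta)/\delta}\right]^{\delta/(4+\delta)}$$

$$= \left(E\left[\hat\sigma_t^{4+\delta}\right]^{1/2} E\left[(\log h_{it})^{2+\delta/2}\right]^{1/2}\right)^{1/(1+\delta/4)} E\left[h_{jt}^{-(4+\delta)/\delta}\right]^{\delta/(4+\delta)}.$$

The first term is finite by assumption, and the third term is finite as $c_i > 0$. The second term is finite if $E[h_{it}^{(2+\delta/2)/e}]$ is finite, and $E[h_{it}^{(2+\delta/2)/e}] < E[h_{it}^{2/e+\delta}]$, which is finite by assumption.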
This completes the proof.
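As a hedged numerical cross-check of the three expressions for the loss differential $d_t$ used in this proof, the sketch below compares them with the difference of the losses themselves. It is an illustration only: the function names `robust_loss` and `loss_differential`, and the test values, are assumptions introduced here, with `proxy` standing for $\hat\sigma_t^2$.

```python
import math

def robust_loss(proxy: float, h: float, b: float) -> float:
    """Loss L(proxy, h; b) for the parametric family, with the b = -1, -2 special cases."""
    if b == -1:
        return h - proxy + proxy * math.log(proxy / h)
    if b == -2:
        return proxy / h - math.log(proxy / h) - 1.0
    return ((proxy ** (b + 2) - h ** (b + 2)) / ((b + 1) * (b + 2))
            - h ** (b + 1) * (proxy - h) / (b + 1))

def loss_differential(proxy: float, h1: float, h2: float, b: float) -> float:
    """Closed-form d_t(b) = L(proxy, h1; b) - L(proxy, h2; b) as written in the proof."""
    if b == -1:
        return h1 - h2 - proxy * math.log(h1 / h2)
    if b == -2:
        return proxy * (1.0 / h1 - 1.0 / h2) + math.log(h1 / h2)
    return ((h1 ** (b + 2) - h2 ** (b + 2)) / (b + 2)
            - proxy * (h1 ** (b + 1) - h2 ** (b + 1)) / (b + 1))

# Largest discrepancy between the closed form and the direct difference of losses.
max_gap = max(
    abs(loss_differential(2.0, 1.5, 0.8, b)
        - (robust_loss(2.0, 1.5, b) - robust_loss(2.0, 0.8, b)))
    for b in (1, 0, -0.5, -1, -2, -5)
)
```

The term in $\hat\sigma_t^{2b+4}$ cancels in the difference, which is why $d_t(b)$ involves the proxy only linearly.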
10 Tables and Figures
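Table 1: Optimal forecasts under various loss functions

Optimal forecast, $h_t^*$, by conditional distribution of returns. Columns 3 to 6 assume $r_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t(0, \sigma_t^2, \nu)$.

| Loss function | $r_t \mid \mathcal{F}_{t-1} \sim F_t(0, \sigma_t^2)$ | General $\nu$ | $\nu = 6$ | $\nu = 10$ | $\nu \to \infty$ |
| --- | --- | --- | --- | --- | --- |
| MSE | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ |
| QLIKE | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ |
| MSE-LOG | $\exp\{E_{t-1}[\log\varepsilon_t^2]\}\,\sigma_t^2$ | $\exp\{\Psi(\tfrac{1}{2}) - \Psi(\tfrac{\nu}{2})\}(\nu-2)\,\sigma_t^2$ | $0.22\sigma_t^2$ | $0.25\sigma_t^2$ | $0.28\sigma_t^2$ |
| MSE-SD | $(E_{t-1}[\lvert\varepsilon_t\rvert])^2\,\sigma_t^2$ | $\frac{\nu-2}{\pi}\left(\Gamma(\tfrac{\nu-1}{2})/\Gamma(\tfrac{\nu}{2})\right)^2\sigma_t^2$ | $0.56\sigma_t^2$ | $0.60\sigma_t^2$ | $0.64\sigma_t^2$ |
| MSE-prop | $\text{Kurtosis}_{t-1}[r_t]\,\sigma_t^2$ | $3\frac{\nu-2}{\nu-4}\,\sigma_t^2$ | $6.00\sigma_t^2$ | $4.00\sigma_t^2$ | $3.00\sigma_t^2$ |
| MAE | $\text{Median}_{t-1}[r_t^2]$ | $\frac{\nu-2}{\nu}\,\text{Median}[F_{1,\nu}]\,\sigma_t^2$ | $0.34\sigma_t^2$ | $0.39\sigma_t^2$ | $0.45\sigma_t^2$ |
| MAE-SD | $\text{Median}_{t-1}[r_t^2]$ | $\frac{\nu-2}{\nu}\,\text{Median}[F_{1,\nu}]\,\sigma_t^2$ | $0.34\sigma_t^2$ | $0.39\sigma_t^2$ | $0.45\sigma_t^2$ |
| MAE-prop † | n/a | $\left(2.36 + \frac{1.00}{\nu} + \frac{7.78}{\nu^2}\right)\sigma_t^2$ | $2.73\sigma_t^2$ | $2.55\sigma_t^2$ | $2.36\sigma_t^2$ |

Notes: This table presents the forecast that minimises the conditional expected loss when the squared return is used as a volatility proxy. That is, $h_t^*$ minimises $E_{t-1}[L(r_t^2, h)]$, for various loss functions $L$. The second column presents the solutions when returns have an arbitrary conditional distribution $F_t$ with mean zero and conditional variance $\sigma_t^2$; the third, fourth and fifth columns present results when returns have the standardised Student's t distribution; and the final column presents the solutions when returns are conditionally normally distributed. † The expressions given for MAE-prop are based on a numerical approximation, see Appendix 1 for details.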
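The closed-form Student's t entries of Table 1 can be evaluated numerically. The sketch below is an illustration only, written with the Python standard library: the function names are assumptions introduced here, the digamma function $\Psi$ is approximated by a recurrence plus an asymptotic series, and the median of $F_{1,\nu}$ is obtained as the squared upper quartile of a Student's t variable found by bisection. The MAE-prop row is omitted because the table reports it as a numerical approximation.

```python
import math

def digamma(x: float) -> float:
    """Approximate the digamma function for x > 0 (recurrence, then asymptotic series)."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return shift + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def t_pdf(t: float, nu: float) -> float:
    """Density of the (unscaled) Student's t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(t * t / nu))

def t_upper_quartile(nu: float, steps: int = 400) -> float:
    """Solve P(0 <= T <= q) = 0.25 by bisection, integrating with Simpson's rule."""
    def mass(q: float) -> float:
        w = q / steps
        total = t_pdf(0.0, nu) + t_pdf(q, nu)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * t_pdf(k * w, nu)
        return total * w / 3.0
    lo, hi = 0.0, 2.0  # the upper quartile lies below 2 for nu >= 1
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mass(mid) < 0.25:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_forecasts_student_t(nu: float) -> dict:
    """Optimal forecasts as multiples of sigma_t^2 (requires nu > 4 for MSE-prop)."""
    gamma_ratio = math.exp(math.lgamma((nu - 1) / 2) - math.lgamma(nu / 2))
    return {
        "MSE-LOG": math.exp(digamma(0.5) - digamma(nu / 2)) * (nu - 2),
        "MSE-SD": (nu - 2) / math.pi * gamma_ratio ** 2,
        "MSE-prop": 3 * (nu - 2) / (nu - 4),
        "MAE": (nu - 2) / nu * t_upper_quartile(nu) ** 2,  # median of F(1, nu) is the squared t quartile
    }

table = {nu: optimal_forecasts_student_t(nu) for nu in (6, 10)}
```

Rounded to two decimals, the entries of `table` can be compared with the $\nu = 6$ and $\nu = 10$ columns above.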
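Table 2: Optimal forecasts under various loss functions, using realised volatility and range

Optimal forecast, $h_t^*$, by volatility proxy. The last five columns use realised volatility based on $m$ intra-daily returns.

| Loss function | Range | Arbitrary $m$ | $m = 1$ | $m = 12$ | $m = 78$ | $m \to \infty$ |
| --- | --- | --- | --- | --- | --- | --- |
| MSE | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ |
| QLIKE | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ |
| MSE-LOG * | $0.85\sigma_t^2$ | $e^{-1.2741/m}\,\sigma_t^2$ | $0.28\sigma_t^2$ | $0.90\sigma_t^2$ | $0.98\sigma_t^2$ | $\sigma_t^2$ |
| MSE-SD | $0.92\sigma_t^2$ | $\frac{1}{m}\left(E\left[\sqrt{\chi_m^2}\right]\right)^2\sigma_t^2$ | $0.64\sigma_t^2$ | $0.96\sigma_t^2$ | $0.99\sigma_t^2$ | $\sigma_t^2$ |
| MSE-prop | $1.41\sigma_t^2$ | $\left(1 + \frac{2}{m}\right)\sigma_t^2$ | $3.00\sigma_t^2$ | $1.17\sigma_t^2$ | $1.03\sigma_t^2$ | $\sigma_t^2$ |
| MAE | $0.83\sigma_t^2$ | $\frac{1}{m}\,\text{Median}[\chi_m^2]\,\sigma_t^2$ | $0.45\sigma_t^2$ | $0.95\sigma_t^2$ | $0.99\sigma_t^2$ | $\sigma_t^2$ |
| MAE-SD | $0.83\sigma_t^2$ | $\frac{1}{m}\,\text{Median}[\chi_m^2]\,\sigma_t^2$ | $0.45\sigma_t^2$ | $0.95\sigma_t^2$ | $0.99\sigma_t^2$ | $\sigma_t^2$ |
| MAE-prop * | $0.99\sigma_t^2$ | $\left(1 + \frac{1.3624}{m}\right)\sigma_t^2$ | $2.36\sigma_t^2$ | $1.11\sigma_t^2$ | $1.02\sigma_t^2$ | $\sigma_t^2$ |

Notes: This table presents the forecast that minimises the conditional expected loss when the range or realised volatility is used as a volatility proxy. That is, $h_t^*$ minimises $E_{t-1}[L(\hat\sigma_t^2, h)]$, for $\hat\sigma_t^2 = RG_t^{*2}$ or $\hat\sigma_t^2 = RV_t$, for various loss functions $L$. In all cases returns are assumed to be generated as a zero mean Brownian motion with constant volatility within each trade day and no jumps. The cases of $m = 1, 12, 78$ correspond to the use of daily squared returns, realised variance with 30-minute returns and realised variance with 5-minute returns respectively. The case that $m \to \infty$ corresponds to the case where the conditional variance is observable ex-post without error. * For the MSE-LOG and MAE-prop loss functions we used simulations, numerical integration and numerical optimisation to obtain the expressions given. Details on the computation of the figures in this table are given in Appendix 2.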
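The "Arbitrary $m$" expressions for realised volatility in Table 2 are simple to evaluate for any $m$. The sketch below is an illustration under the table's own assumptions (Brownian motion, constant intra-day volatility); the function name `optimal_forecast_rv` is an assumption introduced here, the constants 1.2741 and 1.3624 are taken from the table, and $E[\sqrt{\chi_m^2}] = \sqrt{2}\,\Gamma(\tfrac{m+1}{2})/\Gamma(\tfrac{m}{2})$ is the standard mean of a chi distribution. The MAE rows are omitted because they need a chi-square quantile routine.

```python
import math

def optimal_forecast_rv(m: int) -> dict:
    """Optimal forecasts as multiples of sigma_t^2 when the proxy is RV from m intra-daily returns."""
    # Mean of the square root of a chi-square(m) variable, via log-gamma for numerical stability.
    mean_root = math.sqrt(2.0) * math.exp(math.lgamma((m + 1) / 2) - math.lgamma(m / 2))
    return {
        "MSE-LOG": math.exp(-1.2741 / m),   # constant as reported in Table 2
        "MSE-SD": mean_root ** 2 / m,
        "MSE-prop": 1.0 + 2.0 / m,
        "MAE-prop": 1.0 + 1.3624 / m,       # constant as reported in Table 2
    }

multiples = {m: optimal_forecast_rv(m) for m in (1, 12, 78)}
```

For $m = 1$ the MSE-SD multiple reduces to $(E[\lvert Z\rvert])^2 = 2/\pi$ for a standard normal $Z$, and every multiple tends to one as $m$ grows.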
Table 3: Mincer-Zarnowitz tests of the volatility forecasts
Volatility Model ˆ
β0 (s.e.) Rolling window
RiskMetrics
(0.39)
0.26∗ (0.12) 38.46∗ 0.00
0.24∗ (0.10) 59.07∗ 0.00
(0.10) 57.09∗ 0.00
2.50∗
2.12∗
2.12∗
2.29∗
(s.e.)
(0.65)
ˆ β 1
0.34∗ (0.18) 15.42∗ 0.00
0.30∗ (0.12) 32.32∗ 0.00
0.25∗ (0.10) 60.00∗ 0.00
0.25∗
(0.08) 81.72∗ 0.00
0.95
1.25∗
1.23∗
1.27∗
0.87
0.62∗ (0.17) 7.01∗ 0.03
0.56∗ (0.14) 9.90∗ 0.01
0.61∗
(s.e.) χ22 -stat pval
(s.e.) χ22 -stat pval
ˆ
β0 (s.e.) 15-min RV RiskMetrics
5-min realised vol 2.22∗
(0.44)
ˆ β 1
ˆ β 1 (s.e.) χ22 -stat pval
(0.83)
Volatility proxy 65-min 15-min realised vol realised vol 2.25∗ 2.15∗
0.21∗ (0.20) 15.94∗ 0.00
ˆ β 0 Daily
Daily squared return 2.98∗
(0.41)
(0.70)
(0.49)
(0.26)
5.61 0.06
(0.35)
(0.40)
(0.39)
0.27∗
(0.32)
(0.35) (0.12)
12.92∗ 0.00
Notes: This table presents the results of Mincer-Zarnowitz (MZ) tests of three IBM equity volatility forecasts: a 60-day rolling window forecast, a RiskMetrics forecast based on daily returns, and a RiskMetrics forecast based on 15-minute realised volatility. The sample period is January 1994 to May 1998. The null hypothesis in the MZ test is that β 0 = 0 and β 1 = 1. We present the parameter estimates and Newey-West standard errors, and mark any parameter estimates that are different from their hypothesised values with an asterisk. We also present the results of a χ22 test of the joint parameter restriction and the p-value associated with the joint test statistic. A p-value of less than 0.05 indicates a rejection of the null, and thus evidence against the optimality of the volatility forecast. These statistics are marked with an asterisk.
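A Mincer-Zarnowitz test of the kind summarised in these notes can be sketched as follows. This is a minimal standard-library illustration, not the code used for Table 3: the function names, the Bartlett-kernel lag length `lags=5`, and the simulated data are all assumptions made here. It regresses the proxy on a constant and the forecast, forms a Newey-West covariance matrix, and computes the $\chi_2^2$ statistic for $\beta_0 = 0$, $\beta_1 = 1$.

```python
import math
import random

def inv2(a):
    """Inverse of a 2x2 matrix stored as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mincer_zarnowitz(proxy, forecast, lags=5):
    """Return (beta0, beta1, wald, pvalue) for H0: beta0 = 0 and beta1 = 1."""
    n = len(proxy)
    x = [(1.0, h) for h in forecast]
    xtx = [[sum(xi[r] * xi[c] for xi in x) for c in range(2)] for r in range(2)]
    xty = [sum(xi[r] * yi for xi, yi in zip(x, proxy)) for r in range(2)]
    bread = inv2(xtx)
    beta = [bread[r][0] * xty[0] + bread[r][1] * xty[1] for r in range(2)]
    resid = [yi - beta[0] - beta[1] * xi[1] for xi, yi in zip(x, proxy)]
    # Newey-West "meat": Bartlett-weighted autocovariances of the scores x_t * e_t.
    score = [(xi[0] * e, xi[1] * e) for xi, e in zip(x, resid)]
    meat = [[sum(s[r] * s[c] for s in score) for c in range(2)] for r in range(2)]
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)
        for r in range(2):
            for c in range(2):
                g = sum(score[t][r] * score[t - lag][c] for t in range(lag, n))
                gt = sum(score[t][c] * score[t - lag][r] for t in range(lag, n))
                meat[r][c] += w * (g + gt)
    cov = matmul2(matmul2(bread, meat), bread)
    diff = [beta[0] - 0.0, beta[1] - 1.0]
    cov_inv = inv2(cov)
    wald = sum(diff[r] * cov_inv[r][c] * diff[c] for r in range(2) for c in range(2))
    return beta[0], beta[1], wald, math.exp(-wald / 2.0)  # chi-square(2) upper tail

# Simulated illustration (hypothetical data): the squared return is a noisy but
# conditionally unbiased proxy for the true variance.
random.seed(0)
true_var = [random.uniform(0.5, 2.0) for _ in range(20000)]
proxy = [v * random.gauss(0.0, 1.0) ** 2 for v in true_var]
b0, b1, wald, pval = mincer_zarnowitz(proxy, true_var)
b0_bad, b1_bad, wald_bad, pval_bad = mincer_zarnowitz(proxy, [2.0 * v for v in true_var])
```

In this simulation the first regression uses the true conditional variance as the forecast, while the second uses a forecast that is twice too large, so its slope estimate should be close to one half.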
Table 4: Comparison of rolling window and daily RiskMetrics forecasts

Columns give the volatility proxy.

| Loss function | Daily squared return | 65-min realised vol | 15-min realised vol | 5-min realised vol |
| --- | --- | --- | --- | --- |
| b = 1 | -0.93 | -1.52 | -1.93 | -2.21∗ |
| b = 0 (MSE) | 0.01 | -0.68 | -1.14 | -1.45 |
| b = -1 | 1.08 | 0.69 | 0.31 | -0.02 |
| b = -2 (QLIKE) | 1.67 | 1.93 | 2.16∗ | 1.51 |
| b = -5 | 1.03 | 1.19 | 1.83 | 1.97∗ |

Notes: This table presents the t-statistics from Diebold-Mariano tests of equal predictive accuracy for a 60-day rolling window forecast and a RiskMetrics forecast based on daily returns for IBM over the period January 1994 to May 1998. A t-statistic greater than 1.96 in absolute value indicates a rejection of the null of equal predictive accuracy at the 0.05 level. These statistics are marked with an asterisk. The sign of the t-statistics indicates which forecast performed better for each loss function: a positive t-statistic indicates that the rolling window forecast produced larger average loss than the RiskMetrics forecast, while a negative sign indicates the opposite.
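The t-statistics described in these notes can be sketched as below. This is a minimal illustration, not the code used for the tables: the function name `dm_statistic`, the Newey-West (Bartlett kernel) long-run variance, and the default lag length are assumptions made here, and the inputs are hypothetical loss series.

```python
import math

def dm_statistic(loss1, loss2, lags=5):
    """t-statistic for H0: E[d_t] = 0, where d_t = loss1_t - loss2_t.

    A positive value means the first forecast had the larger average loss.
    """
    d = [a - b for a, b in zip(loss1, loss2)]
    n = len(d)
    mean = sum(d) / n
    u = [x - mean for x in d]
    # Long-run variance: sample variance plus Bartlett-weighted autocovariances.
    lrv = sum(x * x for x in u) / n
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)
        lrv += 2.0 * w * sum(u[t] * u[t - lag] for t in range(lag, n)) / n
    return mean / math.sqrt(lrv / n)

# Hypothetical loss series: forecast 1 loses 1 or 3 in alternation, forecast 2 loses nothing.
stat = dm_statistic([1.0, 3.0] * 50, [0.0] * 100, lags=0)
stat_reversed = dm_statistic([0.0] * 100, [1.0, 3.0] * 50, lags=0)
```

Swapping the two loss series flips the sign of the statistic, which matches the sign convention stated in the notes.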
Table 5: Comparison of rolling window and 15-minute RiskMetrics forecasts

Columns give the volatility proxy.

| Loss function | Daily squared return | 65-min realised vol | 15-min realised vol | 5-min realised vol |
| --- | --- | --- | --- | --- |
| b = 1 | 2.45∗ | 3.19∗ | 3.66∗ | 3.46∗ |
| b = 0 (MSE) | 2.75∗ | 3.51∗ | 4.02∗ | 3.76∗ |
| b = -1 | 2.84∗ | 3.68∗ | 4.19∗ | 3.87∗ |
| b = -2 (QLIKE) | 2.75∗ | 3.50∗ | 3.89∗ | 3.60∗ |
| b = -5 | 2.35∗ | 2.51∗ | 2.33∗ | 2.34∗ |

Notes: This table presents the t-statistics from Diebold-Mariano tests of equal predictive accuracy for a 60-day rolling window forecast and a RiskMetrics forecast based on 15-minute realised volatility for IBM over the period January 1994 to May 1998. A t-statistic greater than 1.96 in absolute value indicates a rejection of the null of equal predictive accuracy at the 0.05 level. These statistics are marked with an asterisk. The sign of the t-statistics indicates which forecast performed better for each loss function: a positive t-statistic indicates that the rolling window forecast produced larger average loss than the RiskMetrics forecast, while a negative sign indicates the opposite.
Table 6: Comparison of daily RiskMetrics and 15-minute RiskMetrics forecasts

Columns give the volatility proxy.

| Loss function | Daily squared return | 65-min realised vol | 15-min realised vol | 5-min realised vol |
| --- | --- | --- | --- | --- |
| b = 1 | 1.98∗ | 2.69∗ | 3.38∗ | 3.64∗ |
| b = 0 (MSE) | 1.81 | 2.62∗ | 3.57∗ | 3.84∗ |
| b = -1 | 1.83 | 2.34∗ | 3.46∗ | 3.66∗ |
| b = -2 (QLIKE) | 2.01∗ | 1.66 | 2.73∗ | 2.87∗ |
| b = -5 | 1.48 | 0.69 | 0.44 | 0.59 |

Notes: This table presents the t-statistics from Diebold-Mariano tests of equal predictive accuracy for a RiskMetrics forecast based on daily returns and a RiskMetrics forecast based on 15-minute realised volatility for IBM over the period January 1994 to May 1998. A t-statistic greater than 1.96 in absolute value indicates a rejection of the null of equal predictive accuracy at the 0.05 level. These statistics are marked with an asterisk. The sign of the t-statistics indicates which forecast performed better for each loss function: a positive t-statistic indicates that the RiskMetrics forecast based on daily returns produced larger average loss than the RiskMetrics forecast based on 15-minute realised volatility, while a negative sign indicates the opposite.
References

[1] Alizadeh, Sassan, Brandt, Michael W., and Diebold, Francis X., 2002, Range-Based Estimation of Stochastic Volatility Models, Journal of Finance, 57(3), 1047-1091.
[2] Andersen, Torben G., and Bollerslev, Tim, 1998, Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts, International Economic Review, 39, 885-905.
[3] Andersen, Torben G., Bollerslev, Tim, and Lange, Steve, 1999, Forecasting Financial Market Volatility: Sample Frequency Vis-à-vis Forecast Horizon, Journal of Empirical Finance, 6, 457-477.
[4] Andersen, Torben, Bollerslev, Tim, Diebold, Francis X., and Ebens, Heiko, 2001a, The Distribution of Realized Stock Return Volatility, Journal of Financial Economics, 61, 43-76.
[5] Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X. and Labys, Paul, 2001b, The Distribution of Realized Exchange Rate Volatility, Journal of the American Statistical Association, 96, 42-55.
[6] Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X. and Labys, Paul, 2003, Modeling and Forecasting Realized Volatility, Econometrica, 71(2), 579-625.
[7] Andersen, Torben G., Bollerslev, Tim, and Meddahi, Nour, 2005a, Correcting the Errors: Volatility Forecast Evaluation Using High-Frequency Data and Realized Volatilities, Econometrica, 73(1), 279-296.
[8] Andersen, Torben G., Bollerslev, Tim, Christoffersen, Peter F., and Diebold, Francis X., 2005b, Volatility and Correlation Forecasting, in the Handbook of Economic Forecasting, G. Elliott, C.W.J. Granger and A. Timmermann (eds.), North Holland Press, Amsterdam.
[9] Ball, Clifford A., and Torous, Walter N., 1984, The Maximum Likelihood Estimation of Security Price Volatility: Theory, Evidence, and Application to Option Pricing, Journal of Business, 57(1), 97-112.
[10] Bai, X., Russell, Jeffrey R., and Tiao, George C., 2001, Beyond Merton's Utopia I: Effects of Dependence and Non-Normality on Variance Estimates Using High-Frequency Data, working paper, Graduate School of Business, University of Chicago.
[11] Barndorff-Nielsen, Ole E., and Shephard, Neil, 2002, Econometric Analysis of Realised Volatility and Its Use in Estimating Stochastic Volatility Models, Journal of the Royal Statistical Society, Series B, 64, 253-280.
[12] Barndorff-Nielsen, Ole E., and Shephard, Neil, 2004, Econometric Analysis of Realized Covariation: High Frequency Based Covariance, Regression and Correlation in Financial Economics, Econometrica, 72(3), 885-925.
[13] Bauwens, Luc, Laurent, Sébastien, and Rombouts, Jeroen, 2003, Multivariate GARCH Models: A Survey, Journal of Applied Econometrics, forthcoming.
[14] Bierens, Herman J., 1990, A Consistent Conditional Moment Test of Functional Form, Econometrica, 58, 1443-1458.
[15] Bierens, Herman J., and Ploberger, Werner, 1997, Asymptotic Theory of Integrated Conditional Moment Tests, Econometrica, 65, 1129-1151.
[16] Bollerslev, Tim, and Ghysels, Eric, 1994, Periodic Autoregressive Conditional Heteroscedasticity, Journal of Business and Economic Statistics, 14(2), 139-151.
[17] Bollerslev, Tim, and Wright, Jonathan H., 2001, High-Frequency Data, Frequency Domain Inference, and Volatility Forecasting, Review of Economics and Statistics, 83(5), 596-602.
[18] Christensen, Kim, and Podolskij, Mark, 2005, Asymptotic Theory for Range-Based Estimation of Integrated Variance of a Continuous Semi-Martingale, working paper, Aarhus School of Business.
[19] Christodoulakis, George A., and Satchell, Stephen E., 2004, Forecast Evaluation in the Presence of Unobserved Volatility, Econometric Reviews, 23(3), 175-198.
[20] Christoffersen, Peter and Diebold, Francis X., 1997, Optimal prediction under asymmetric loss, Econometric Theory, 13, 808-817.
[21] Christoffersen, Peter, and Jacobs, Kris, 2003, The Importance of the Loss Function in Option Valuation, Journal of Financial Economics, forthcoming.
[22] Clements, Michael P., 2005, Evaluating Econometric Forecasts of Economic and Financial Variables, Palgrave MacMillan, United Kingdom.
[23] Corradi, Valentina, and Swanson, Norman R., 2004, Predictive Density Evaluation, in the Handbook of Economic Forecasting, G. Elliott, C.W.J. Granger and A. Timmermann (eds.), North Holland Press, Amsterdam.
[24] De Jong, Robert M., 1996, The Bierens Test Under Data Dependence, Journal of Econometrics, 72, 1-32.
[25] Diebold, Francis X., and Mariano, Roberto S., 1995, Comparing Predictive Accuracy, Journal of Business and Economic Statistics, 13(3), 253-263.
[26] Diebold, Francis X., and Lopez, Jose A., 1996, Forecast Evaluation and Combination, in G.S. Maddala and C.R. Rao (eds.), Handbook of Statistics, Amsterdam: North-Holland, 241-268.
[27] Duffie, Darrell, and Pan, Jun, 1997, An Overview of Value at Risk, Journal of Derivatives, 4(3), 7-49.
[28] Engle, Robert F., 1982, Autoregressive Conditional Heteroskedasticity With Estimates of the Variance of U.K. Inflation, Econometrica, 50, 987-1008.
[29] Engle, Robert F., 1993, A Comment on Hendry and Clements on the Limitations of Comparing Mean Square Forecast Errors, Journal of Forecasting, 12, 642-644.
[30] Engle, Robert F., and Patton, Andrew J., 2001, What Good is a Volatility Model?, Quantitative Finance, 1(2), 237-245.
[31] Feller, W., 1951, The Asymptotic Distribution of the Range of Sums of Random Variables, Annals of Mathematical Statistics, 22, 427-432.
[32] Garman, Mark B., and Klass, Michael J., 1980, On the Estimation of Security Price Volatilities from Historical Data, Journal of Business, 53(1), 67-78.
[33] Gourieroux, C., Monfort, A., and Trognon, A., 1984, Pseudo Maximum Likelihood Methods: Theory, Econometrica, 52(3), 681-700.
[34] Gourieroux, C., Monfort, A. and Renault, E., 1987, Consistent M-Estimators in a Semi-Parametric Model, CEPREMAP working paper 8720.
[35] Gourieroux, C. and Monfort, A., 1996, Statistics and Econometric Models, Volume 1, translated from the French by Q. Vuong, Cambridge University Press, Great Britain.
[36] Granger, C.W.J., 1969, Prediction with a generalized cost function, OR, 20, 199-207.
[37] Hansen, Peter Reinhard, and Lunde, Asger, 2001, A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)?, Journal of Applied Econometrics, forthcoming.
[38] Hansen, Peter R., and Lunde, Asger, 2004, Consistent Ranking of Volatility Models, Journal of Econometrics, forthcoming.
[39] Harvey, Andrew, Ruiz, Esther, and Shephard, Neil, 1994, Multivariate Stochastic Volatility Models, Review of Economic Studies, 61, 247-264.
[40] Jorion, Philippe, 1995, Predicting Volatility in the Foreign Exchange Market, Journal of Finance, 50(2), 507-528.
[41] Komunjer, I., and Vuong, Q., 2004, Efficient Conditional Quantile Estimation, working paper.
[42] Martens, Martin, and van Dijk, Dick, 2005, Measuring Volatility with the Realized Range, working paper, Econometric Institute, Erasmus University Rotterdam.
[43] McNeil, Alexander J., and Frey, Rudiger, 2000, Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: An Extreme Value Approach, Journal of Empirical Finance, 7, 271-300.
[44] Mincer, Jacob, and Zarnowitz, Victor, 1969, The Evaluation of Economic Forecasts, in Mincer, J. (ed.) Economic Forecasts and Expectations, National Bureau of Economic Research, New York.
[45] Newey, Whitney K., and West, Kenneth D., 1987, A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix, Econometrica, 55(3), 703-708.
[46] Pagan, Adrian R., and Schwert, G. William, 1990, Alternative Models for Conditional Volatility, Journal of Econometrics, 45, 267-290.
[47] Parkinson, Michael, 1980, The Extreme Value Method for Estimating the Variance of the Rate of Return, Journal of Business, 53(1), 61-65.
[48] Patton, Andrew J., and Timmermann, Allan, 2004, Properties of Optimal Forecasts under Asymmetric Loss and Nonlinearity, Centre for Economic Policy Research Discussion Paper 4037.
[49] Poon, Ser-Huang and Granger, Clive W. J., 2003, Forecasting Volatility in Financial Markets, Journal of Economic Literature, 41, 478-539.
[50] Shephard, Neil, 2005, Stochastic Volatility: Selected Readings, Oxford University Press, United Kingdom.
[51] Theil, H., 1958, Economic Forecasts and Policy, North-Holland, Amsterdam.
[52] West, Kenneth D., 2005, Forecast Evaluation, in the Handbook of Economic Forecasting, G. Elliott, C.W.J. Granger and A. Timmermann (eds.), North Holland Press, Amsterdam.
[53] West, Kenneth D., Edison, Hali J., and Cho, Dongchul, 1993, A Utility-Based Comparison of Some Models of Exchange Rate Volatility, Journal of International Economics, 35, 23-45.
[54] White, Halbert, 1994, Estimation, Inference and Specification Analysis, Econometric Society Monographs No. 22, Cambridge University Press, Cambridge, U.K.
[55] White, Halbert, 1999, Asymptotic Theory for Econometricians, Revised Edition, Academic Press, San Diego.
[Plot: "Optimal forecasts under MAE loss". Horizontal axis: Kurtosis (3 to 30). Vertical axis: Median[e²].]

Figure 1: Optimal forecast under MAE loss when true variance is 1, for various levels of kurtosis, using the standardised Student's t distribution. The dashed line represents the optimal forecast as ν → 4.

[Plot: "Optimal forecasts under MSE-SD loss". Horizontal axis: Kurtosis (3 to 30). Vertical axis: (E[|e|])².]

Figure 2: Optimal forecast under MSE-SD loss when true variance is 1, for various levels of kurtosis, using the standardised Student's t distribution. The dashed line represents the optimal forecast as ν → 4.
[Plot: "Various appropriate loss functions". Horizontal axis: hhat (r2=2), from 0 to 4. Vertical axis: loss. Series: b=1, b=0.5, b=0 (MSE), b=-0.5, b=-1, b=-2 (QLIKE), b=-5.]

Figure 3: Loss functions for various choices of b. True σ̂² = 2 in this example, with the volatility forecast ranging between 0 and 4. b=0 and b=-2 correspond to the MSE and QLIKE loss functions respectively.

[Plot: "Ratio of loss from negative forecast errors to positive forecast errors". Horizontal axis: forecast error (r2=2), from 0 to 2. Series: b=1, b=0.5, b=0 (MSE), b=-0.5, b=-1, b=-2 (QLIKE), b=-5.]

Figure 4: Ratio of losses from negative forecast errors to positive forecast errors, for various choices of b. True σ̂² = 2 in this example, with the volatility forecast ranging between 0 and 4. b=0 and b=-2 correspond to the MSE and QLIKE loss functions respectively.
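The loss family shown in Figures 3 and 4 can be evaluated directly. The sketch below is illustrative rather than the code behind the figures: the names `robust_loss` and `under_over_ratio` are assumptions introduced here, `proxy` stands for σ̂², and the ratio is defined explicitly as the loss from under-predicting by `e` divided by the loss from over-predicting by `e`, which may or may not match the sign convention used for the forecast error in Figure 4.

```python
import math

def robust_loss(proxy: float, h: float, b: float) -> float:
    """Loss L(proxy, h; b); b = 0 is MSE (scaled by 1/2) and b = -2 is QLIKE up to a constant."""
    if proxy <= 0 or h <= 0:
        raise ValueError("proxy and forecast must be strictly positive")
    if b == -1:
        return h - proxy + proxy * math.log(proxy / h)
    if b == -2:
        return proxy / h - math.log(proxy / h) - 1.0
    return ((proxy ** (b + 2) - h ** (b + 2)) / ((b + 1) * (b + 2))
            - h ** (b + 1) * (proxy - h) / (b + 1))

def under_over_ratio(proxy: float, e: float, b: float) -> float:
    """Loss at h = proxy - e divided by loss at h = proxy + e (requires 0 < e < proxy)."""
    return robust_loss(proxy, proxy - e, b) / robust_loss(proxy, proxy + e, b)

# Same setting as the figures: proxy equal to 2, here with an error of size 1.
ratios = {b: under_over_ratio(2.0, 1.0, b) for b in (1, 0, -1, -2, -5)}
```

Under this definition the ratio equals one for b = 0, since MSE is symmetric, is below one for b = 1, and is above one for b = -2, so QLIKE penalises under-prediction more heavily than over-prediction of the same size.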
[Plot: "Conditional variance forecasts". Horizontal axis: Jan94 to May98. Vertical axis: Conditional variance (0 to 18). Series: 60-day rolling window, RiskMetrics-daily, RiskMetrics-realised.]

Figure 5: Conditional variance forecasts for IBM returns from three models, January 1994 to May 1998.
[Plot with three panels: "DM tests, using realised volatility sampled every 15 mins", "DM tests, using realised volatility sampled every 65 mins" and "DM tests, using daily squared returns". Horizontal axis: loss function parameter (-5 to 1). Vertical axis: DM test statistic (-2 to 5). Series: rw1 vs rm1, rw1 vs rm2, rm1 vs rm2.]

Figure 6: Test statistics from Diebold-Mariano tests of equal predictive accuracy, for various choices of loss function parameter. "rw1" stands for the rolling window volatility forecast, "rm1" stands for the RiskMetrics forecast based on daily returns and "rm2" stands for the RiskMetrics forecast based on daily realised volatility computed using 15-minute returns. A positive value for the test statistic indicates that the first forecast has greater expected loss than the second forecast. The horizontal dashed lines are at ±1.96, the 95% point-wise critical values; the vertical dashed lines are at zero and minus two, corresponding to MSE and QLIKE loss.