Volatility Forecast Evaluation and Comparison Using Imperfect Volatility Proxies

Andrew J. Patton∗
London School of Economics

PRELIMINARY AND INCOMPLETE

First version: March 2004. This version: 7 February, 2005.
Abstract
We show that the use of a conditionally unbiased, but imperfect, volatility proxy can lead to undesirable outcomes in some commonly used methods for evaluating and comparing conditional variance forecasts: the true conditional variance may be rejected as being sub-optimal, and an imperfect volatility forecast may be selected over the true conditional variance. We also consider the extent of the problem when more efficient volatility proxies, such as the intra-daily range or realised volatility, are used for forecast comparison. We derive necessary and sufficient conditions on the loss function for the ranking of competing volatility forecasts to be preserved when a volatility proxy is employed.

Keywords: forecast evaluation, forecast comparison, loss functions, realised variance, range.

J.E.L. Codes: C53, C52, C22.

∗ The author would particularly like to thank Ivana Komunjer, Peter Hansen and Asger Lunde for helpful suggestions and comments. Thanks also go to Tony Hall, Mike McCracken, Adrian Pagan and Kevin Sheppard. All remaining deficiencies are the responsibility of the author. Some of the work on this paper was conducted while the author was a visiting scholar at the School of Finance and Economics, University of Technology, Sydney. Contact address: Financial Markets Group, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom. Email: [email protected]. This paper is available from http://fmg.lse.ac.uk/~patton/research.html.
1 Introduction
Given the central role that risk has in financial decision making it is no surprise that much effort has been devoted to developing volatility models. The profusion of models that have been proposed since Engle’s (1982) seminal ARCH paper leads one to the problem of comparing the available volatility forecasting models. Evaluating and comparing economic forecasts is a well-studied problem, dating back at least to Theil (1958). However, the evaluation and comparison of volatility forecasts, as opposed to other forecasts, is complicated by the fact that the variable of interest, the conditional variance, is not observable, even ex-post. This complication was resolved, at least partly, by recognising that the squared return on an asset at date t (assuming a zero mean return) is a conditionally unbiased estimator of the true unobserved conditional variance of the asset at date t. The high/low range and realised volatility, see Parkinson (1980) and Andersen, et al. (2003) for example, have also been used as volatility proxies.

Many of the standard methods for forecast evaluation and comparison, such as the Mincer-Zarnowitz (1969) regression and the Diebold-Mariano (1995) test, can be shown to be applicable when such a conditionally unbiased volatility proxy is used, see Andersen and Bollerslev (1998) for example. However, it is not true that using a conditionally unbiased volatility proxy will always lead to the same outcome as if the true conditional variance were used. In particular, some of the modifications of standard methods employed by some authors lead to perverse outcomes. For example, in the volatility forecasting literature numerous authors have expressed concern that a few extreme observations may have an unduly large impact on the outcomes of forecast evaluation and comparison tests, see Bollerslev and Ghysels (1994), Andersen, et al. (1999) and Poon and Granger (2003) amongst others.
One common response to this concern is to employ forecast loss functions that are “less sensitive” to large observations than the usual squared forecast error loss function, such as absolute error or proportional error loss functions. In this paper we show that such an approach can lead to incorrect inferences and the selection of inferior forecasts over better forecasts.

Our research builds on recent work by Hansen and Lunde (2004), who were the first to analyse the problems introduced by the presence of noise in the volatility proxy. These authors provide a sufficient condition on the loss function to ensure that the ranking of various forecasts is preserved when a noisy but conditionally unbiased proxy for the conditional variance is employed rather than the conditional variance itself. The current paper extends the work of Hansen and Lunde (2004) in two important directions. Firstly, we derive explicitly the undesirable outcomes that may arise when
some common loss functions are employed, considering the three most commonly used volatility proxies: the daily squared return, the intra-daily range and a realised variance estimator. Secondly, we provide necessary and sufficient conditions on the loss function to ensure that the ranking of various forecasts is preserved when using a noisy volatility proxy. These conditions are related to those of Gourieroux, et al. (1984) for quasi-maximum likelihood estimation.

In the presence of non-normally distributed returns the use of variance as a measure of risk has been called into question. Other risk measures, such as Value-at-Risk (VaR) and Expected Shortfall (or “conditional VaR”), have been suggested in the literature as more relevant measures of risk, see Duffie and Pan (1997) or McNeil and Frey (2000) for example. While we acknowledge the importance of the question of an appropriate risk measure, we take as given the fact that there is some interest in forecasts of conditional variance, and thus a derived demand for methods for evaluating and comparing forecasts of the conditional variance.

The canonical problem in point forecasting is to find the forecast that minimises the expected loss, conditional on time t information. That is,

$$\hat{Y}^{*}_{t+h,t} \equiv \arg\min_{y \in \mathcal{Y}} E\left[ L\left(Y_{t+h}, y\right) \mid \mathcal{F}_t \right] \tag{1}$$
where $Y_{t+h}$ is the variable of interest and $\mathcal{F}_t$ is the time t information set. Starting with the assumption that the forecast user is interested in the conditional variance, and that some noisy volatility proxy will be used in evaluation and comparison, we thus take the solution of the optimisation problem above (the conditional variance) as given, and consider the loss functions that would generate the desired solution. This approach is unusual in the economic forecasting literature: the more common approach is to take the forecast user’s loss function as given and derive the optimal forecast for that loss function; related papers here are Granger (1969), Engle (1993), Christoffersen and Diebold (1997), Christoffersen and Jacobs (2003) and Patton and Timmermann (2004a), amongst others. The fact that we know the forecast user desires a conditional variance forecast places limits on the class of loss functions that may be used for volatility comparison, ruling out some choices previously used in the literature. However, the class of “appropriate” loss functions still admits a wide variety of loss functions and, without further information, conducting volatility forecast comparisons is not possible with existing tests. In separate work we propose a collection of tests that may be employed when only limited information is available about the forecast user’s loss function.

In Sections 2 and 3 we derive expressions for the optimal forecast obtained by minimising the expected loss for various loss functions using the daily squared return, range, or realised volatility as a proxy for the conditional variance. The optimal forecasts are functions of conditional moments or quantiles of the volatility proxy employed. We do not consider the problem of building econometric models, such as GARCH, stochastic volatility, CAViaR, etc., for these quantities. Of course, even a forecast that is equal to the appropriate conditional moment/quantile (in our case, the conditional variance) can perform poorly if the model for the conditional moment/quantile is mis-specified, or if there is substantial estimation error.

The remainder of this paper is organised as follows. In Section 2 we consider volatility forecast evaluation and comparison using the squared return as a volatility proxy, showing the problems that arise when using alternative Mincer-Zarnowitz regressions or Diebold-Mariano tests with alternative loss functions. In Section 3 we consider the same problem using the range and realised volatility as volatility proxies. In Section 4 we provide necessary and sufficient conditions that a loss function must satisfy so that perverse outcomes are avoided, and present a parametric class of loss functions appropriate for volatility forecast comparison. In Section 5 we illustrate the problem with three commonly-used volatility models, and in Section 6 we conclude and suggest extensions. All proofs and derivations are provided in appendices.
1.1 Notation
Let $r_t$ be the variable whose conditional variance is of interest, usually a daily or monthly asset return in the volatility forecasting literature. Let the information set available be denoted $\mathcal{F}_{t-1}$, and denote $V_{t-1}[r_t] \equiv \sigma_t^2$. We will assume throughout that $E_{t-1}[r_t] = 0$, and so $\sigma_t^2 = E_{t-1}[r_t^2]$. We will cover the general case that $r_t | \mathcal{F}_{t-1} \sim F_t(0, \sigma_t^2)$, where $F_t(0, \sigma_t^2)$ is some conditional distribution with mean zero and variance $\sigma_t^2$, and then specialise to the case that $r_t | \mathcal{F}_{t-1} \sim N(0, \sigma_t^2)$ to obtain specific results in certain cases. Let $\varepsilon_t \equiv r_t / \sigma_t$ denote the ‘standardised return’. Let a forecast of the conditional variance of $r_t$ be denoted $h_t$, or $h_{i,t}$ if there is more than one forecast under analysis. Let the forecast loss function be $L : R_+ \times R_{++} \to R_+$, where the first argument of $L$ is $\sigma_t^2$ or some proxy for $\sigma_t^2$, denoted $\hat{\sigma}_t^2$, and the second is $h_t$. $R_+$ and $R_{++}$ denote the non-negative and positive parts of the real line respectively. Commonly used volatility proxies are the squared return, $r_t^2$, realised volatility, $RV_t$, and the range, $RG_t$. Optimal forecasts will be denoted $h_t^*$.
2 Forecast evaluation and comparison using squared returns
In this section we will focus on the use of daily squared returns for volatility forecast evaluation. In Section 3 we will examine the use of “realised volatility” and the range.
2.1 Volatility forecast evaluation using a Mincer-Zarnowitz regression
A common method of evaluating a forecast is via a Mincer-Zarnowitz (1969), or MZ, regression¹, which involves regressing the realisation of the variable of interest on the forecast. As the conditional variance is never observed, the usual MZ regression is infeasible for volatility forecast evaluation. Fortunately, if we have a conditionally unbiased estimator of the conditional variance then the feasible MZ regression

$$\hat{\sigma}_t^2 = \beta_0 + \beta_1 h_t + e_t$$

yields unbiased estimates of $\beta_0$ and $\beta_1$. The OLS parameter estimates will be less accurately estimated the larger the variance of $(\sigma_t^2 - \hat{\sigma}_t^2)$, leading some authors, such as Andersen and Bollerslev (1998), to suggest the use of high frequency data to construct more accurate volatility proxies. The less accurate estimates of $\beta_0$ and $\beta_1$ affect the power of the test to detect deviations from forecast optimality but do not affect the validity of the test.

¹ As Mincer and Zarnowitz (1969) note, this method for evaluating forecast accuracy dates back to Theil (1958); however, we will follow past work and call this a Mincer-Zarnowitz regression.

The use of squared returns in MZ regressions has caused some researchers concern, as the estimation problem relies on fourth powers of the returns, and thus returns that are large in absolute value have a very large impact on the parameter estimates. Some authors, see Jorion (1995) and Bollerslev and Wright (2001) for example, have proposed taking some transformation of the volatility proxy to reduce the impact of large returns. Two such examples are:

$$|r_t| = \beta_0 + \beta_1 \sqrt{h_t} + e_t, \text{ and} \tag{2}$$

$$\log r_t^2 = \beta_0 + \beta_1 \log(h_t) + e_t \tag{3}$$

Take the regression in equation (2) as an example. Under the null that $h_t = \sigma_t^2 \ \forall t$ the population values of the OLS parameter estimates are easily shown to be:

$$\begin{aligned}
\beta_1 &= E_{t-1}[|\varepsilon_t|] && \text{if } E_{t-1}[|\varepsilon_t|] \text{ is constant} \\
&= \sqrt{\frac{\nu - 2}{\pi}} \, \Gamma\left(\frac{\nu - 1}{2}\right) \Big/ \Gamma\left(\frac{\nu}{2}\right) && \text{if } r_t | \mathcal{F}_{t-1} \sim \text{Student's } t(0, \sigma_t^2, \nu), \ \nu > 2 \\
&= \sqrt{2/\pi} && \text{if } r_t | \mathcal{F}_{t-1} \sim N(0, \sigma_t^2) \\
\beta_0 &= 0
\end{aligned}$$

In the regression in equation (3) it is simple to show that the population values of the OLS parameter estimates are:

$$\begin{aligned}
\beta_1 &= 1 \\
\beta_0 &= E_{t-1}[\log \varepsilon_t^2] \\
&\approx -\frac{1}{2}\left(Kurt_{t-1}[r_t] - 1\right) \\
&= -1 && \text{if } r_t | \mathcal{F}_{t-1} \sim N(0, \sigma_t^2)
\end{aligned}$$

using a second-order Taylor series approximation² for $\beta_0$.

Thus while both of these approaches may initially appear reasonable, without some modification they lead to the undesirable outcome that the perfect volatility forecast, $h_t = \sigma_t^2 \ \forall t$, will be rejected with probability approaching one as the sample size increases³. In both of the above cases the perverse outcomes are the result of the fact that unbiasedness is not invariant to nonlinear transformations. It can be shown that the distortion increases as the conditional kurtosis of daily returns increases, see Figure 1 for example⁴.
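As a rough numerical check of the expressions above, the following minimal Python sketch evaluates the population coefficients of regressions (2) and (3). The function names are illustrative only. The exact value $E[\log \varepsilon_t^2] = -(\gamma + \log 2)$ under normality, with $\gamma$ the Euler-Mascheroni constant, is a standard result for the mean of a log $\chi^2_1$ variable that is assumed here for comparison with the Taylor approximation; it is not taken from the text above.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def mz_abs_slope_student_t(nu):
    """Population slope in regression (2): E|eps| for a unit-variance Student's t, nu > 2."""
    log_ratio = math.lgamma((nu - 1) / 2) - math.lgamma(nu / 2)
    return math.sqrt((nu - 2) / math.pi) * math.exp(log_ratio)


def mz_abs_slope_normal():
    """Population slope in regression (2) under normality: sqrt(2/pi)."""
    return math.sqrt(2 / math.pi)


def mz_log_intercept_taylor(kurtosis):
    """Second-order Taylor approximation to E[log eps^2], the intercept in regression (3)."""
    return -0.5 * (kurtosis - 1)


def mz_log_intercept_normal_exact():
    """Exact E[log eps^2] for eps ~ N(0, 1) (assumed standard result, see lead-in)."""
    return -(EULER_GAMMA + math.log(2))


if __name__ == "__main__":
    for nu in (5, 10, 30):
        print(f"nu = {nu:>2}: beta_1 = {mz_abs_slope_student_t(nu):.4f}")
    print(f"normal : beta_1 = {mz_abs_slope_normal():.4f}")
    print(f"normal : beta_0 (Taylor) = {mz_log_intercept_taylor(3.0):.4f}")
    print(f"normal : beta_0 (exact)  = {mz_log_intercept_normal_exact():.4f}")
```

In both regressions the population coefficients differ from the pair $(\beta_0, \beta_1) = (0, 1)$ that an unadjusted test of forecast optimality would impose, and the slope in regression (2) moves further from one as $\nu$ falls.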
2.2 Volatility forecast comparison using a Diebold-Mariano test
The most widely used test for forecast comparison is the Diebold and Mariano (1995) test. If we define $u_{i,t} \equiv L(\sigma_t^2, h_{i,t})$, where $L$ is the user’s loss function, and let $d_t = u_{1,t} - u_{2,t}$, then the DM test of equal predictive accuracy can be conducted as a simple t-test that

$$H_0 : E[d_t] = 0, \quad \text{vs.} \quad H_a : E[d_t] \neq 0 \tag{4}$$
² In the appendix we provide an analytical, though less easily interpreted, expression for $E[\log \varepsilon_t^2]$ under normality and the Student’s t distribution.

³ Bollerslev and Wright (2001) and Andersen, et al. (2003) adjust the regression in such a way that this problem does not arise.

⁴ In both panels we use the standardised Student’s t distribution to represent the excess kurtosis case. The first figure is based on the expression in equation (16). The second figure is based on an expression provided in Appendix 1.
Like the MZ regression, the DM test may be used in conjunction with a volatility proxy for certain loss functions. For other loss functions, using a volatility proxy in a DM test can lead to undesirable outcomes. To determine the applicability of a specific loss function in a DM test for volatility forecast comparison consider the condition of Hansen and Lunde (2004):

• The ranking of any two (possibly imperfect) volatility forecasts by expected loss using the chosen loss function is the same whether the ranking is done using the true conditional variance or some conditionally unbiased volatility proxy.

Notice that this condition implies that the true conditional variance is the optimal forecast for the chosen loss function. Hansen and Lunde (2004) show that a sufficient condition for their condition to hold is that $\partial^2 L(\hat{\sigma}_t^2, h_t) / \partial (\hat{\sigma}_t^2)^2$ does not depend on $h_t$.

Under squared-error loss one can easily show that the optimal forecast is the conditional variance: $h_t^* = E_{t-1}[r_t^2] = \sigma_t^2$. Thus a Diebold-Mariano comparison of the true conditional variance with any other volatility forecast, using $r_t^2$ as a volatility proxy and MSE as the loss function, will lead to the selection of the true conditional variance, subject to sampling variability. Further, it is clear that the MSE loss function also satisfies the sufficient condition of Hansen and Lunde (2004).

One common response to the concern that a few extreme observations drive the results of volatility forecast comparison studies is to employ alternative measures of forecast accuracy, see Pagan and Schwert (1990), Bollerslev and Ghysels (1994), Diebold and Lopez (1996) and Poon and Granger (2003), for example. The justification offered for such an approach is usually that the squared-error loss function places excess weight on the largest observations. A collection of loss functions commonly employed in volatility forecast evaluation and comparison is presented below.

$$\text{MSE}: \quad L(\hat{\sigma}_t^2, h_t) = \left(\hat{\sigma}_t^2 - h_t\right)^2 \tag{5}$$

$$\text{QLIKE}: \quad L(\hat{\sigma}_t^2, h_t) = \log h_t + \frac{\hat{\sigma}_t^2}{h_t} \tag{6}$$

$$\text{MSE-LOG}: \quad L(\hat{\sigma}_t^2, h_t) = \left(\log \hat{\sigma}_t^2 - \log h_t\right)^2 \tag{7}$$

$$\text{MSE-SD}: \quad L(\hat{\sigma}_t^2, h_t) = \left(\hat{\sigma}_t - \sqrt{h_t}\right)^2 \tag{8}$$

$$\text{MSE-prop}: \quad L(\hat{\sigma}_t^2, h_t) = \left(\frac{\hat{\sigma}_t^2}{h_t} - 1\right)^2 \tag{9}$$

$$\text{MAE}: \quad L(\hat{\sigma}_t^2, h_t) = \left|\hat{\sigma}_t^2 - h_t\right| \tag{10}$$

$$\text{MAE-SD}: \quad L(\hat{\sigma}_t^2, h_t) = \left|\hat{\sigma}_t - \sqrt{h_t}\right| \tag{11}$$

$$\text{MAE-prop}: \quad L(\hat{\sigma}_t^2, h_t) = \left|\frac{\hat{\sigma}_t^2}{h_t} - 1\right| \tag{12}$$
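For readers who wish to experiment with these comparisons, the eight loss functions in equations (5) to (12) can be written down directly. The following Python sketch is only an illustration: the function names, the `LOSS_FUNCTIONS` dictionary and the `average_loss` helper are our own labels, and both arguments are assumed to be strictly positive (a squared return of exactly zero would make MSE-LOG undefined).

```python
import math

# Each function takes a volatility proxy s2 (an estimate of sigma_t^2) and a
# variance forecast h; both are assumed to be strictly positive.


def mse(s2, h):        # equation (5)
    return (s2 - h) ** 2


def qlike(s2, h):      # equation (6)
    return math.log(h) + s2 / h


def mse_log(s2, h):    # equation (7)
    return (math.log(s2) - math.log(h)) ** 2


def mse_sd(s2, h):     # equation (8)
    return (math.sqrt(s2) - math.sqrt(h)) ** 2


def mse_prop(s2, h):   # equation (9)
    return (s2 / h - 1) ** 2


def mae(s2, h):        # equation (10)
    return abs(s2 - h)


def mae_sd(s2, h):     # equation (11)
    return abs(math.sqrt(s2) - math.sqrt(h))


def mae_prop(s2, h):   # equation (12)
    return abs(s2 / h - 1)


LOSS_FUNCTIONS = {
    "MSE": mse, "QLIKE": qlike, "MSE-LOG": mse_log, "MSE-SD": mse_sd,
    "MSE-prop": mse_prop, "MAE": mae, "MAE-SD": mae_sd, "MAE-prop": mae_prop,
}


def average_loss(loss, proxies, forecasts):
    """Sample average loss of a forecast series against a proxy series."""
    return sum(loss(s, h) for s, h in zip(proxies, forecasts)) / len(proxies)
```

Note that QLIKE as written in equation (6) is not normalised to zero at a perfect forecast, whereas the other seven functions are.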
Consider the MAE loss function from above. As usual with an absolute-error loss function we obtain the median as the optimal forecast:

$$h_t^* = \text{Median}_{t-1}[r_t^2] = \sigma_t^2 \cdot \text{Median}_{t-1}[\varepsilon_t^2]$$

$$h_t^* = \sigma_t^2 \cdot \frac{\nu - 2}{\nu} \cdot \text{Median}[F_{1,\nu}] \quad \text{if } r_t | \mathcal{F}_{t-1} \sim \text{Student's } t(0, \sigma_t^2, \nu) \tag{13}$$

$$h_t^* = \sigma_t^2 \cdot \text{Median}[\chi_1^2] \approx 0.4549 \sigma_t^2 \quad \text{if } r_t | \mathcal{F}_{t-1} \sim N(0, \sigma_t^2) \tag{14}$$

where $\text{Median}_{t-1}[r_t^2]$ is the conditional median of $r_t^2$ given $\mathcal{F}_{t-1}$, and Student’s $t(0, \sigma_t^2, \nu)$ is a Student’s t distribution with mean zero, variance $\sigma_t^2$ and $\nu$ degrees of freedom. Thus if we compare a forecast which is exactly equal to $\sigma_t^2$ for all t to one that is equal to $0.4549 \sigma_t^2$ for all t, using the squared daily return as a proxy for the conditional variance, we will usually conclude that the perfect forecast is inferior to the one which is wrong by more than a factor of 2. Figure 2 shows that if returns have a Student’s t distribution then the degree of distortion is even larger.

Another commonly used loss function is the MSE loss function on standard deviations rather than variances, from equation (8). The motivation for this loss function is that taking the square root of the two arguments of the squared-error loss function shrinks the larger values towards zero, reducing the impact of the most extreme values of $r_t$. However it also leads to an incorrect volatility
forecast being selected as optimal:

$$h_t^* \equiv \arg\min_h E_{t-1}\left[\left(|r_t| - \sqrt{h}\right)^2\right]$$

$$\text{FOC:} \quad 0 = \frac{\partial}{\partial h} E_{t-1}\left[\left(|r_t| - \sqrt{h}\right)^2\right] \Bigg|_{h = h_t^*}$$

$$\text{so } h_t^* = \left(E_{t-1}[|r_t|]\right)^2 = \sigma_t^2 \left(E_{t-1}[|\varepsilon_t|]\right)^2 \tag{15}$$

$$h_t^* = \sigma_t^2 \, \frac{\nu - 2}{\pi} \left(\Gamma\left(\frac{\nu - 1}{2}\right) \Big/ \Gamma\left(\frac{\nu}{2}\right)\right)^2 \quad \text{if } r_t | \mathcal{F}_{t-1} \sim \text{Student's } t(0, \sigma_t^2, \nu), \ \nu > 2 \tag{16}$$

$$h_t^* = \frac{2}{\pi} \sigma_t^2 \approx 0.6366 \sigma_t^2 \quad \text{if } r_t | \mathcal{F}_{t-1} \sim N(0, \sigma_t^2) \tag{17}$$

For this loss function it is also true that excess kurtosis in asset returns exacerbates the distortion, which we can see in Figure 3 for returns that have the Student’s t distribution.

In the appendix we provide the corresponding calculations for the remaining loss functions in the list above, and summarise the results in Table 1. Table 1 shows that the degree of distortion in the optimal forecast according to some of the loss functions used in the literature can be substantial. Under normality the optimal forecast under these loss functions ranges from about one-third of the true conditional variance to three times the true conditional variance. If returns exhibit conditional kurtosis then the range of optimal forecasts from these loss functions is even wider. Using certain of these loss functions in Diebold-Mariano comparisons along with daily squared returns as a proxy for the true conditional volatility may lead to the perverse outcome that a competing variance forecast is selected rather than the true conditional variance. To illustrate and emphasize the importance of this point, consider the following example.

Example 1: Assume that $r_t | \mathcal{F}_{t-1} \sim N(0, \sigma_t^2)$, and that $\sigma_t^2$ follows a simple GARCH(1,1) process: $\sigma_t^2 = \omega + \beta \sigma_{t-1}^2 + \alpha r_{t-1}^2$, subject to $\omega > 0$ and $1 - \beta^2 - 2\alpha\beta - 3\alpha^2 > 0$ (which is required for $E[\sigma_t^4]$ to exist). Let $\hat{\sigma}_t^2 = r_t^2$, let $L$ be the MSE-SD loss function, and let $h_{1,t} = \sigma_t^2$ and $h_{2,t} = (2/\pi) \sigma_t^2$. Let n denote the number of observations available for conducting the test. Then the Diebold-Mariano test statistic evaluated at population moments is:

$$DM_0 = \frac{\sqrt{n} \left(1 - \sqrt{2/\pi}\right)^2}{\sqrt{\left(5 - 12\sqrt{\dfrac{2}{\pi}} + \dfrac{12}{\pi} + \dfrac{8}{\pi}\sqrt{\dfrac{2}{\pi}} - \dfrac{12}{\pi^2}\right) \dfrac{1 - (\alpha + \beta)^2}{1 - (\alpha + \beta)^2 - 2\alpha^2} - \left(1 - \sqrt{2/\pi}\right)^4}}$$

$$\approx 0.1632 \sqrt{n}, \quad \text{when } \beta = 0.9 \text{ and } \alpha = 0.05$$
The proof is in Appendix 1. For the specific case that $[\alpha, \beta] = [0.05, 0.9]$, which is reasonable for daily asset returns, the DM statistic is greater than 1.96 for sample sizes of 145 or more. Thus with less than a year’s worth of daily data, we would expect to reject the true conditional variance in favour of a volatility forecast equal to around 0.64 times the true conditional variance. This example shows that choosing an inappropriate loss function for volatility forecast comparison can have important empirical implications in realistic situations.

The sources of the mis-matches between the optimal forecast for a given loss function and the true conditional variance are easily identified: the last three loss functions move from considering mean squared losses to considering mean absolute losses, which changes the solution of the optimisation problem from an expectation to a median. In the third and fourth cases (MSE-LOG and MSE-SD) the distortion follows again from the fact that the unbiasedness property is not invariant to nonlinear transformations. This is a relatively easy problem to remedy in practice; one needs to find a conditionally unbiased estimator of the quantity of interest ($\sigma_t$, $\log \sigma_t^2$, etc.) either exactly or approximately, see Bollerslev and Wright (2001) and Andersen, et al. (2003) for example.
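Returning to Example 1, the population DM statistic can be evaluated numerically. The Python sketch below is a plain transcription of the expression for $DM_0$ under the assumptions of the example (normal innovations, GARCH(1,1) variance, MSE-SD loss); the function names are hypothetical and the code is meant as a check on the arithmetic rather than as a general-purpose implementation of the test.

```python
import math


def dm_population_ratio(alpha, beta):
    """DM_0 / sqrt(n) in Example 1: MSE-SD loss, h1 = sigma^2, h2 = (2/pi) sigma^2."""
    c = math.sqrt(2 / math.pi)              # E|eps| under normality
    persistence = (alpha + beta) ** 2
    if 1 - persistence - 2 * alpha ** 2 <= 0:
        raise ValueError("E[sigma^4] does not exist for these parameters")
    # E[sigma^4] / (E[sigma^2])^2 for the GARCH(1,1) process of the example
    moment_ratio = (1 - persistence) / (1 - persistence - 2 * alpha ** 2)
    poly = 5 - 12 * c + 12 / math.pi + (8 / math.pi) * c - 12 / math.pi ** 2
    return (1 - c) ** 2 / math.sqrt(poly * moment_ratio - (1 - c) ** 4)


def min_sample_size(alpha, beta, critical_value=1.96):
    """Smallest n for which the population DM statistic exceeds the critical value."""
    ratio = dm_population_ratio(alpha, beta)
    return math.floor((critical_value / ratio) ** 2) + 1


if __name__ == "__main__":
    print(f"DM_0 / sqrt(n) = {dm_population_ratio(0.05, 0.9):.4f}")
    print(f"smallest n with DM_0 > 1.96: {min_sample_size(0.05, 0.9)}")
```

Varying `alpha` and `beta` in this sketch shows how the required sample size changes with the kurtosis of the conditional variance process, since the parameters enter only through the ratio $E[\sigma_t^4] / (E[\sigma_t^2])^2$.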
3 Using better volatility proxies
It has long been known that squared returns are a quite noisy proxy for the true conditional variance. One alternative volatility proxy that has gained much attention recently is “realised volatility”, used by Poterba and Summers (1986) and French, et al. (1987), and studied intensively more recently by Andersen, et al. (1998, 2001, 2003) and Barndorff-Nielsen and Shephard (2002a,b). Another commonly used alternative to squared returns is the intra-daily range. It is well-known that if the log stock price follows a Brownian motion then both of these estimators are unbiased and more efficient than the squared return. In this section we obtain the rate at which the distortion in the ranking of alternative forecasts disappears when using realised volatility as the proxy, as the sampling frequency increases, for a simple DGP. These results can be viewed as complements to those of Hansen and Lunde (2004), who showed that under certain conditions the degree of distortion in ranking alternative forecasts is increasing in the variability of the proxy error.

Assume that there are m equally-spaced observations per trade day, and let $r_{i,m,t}$ denote the ith intra-daily return on day t. In order to obtain analytical results for problems involving the range as a volatility proxy we consider only a simple DGP: zero mean return, no jumps, and constant conditional volatility within a trade day. Analytical results on the distribution of the range are
not currently available for more realistic DGPs, and obtaining such results would be an interesting extension but is left for future work. Of course we could obtain approximate results via simulation for more realistic DGPs but we do not attempt this here. Let

$$r_t = d \ln P_t = \sigma_t \, dW_t \tag{18}$$

$$\sigma_\tau = \sigma_t \quad \forall \tau \in (t - 1, t] \tag{19}$$

$$r_{i,m,t} \equiv \int_{(i-1)/m}^{i/m} r_\tau \, d\tau = \sigma_t \int_{(i-1)/m}^{i/m} dW_\tau \tag{20}$$

$$\text{so } \{r_{i,m,t}\}_{i=1}^{m} \sim iid \ N\left(0, \frac{\sigma_t^2}{m}\right) \tag{21}$$
The “realised volatility” or “realised variance” is defined as:

$$RV_t \equiv \sum_{i=1}^{m} r_{i,m,t}^2$$
Realised variance, like the daily squared return (which is obtained in the above framework by setting m = 1), is a conditionally unbiased estimator of the daily conditional variance. Its main advantage is that it is a more efficient estimator of the daily conditional variance than the daily squared return: for this DGP it can be shown that $MSE_{t-1}[r_t^2] = 2\sigma_t^4$ while $MSE_{t-1}[RV_t] = 2\sigma_t^4 / m$. Intra-daily returns are known to exhibit time-varying volatility, serial correlation, diurnality and non-normality, see Bai, et al. (2001) for example. The presence of these features mitigates some of the benefits of using high frequency data for volatility forecast evaluation, and so the improvements from using $RV_t$ presented below represent an upper bound on the actual improvements one may obtain when using high frequency data.

A volatility proxy that pre-dates realised volatility by many years is the range, or the high/low, estimator, see Parkinson (1980), Garman and Klass (1980) and Ball and Torous (1984). Alizadeh, et al. (2002) use the fact that the range is widely available for long series and is more efficient than squared returns to improve the estimation of stochastic volatility models. The intra-daily log range is defined as:

$$RG_t \equiv \max_\tau \log P_\tau - \min_\tau \log P_\tau, \quad t - 1 < \tau \leq t \tag{22}$$
Under the dynamics in equation (18) Feller (1951) presented the density of $RG_t$, and Parkinson (1980) presented a formula for obtaining moments of the range, which enable us to compute:

$$E_{t-1}[RG_t^2] = \log(16) \cdot \sigma_t^2 \approx 2.7726 \sigma_t^2 \tag{23}$$
Details on the distributional properties of the range under this set-up are presented in Appendix 2. The above expression shows that the squared range is not a conditionally unbiased estimator of $\sigma_t^2$. Most authors who employ the range as a volatility proxy, Parkinson (1980) and Alizadeh, et al. (2002) for example, are aware of this and scale the range accordingly. We will thus focus below on the adjusted range:

$$RG_t^* \equiv \frac{RG_t}{\sqrt{\log(16)}} \approx 0.6006 \, RG_t \tag{24}$$
which, when squared, is an unbiased proxy for the conditional variance. It is simple to determine that $MSE_{t-1}[RG_t^{*2}] \approx 0.4073 \sigma_t^4$, which is approximately one-fifth of the MSE of the daily squared return, and so using the range yields an estimator as accurate as a realised volatility estimator constructed using 5 intra-daily observations. This roughly corresponds to the comment of Andersen and Bollerslev (1998, footnote 20) that the adjusted range yields an MSE comparable to the MSE of realised volatilities constructed using 2 to 3 hour returns.

We now determine the optimal forecasts obtained using the various loss functions considered above, when $\hat{\sigma}_t^2 = RV_t$ or $\hat{\sigma}_t^2 = RG_t^{*2}$ is used as a proxy for the conditional variance rather than $r_t^2$.
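Before turning to those optimal forecasts, the efficiency claims for realised variance under the simple DGP in equations (18) to (21) can be checked by simulation. The following Python sketch is a minimal Monte Carlo under that DGP only; the function names, the seed and the number of simulated days are arbitrary choices of ours, and the range is not simulated because a discretised path would understate it.

```python
import random


def simulate_realised_variance(sigma2, m, rng):
    """One day's realised variance from m iid N(0, sigma2/m) intra-daily returns, as in (21)."""
    sd = (sigma2 / m) ** 0.5
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(m))


def proxy_bias_and_mse(sigma2, m, n_days, seed=0):
    """Monte Carlo estimates of the bias and MSE of RV_t as a proxy for sigma2."""
    rng = random.Random(seed)
    draws = [simulate_realised_variance(sigma2, m, rng) for _ in range(n_days)]
    bias = sum(draws) / n_days - sigma2
    mse = sum((rv - sigma2) ** 2 for rv in draws) / n_days
    return bias, mse


if __name__ == "__main__":
    for m in (1, 5, 12, 78):
        bias, mse = proxy_bias_and_mse(sigma2=1.0, m=m, n_days=20000)
        print(f"m = {m:>2}: bias = {bias:+.4f}, MSE = {mse:.4f}, 2/m = {2 / m:.4f}")
```

With $\sigma_t^2 = 1$ the simulated MSE should lie close to $2/m$, so the case m = 5 gives a value near the 0.4073 reported for the adjusted range.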
We initially leave m unspecified for the realised volatility proxy, and then specialise to three cases: m = 1, 12 and 78, corresponding to the use of daily, half-hourly and 5-minute returns, on a stock listed on the NYSE.

For MSE and QLIKE the optimal forecast is simply the conditional mean of $\hat{\sigma}_t^2$, which equals the conditional variance, as $RV_t$ and $RG_t^{*2}$ are both conditionally unbiased. The MSE-SD loss function yields $(E_{t-1}[\hat{\sigma}_t])^2$ as an optimal forecast. Under the set-up introduced above,

$$RV_t \equiv \sum_{i=1}^{m} r_{i,m,t}^2 = \frac{\sigma_t^2}{m} \sum_{i=1}^{m} \varepsilon_{t,i}^2, \quad \text{so } m \sigma_t^{-2} RV_t \sim \chi_m^2$$

$$\text{so } h_t^* = \frac{\sigma_t^2}{m} \left(E\left[\sqrt{\chi_m^2}\right]\right)^2$$

$$E\left[\sqrt{\chi_m^2}\right] \approx \sqrt{m} - \frac{1}{4\sqrt{m}} \quad \text{by a Taylor series approximation}$$

$$\text{so } h_t^* \approx \sigma_t^2 \left(1 - \frac{1}{2m} + \frac{1}{16m^2}\right) \approx \begin{cases} 0.5625 \cdot \sigma_t^2 & \text{for } m = 1 \\ 0.9588 \cdot \sigma_t^2 & \text{for } m = 12 \\ 0.9936 \cdot \sigma_t^2 & \text{for } m = 78 \end{cases}$$
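The quality of the Taylor approximation used above can be gauged against the exact mean of a chi variable. The sketch below assumes the standard result $E[\sqrt{\chi_m^2}] = \sqrt{2} \, \Gamma((m+1)/2) / \Gamma(m/2)$, which is not stated in the text, and uses illustrative function names; both functions return the optimal MSE-SD forecast as a multiple of $\sigma_t^2$.

```python
import math


def mse_sd_optimum_exact(m):
    """(E[sqrt(chi2_m)])^2 / m, using the exact mean of a chi variable (assumed result)."""
    mean_root = math.sqrt(2) * math.exp(math.lgamma((m + 1) / 2) - math.lgamma(m / 2))
    return mean_root ** 2 / m


def mse_sd_optimum_taylor(m):
    """Taylor approximation from the text: 1 - 1/(2m) + 1/(16 m^2)."""
    return 1 - 1 / (2 * m) + 1 / (16 * m ** 2)


if __name__ == "__main__":
    for m in (1, 6, 12, 78):
        print(f"m = {m:>2}: exact = {mse_sd_optimum_exact(m):.4f}, "
              f"Taylor = {mse_sd_optimum_taylor(m):.4f}")
```

For m = 1 the exact value is $2/\pi \approx 0.6366$, the result obtained in Section 2, while the approximation gives 0.5625; the two agree closely once m is moderately large.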
The results for the MSE-SD loss function using realised volatility again show that reducing the variance of the volatility proxy improves the optimal forecast, and asymptotically the perfect forecast is obtained⁵. Using the range we find that

$$h_t^* = \frac{2}{\pi \log 2} \sigma_t^2 \approx 0.9184 \sigma_t^2$$

and so we find that the distortion from using the range is approximately equal to that incurred when using a realised volatility constructed using 6 intra-daily observations.

Consider now the MAE loss function, which yields $\text{Median}_{t-1}[\hat{\sigma}_t^2]$ as the optimal forecast. Thus for realised volatility we have

$$h_t^* = \frac{1}{m} \text{Median}[\chi_m^2] \, \sigma_t^2$$

For large m, $\text{Median}[\chi_m^2] \approx m - 2/3$, though most software packages have functions for the inverse cdf of a $\chi_m^2$ distribution. For small m the approximation $\text{Median}[\chi_m^2] \approx m - 2/3 + 1/(9m)$ is more accurate. Thus, using $\text{Median}[\chi_m^2] \approx m - 2/3 + 1/(9m)$,

$$h_t^* \approx \left(1 - \frac{2}{3m} + \frac{1}{9m^2}\right) \sigma_t^2 \approx \begin{cases} 0.4444 \cdot \sigma_t^2 & \text{for } m = 1 \\ 0.9452 \cdot \sigma_t^2 & \text{for } m = 12 \\ 0.9915 \cdot \sigma_t^2 & \text{for } m = 78 \end{cases}$$
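The median approximation can likewise be compared with the exact median of a $\chi_m^2$ variable. The following self-contained Python sketch computes the $\chi_m^2$ cdf from the power series for the regularised lower incomplete gamma function and inverts it by bisection; this is one simple way to obtain the inverse cdf without external packages, and the function names are our own.

```python
import math


def chi2_cdf(x, m):
    """CDF of a chi-squared(m) variable via the series for the lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = m / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_median(m):
    """Median of chi-squared(m) by bisection; the median lies below the mean m."""
    lo, hi = 0.0, float(m)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, m) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_median_approx(m):
    """Approximation used in the text: m - 2/3 + 1/(9m)."""
    return m - 2 / 3 + 1 / (9 * m)


if __name__ == "__main__":
    for m in (1, 4, 12, 78):
        print(f"m = {m:>2}: exact h*/sigma^2 = {chi2_median(m) / m:.4f}, "
              f"approx = {chi2_median_approx(m) / m:.4f}")
```

For m = 1 the exact calculation recovers the value 0.4549 from equation (14), whereas the approximation gives 0.4444; for m = 12 and m = 78 the two differ only in the fourth decimal place.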
For the range we have

$$h_t^* \approx \frac{2.2938}{\log 16} \sigma_t^2 = 0.8273 \sigma_t^2$$

which is equivalent to using about 4 observations to construct the realised volatility proxy.

Calculations for the remaining loss functions are collected in Appendix 2. The results are summarised in Table 2. These results confirm that as the proxy used to measure the true conditional variance gets more efficient the degree of distortion decreases for all loss functions. When using $RV_t$ as a volatility proxy we find that

$$\begin{aligned}
h_t^* &= \sigma_t^2 && \text{for MSE and QLIKE} \\
h_t^* &= \sigma_t^2 + O(m^{-1}) && \text{for MSE-prop, MAE and MAE-SD} \\
h_t^* &\approx \sigma_t^2 + O(m^{-1}) && \text{for MSE-SD and MAE-prop} \\
\log h_t^* &\approx \log \sigma_t^2 + O(m^{-1}) && \text{for MSE-LOG}
\end{aligned}$$

Across loss functions we found that the range was generally approximately as good a volatility proxy as the realised volatility estimator constructed with between 4 and 6 intra-daily observations.

⁵ Note that the result for m = 1 is different to that obtained in Section 2, which was $h_t^* = \frac{2}{\pi} \sigma_t^2 \approx 0.6366 \sigma_t^2$. This is because for m = 1 we can obtain the expression exactly, using results for the normal distribution, whereas for arbitrary m we relied on a second-order Taylor series approximation.
4 A class of appropriate loss functions
In the previous section we showed that amongst eight commonly used loss functions employed for comparing volatility forecast accuracy, only the MSE and the QLIKE loss functions lead to $h_t^* = E_{t-1}[\hat{\sigma}_t^2] = \sigma_t^2$ from the first-order condition. This prompts the question of whether there exist other loss functions that yield the conditional variance as the optimal forecast. The following proposition provides a necessary and sufficient family of such loss functions, which are closely related to the family of linear-exponential densities of Gourieroux, et al. (1984). We make the following assumptions:

A1: $E_{t-1}[\hat{\sigma}_t^2] = \sigma_t^2$

A2: $\hat{\sigma}_t^2 | \mathcal{F}_{t-1} \sim F_t \in \tilde{F}$, the set of all absolutely continuous distribution functions on $R_+ \equiv \{x \in R : x \geq 0\}$.

A3: $L$ is twice continuously differentiable with respect to $h$.

A4: There exists some $h_t^* \in int(H) \subset R_{++}$ such that $h_t^* = E_{t-1}[\hat{\sigma}_t^2]$

A5: $L$ and $F_t$ are such that: (a) $E_{t-1}[L(\hat{\sigma}_t^2, h)] < \infty$ for some $h \in H \subseteq R_{++}$; (b) $E_{t-1}\left[\partial L(\hat{\sigma}_t^2, \sigma_t^2) / \partial h\right] < \infty$; and (c) $E_{t-1}\left[\partial^2 L(\hat{\sigma}_t^2, \sigma_t^2) / \partial h^2\right] < \infty$.

Proposition 1 Let assumptions A1 to A5 hold. Then the forecast that minimises expected loss is $\sigma_t^2$, and the ranking of competing forecasts by expected loss is preserved when $\hat{\sigma}_t^2$ is used rather than $\sigma_t^2$, if and only if the loss function $L$ is of the form:

$$L(\hat{\sigma}^2, h) = \tilde{C}(h) + B(\hat{\sigma}^2) + C(h)\left(\hat{\sigma}^2 - h\right) \tag{25}$$

where $B$ and $C$ are twice continuously differentiable, $C$ is a strictly decreasing function on $H$, and $\tilde{C}$ is the anti-derivative of $C$.

If we instead had a conditionally median-unbiased volatility proxy, then it is conjectured that the class of “appropriate” loss functions would change from that above, which resembles that of Gourieroux, et al. (1984), to the class of functions proposed by Komunjer and Vuong (2004) in the context of efficient conditional quantile estimation. Most work on volatility proxies has focussed on conditionally unbiased proxies, rather than median- (or quantile-) unbiased proxies, and so we focus solely on the use of conditionally unbiased proxies.
4.1 A parametric class of “appropriate” loss functions
Loss functions MSE and QLIKE are easily shown to be of the form given in Proposition 1, and thus are appropriate loss functions when using a conditionally unbiased volatility proxy. We now seek to find a parametric class of loss functions, contained in the family proposed above, that includes MSE and QLIKE as special cases. We do this by noting that the first-order conditions from MSE and QLIKE loss functions are both of the form:

$$\frac{\partial L(\hat{\sigma}^2, h)}{\partial h} = 0 = a h^b \left(\hat{\sigma}^2 - h\right), \quad a \neq 0, \ b \in R \tag{26}$$

i.e., $A'(h) = -a h^{b+1}$ and $C'(h) = a h^b$, where $a < 0$ and $A(h) \equiv \tilde{C}(h) - h C(h)$ collects the terms of equation (25) that depend on $h$ alone. Integrating the above expression with respect to $h$ yields:

$$L(\hat{\sigma}^2, h) = a \hat{\sigma}^2 \int h^b \, dh - a \int h^{b+1} \, dh$$

$$= \begin{cases}
a \left( c \hat{\sigma}^2 - d + \dfrac{1}{b+1} \hat{\sigma}^2 h^{b+1} - \dfrac{1}{b+2} h^{b+2} \right), & a < 0, \ b \notin \{-1, -2\}, \ (c, d) \in R^2 \\[2ex]
a \left( c \hat{\sigma}^2 - d + \hat{\sigma}^2 \log h - h \right), & a < 0, \ b = -1, \ (c, d) \in R^2 \\[2ex]
a \left( c \hat{\sigma}^2 - d - \dfrac{\hat{\sigma}^2}{h} - \log h \right), & a < 0, \ b = -2, \ (c, d) \in R^2
\end{cases}$$

Above is a general class of functions that yield the desired FOC. But a loss function is usually constrained to exhibit certain properties, such as having zero loss for a perfect forecast. We can use these properties to restrict the domains of the free parameters $(b, c, d)$. We will impose three properties on the loss function, following the suggestions of Granger (1969, 1999) and Diebold (2001), amongst others: (1) $L(\hat{\sigma}^2, \hat{\sigma}^2) = 0$; (2) $\partial L(\hat{\sigma}^2, h) / \partial h \geq (\leq) \ 0$ for all $\hat{\sigma}^2 \leq (\geq) \ h$; (3) $L(\hat{\sigma}^2, h)$ is real for all $\hat{\sigma}^2, h$. The first of these properties is simply a normalisation. The second property ensures that the function is weakly increasing as the forecast error moves away from zero. It also implies that the minimum of the function occurs when the forecast is perfect (though it does not impose that the function has a unique minimum). The third property ensures that the loss function is economically interpretable. We impose these properties and collect the results in the proposition below.
Proposition 2 The following collection of functions
\[
L(\hat\sigma^2, h) = a\left(e\,\hat\sigma^{2b(e)+4} + \frac{1}{b(e)+1}\hat\sigma^2 h^{b(e)+1} - \frac{1}{b(e)+2}h^{b(e)+2}\right), \quad \text{“Class I”} \tag{27}
\]
\[
L(\hat\sigma^2, h) = a\left(\hat\sigma^2 - h - \hat\sigma^2 \log\frac{\hat\sigma^2}{h}\right), \quad \text{“Class II”} \tag{28}
\]
\[
L(\hat\sigma^2, h) = a\left(1 - \frac{\hat\sigma^2}{h} + \log\frac{\hat\sigma^2}{h}\right), \quad \text{“Class III”} \tag{29}
\]
\[
\text{where } b(e) = -\frac{3}{2} + \sqrt{\frac{e-4}{4e}}, \quad a = f(Z_{t-1}), \quad e = g(Z_{t-1})
\]
where $f$ and $g$ are (possibly degenerate) functions of any $Z_{t-1} \in \mathcal{F}_{t-1}$, both with range $\mathbb{R}^{-} \equiv \{x \in \mathbb{R} : x < 0\}$, satisfy the following conditions:

1. $L(\hat\sigma^2, \hat\sigma^2) = 0$ for all $\hat\sigma^2 > 0$
2. $\partial L(\hat\sigma^2, h)/\partial h \geq (\leq)\ 0$ when $\hat\sigma^2 \leq (\geq)\ h$
3. $L(\hat\sigma^2, h)$ is real for all $\hat\sigma^2, h > 0$
4. $\partial E_{t-1}\left[L(\hat\sigma_t^2, h)\right]/\partial h = 0$ for $h = E_{t-1}\left[\hat\sigma_t^2\right]$
5. The ranking of competing forecasts by expected loss is preserved when $\hat\sigma_t^2$ is used rather than $\sigma_t^2$, for any variable $\hat\sigma_t^2$ such that $E_{t-1}\left[\hat\sigma_t^2\right] = \sigma_t^2$.

The proof is in Appendix 3. Note that in the proposition we allow the parameters $a$ and $e$ to be functions of elements, $Z_{t-1}$, of the time $t-1$ information set, $\mathcal{F}_{t-1}$. Doing so does not change the outcome of the first-order condition, but it does change the loss from a forecast error. For example, the forecast user may be thought to experience a higher loss from forecast errors if the previous period’s forecast error, or loss value, was large. Alternatively the losses from forecast errors may be increasing/decreasing over time or with the business cycle.

In Figures 4 and 5 we plot the above class of functions for various parameter values. The parameter $a$ is just a scaling parameter in all cases and so we do not consider varying it. The set of Class I loss functions can generate a wide variety of shapes, ranging from symmetric ($e = -0.5$, corresponding to the MSE loss function) to asymmetric. There are, effectively, no free parameters in the Class II and III loss functions and so we just present the case that $a = -1$. While these two loss functions are quite similar they are not identical except in degenerate examples.

It should be pointed out that for certain DGPs the expected loss, $E_{t-1}\left[L(\hat\sigma_t^2, h)\right]$, and/or the expected first derivative, $E_{t-1}\left[\partial L(\hat\sigma_t^2, h)/\partial h\right]$, may not exist. Before electing to use a particular loss function for volatility forecast evaluation the user should determine whether the expected loss exists, so that tests such as the Diebold-Mariano test are interpretable.
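The three classes in Proposition 2 can be written down directly. The following is a minimal sketch, not code from the paper: the function names and default parameter values are illustrative assumptions, and `proxy` stands for the volatility proxy $\hat\sigma^2$.

```python
import math


def b_of_e(e):
    """Exponent b(e) = -3/2 + sqrt((e - 4) / (4e)) from Proposition 2 (requires e < 0)."""
    if e >= 0:
        raise ValueError("e must be strictly negative")
    return -1.5 + math.sqrt((e - 4.0) / (4.0 * e))


def class1_loss(proxy, h, a=-2.0, e=-0.5):
    """Class I loss, equation (27). The defaults a = -2, e = -0.5 give MSE."""
    b = b_of_e(e)
    # proxy ** (b + 2) is sigma_hat^(2b + 4), since proxy is sigma_hat squared.
    return a * (
        e * proxy ** (b + 2.0)
        + proxy * h ** (b + 1.0) / (b + 1.0)
        - h ** (b + 2.0) / (b + 2.0)
    )


def class2_loss(proxy, h, a=-1.0):
    """Class II loss, equation (28)."""
    return a * (proxy - h - proxy * math.log(proxy / h))


def class3_loss(proxy, h, a=-1.0):
    """Class III loss, equation (29). With a = -1 this is QLIKE up to an additive term."""
    return a * (1.0 - proxy / h + math.log(proxy / h))


# Illustrative evaluation with proxy = 2, mirroring the set-up of Figures 4 and 5.
for h in (0.5, 1.0, 2.0, 4.0):
    print(h, class1_loss(2.0, h), class2_loss(2.0, h), class3_loss(2.0, h))
```

Each function returns zero when the forecast equals the proxy and a positive value otherwise, in line with conditions 1 and 2 of the proposition.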
5
Empirical application to forecasting IBM return volatility
[ PRELIMINARY AND INCOMPLETE! ] In this section we consider the problem of forecasting the conditional variance of the daily return on IBM, using data over the period from January 1980 to December 2003, yielding 6058 observations in total. We consider three different volatility forecasts: those obtained from a 60-day rolling window variance, from the RiskMetrics volatility model, and from the EGARCH(1,1) model due to Nelson (1991). These models are as follows:
\[
\text{Rolling window}: \quad h_{1,t} = \frac{1}{60}\sum_{j=1}^{60} r_{t-j}^2 \tag{30}
\]
\[
\text{“RiskMetrics”}: \quad h_{2,t} = \lambda h_{2,t-1} + (1-\lambda)\, r_{t-1}^2, \quad \lambda = 0.94 \tag{31}
\]
\[
\text{EGARCH(1,1)}: \quad \log h_{3,t} = \hat\omega_{t-1} + \hat\beta_{t-1}\log h_{3,t-1} + \hat\alpha_{t-1}\frac{r_{t-1}}{\sqrt{h_{3,t-1}}} + \hat\gamma_{t-1}\left|\frac{r_{t-1}}{\sqrt{h_{3,t-1}}}\right| \tag{32}
\]
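The recursions in (30) to (32) are simple to compute. The sketch below is illustrative only: the function names are assumptions, the RiskMetrics starting value `h0` is an assumed input (the text does not say how the recursion is initialised), and the EGARCH parameters are passed in by the caller rather than estimated.

```python
import math


def rolling_window_forecast(returns, window=60):
    """Equation (30): average of the previous `window` squared returns.

    Element i of the result is the forecast for period t = window + i.
    """
    squared = [r * r for r in returns]
    return [sum(squared[t - window:t]) / window
            for t in range(window, len(returns) + 1)]


def riskmetrics_forecast(returns, h0, lam=0.94):
    """Equation (31): h_t = lam * h_{t-1} + (1 - lam) * r_{t-1}^2.

    h0 is an assumed starting value. Element t of the result uses returns up to t - 1.
    """
    forecasts = [h0]
    for r in returns:
        forecasts.append(lam * forecasts[-1] + (1.0 - lam) * r * r)
    return forecasts


def egarch_forecast(h_prev, r_prev, omega, beta, alpha, gamma):
    """Equation (32): one-step EGARCH(1,1) update for given parameter values."""
    z = r_prev / math.sqrt(h_prev)  # standardised lagged return
    return math.exp(omega + beta * math.log(h_prev) + alpha * z + gamma * abs(z))
```

With an expanding estimation window, as described in the text, `egarch_forecast` would be called each day with parameters re-estimated on data up to that day.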
We use the first 18 years of observations (approximately equal to three-quarters of our sample) for estimation, and the remaining 6 years of observations for forecast evaluation. The evaluation period thus runs from January 1998 to December 2003. We estimated the EGARCH(1,1) parameters with an expanding window of data, obtaining volatility forecasts using parameters estimated only on data up until the day the forecast is obtained. A plot of the three volatility forecasts is provided in Figure 6.

We employ two volatility proxies in the comparison of these forecasts: the daily squared return, and the adjusted intra-day range⁶. The daily squared return is a conditionally unbiased proxy under the assumption of a zero conditional mean. We de-mean the returns prior to using them in the tests to follow. The adjusted squared range is a potentially less noisy volatility proxy than squared returns; however, it relies on stronger assumptions about the DGP. The adjustment factor for the squared range, under the assumption that intra-daily prices follow a zero mean, constant volatility diffusion with no jumps, is $1/\log(16) \approx 0.36$, as discussed in Section 3. The ratio of the sample average (de-meaned) squared return to the sample average squared range was 0.54, which is significantly different from 0.36 at the 0.05 level, indicating that the assumptions underlying the use of the range as a volatility proxy may be violated. We employ the adjusted squared range (adjusted by 0.54) as a volatility proxy for comparison nevertheless.

⁶ We are in the process of preparing a high frequency dataset for this stock so that we may also employ realised volatility as a proxy.

In comparing these forecasts we present the results of individual DM tests under five loss functions; namely MSE loss, QLIKE loss, the “Class II” loss function from Section 4, and the “Class I” loss function with parameters $e = -0.25$ and $e = -5$.

In Table 3 we present tests comparing the RiskMetrics forecasts with the 60-day rolling window volatility forecasts. The DM tests indicate that the RiskMetrics forecasts lead to lower average loss than the rolling window forecasts for 3 out of 5 loss functions. When using the range as a volatility proxy one of these differences was significant at the 0.05 level.

In Table 4 we present tests comparing the EGARCH(1,1) forecasts to the 60-day rolling window forecasts. This table shows that the EGARCH forecasts significantly out-perform the rolling window forecasts for all five individual loss functions considered in DM tests, using both squared returns and the adjusted range as a volatility proxy.

Finally, in Table 5 we present tests comparing the EGARCH forecasts with the RiskMetrics forecasts. Here we again find that the EGARCH volatility forecasts significantly out-perform the competing forecast, in this case the RiskMetrics forecast: for all five individual loss functions the DM test statistics are less than -1.96.
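A Diebold-Mariano comparison of two forecasts under a given loss function can be sketched as follows. This is a simplified illustration rather than the procedure behind Tables 3 to 5: it assumes the loss differential is serially uncorrelated and uses the plain sample variance, where an applied study would typically use a serial-correlation-robust variance estimator. The function names are assumptions.

```python
import math


def mse_loss(proxy, h):
    """MSE loss between the volatility proxy and the forecast."""
    return (proxy - h) ** 2


def qlike_loss(proxy, h):
    """QLIKE loss: log h + proxy / h."""
    return math.log(h) + proxy / h


def dm_statistic(proxy, forecast_a, forecast_b, loss):
    """t-statistic on the mean loss differential d_t = L(proxy, a_t) - L(proxy, b_t).

    A negative value means forecast A had lower average loss than forecast B.
    Assumes d_t is serially uncorrelated (a simplifying assumption).
    """
    d = [loss(s, ha) - loss(s, hb)
         for s, ha, hb in zip(proxy, forecast_a, forecast_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return math.sqrt(n) * mean_d / math.sqrt(var_d)
```

Under the usual asymptotic normal approximation, a statistic larger than 1.96 in absolute value rejects equal predictive accuracy at the 0.05 level, which is the rule applied in the tables.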
6
Conclusion
[ TO BE COMPLETED... ]
7
Appendix 1: Supporting calculations for Section 2
Wherever possible we derived solutions or approximate solutions analytically. This was not always possible and so in some cases we had to resort to simulations to obtain solutions. Optimal forecasts under alternative loss functions:
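MSE-prop:
\[
\begin{aligned}
h_t^* &= \frac{E_{t-1}\left[r_t^4\right]}{E_{t-1}\left[r_t^2\right]} = Kurtosis_{t-1}[r_t]\,\sigma_t^2 \\
&= 3\left(\frac{\nu-2}{\nu-4}\right)\sigma_t^2 \quad \text{if } r_t|\mathcal{F}_{t-1} \sim \text{Student's } t\left(0, \sigma_t^2, \nu\right) \\
&= 3\sigma_t^2 \quad \text{if } r_t|\mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right)
\end{aligned}
\]
MSE-log:
\[
\begin{aligned}
h_t^* &= \exp\left\{E_{t-1}\left[\log r_t^2\right]\right\} \\
&= \sigma_t^2 \exp\left\{E_{t-1}\left[\log \varepsilon_t^2\right]\right\} \\
&= \sigma_t^2 \exp\left\{-\left(\gamma_E + \log\frac{4}{\nu} + \frac{\Gamma'(\nu/2)}{\Gamma(\nu/2)} + \log\frac{\nu}{\nu-2}\right)\right\} \quad \text{if } r_t|\mathcal{F}_{t-1} \sim \text{Student's } t\left(0, \sigma_t^2, \nu\right) \\
&= \sigma_t^2 \exp\left\{-\left(\gamma_E + \log 2\right)\right\} \approx \sigma_t^2 e^{-1.2704} \quad \text{if } r_t|\mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right)
\end{aligned}
\]
where $\gamma_E \approx 0.577216$ is Euler’s constant.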
QLIKE: $h_t^* = E_{t-1}\left[r_t^2\right] = \sigma_t^2$

MAE-SD:
\[
h_t^* = \left(Median_{t-1}[|r_t|]\right)^2 = \sigma_t^2\left(Median_{t-1}[|\varepsilon_t|]\right)^2 = \sigma_t^2\, Median_{t-1}\left[\varepsilon_t^2\right]
\]
since $\left(Median[X]\right)^2 = Median\left[X^2\right]$ for any non-negative random variable $X$. Thus the optimal forecast is identical to that under MAE loss, which is given in the body of the paper.
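MAE-prop: If $r_t^2|\mathcal{F}_{t-1} \sim F_t\left(\sigma_t^2\right)$ and $\varepsilon_t^2 \equiv r_t^2/\sigma_t^2\,|\,\mathcal{F}_{t-1} \sim G_t(1)$ then
\[
\begin{aligned}
\text{FOC:}\quad 0 &= \int_0^{h_t^*} \frac{r_t^2}{h_t^*} f_t\left(r_t^2\right) dr_t^2 - \int_{h_t^*}^{\infty} \frac{r_t^2}{h_t^*} f_t\left(r_t^2\right) dr_t^2 \\
\text{so}\quad \int_0^{h_t^*} \frac{r_t^2}{h_t^*} f_t\left(r_t^2\right) dr_t^2 &= \int_{h_t^*}^{\infty} \frac{r_t^2}{h_t^*} f_t\left(r_t^2\right) dr_t^2 \\
F_t\left(h_t^*\right) E_{t-1}\left[\left.\frac{r_t^2}{h_t^*}\right| r_t^2 \leq h_t^*\right] &= \left(1 - F_t\left(h_t^*\right)\right) E_{t-1}\left[\left.\frac{r_t^2}{h_t^*}\right| r_t^2 > h_t^*\right]
\end{aligned}
\]
Without loss of generality let $h_t^* \equiv \sigma_t^2\gamma_t^*$, $\gamma_t^* > 0$, so
\[
\begin{aligned}
F_t\left(\sigma_t^2\gamma_t^*\right) E_{t-1}\left[\left.\frac{\varepsilon_t^2}{\gamma_t^*}\right| \varepsilon_t^2 \leq \gamma_t^*\right] &= \left(1 - F_t\left(\sigma_t^2\gamma_t^*\right)\right) E_{t-1}\left[\left.\frac{\varepsilon_t^2}{\gamma_t^*}\right| \varepsilon_t^2 > \gamma_t^*\right] \\
G_t\left(\gamma_t^*\right) E_{t-1}\left[\left.\varepsilon_t^2\right| \varepsilon_t^2 \leq \gamma_t^*\right] &= \left(1 - G_t\left(\gamma_t^*\right)\right) E_{t-1}\left[\left.\varepsilon_t^2\right| \varepsilon_t^2 > \gamma_t^*\right]
\end{aligned}
\]
If $\varepsilon_t^2|\mathcal{F}_{t-1} \sim G(1)$, then $\gamma_t^* = \gamma^*\ \forall\ t$. Finding an explicit expression for $h_t^*$ is difficult, and so we used 100,000 simulated draws and found that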
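\[
h_t^* \approx 2.3624\,\sigma_t^2 \quad \text{if } r_t|\mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right)
\]
Diebold-Mariano test using MSE-SD loss: We have
\[
d_t = \left(|r_t| - \sigma_t\right)^2 - \left(|r_t| - \sqrt{2/\pi}\,\sigma_t\right)^2
\]
and we seek to find an expression for⁷
\[
DM_0 = \frac{\sqrt{n}\,E[d_t]}{\sqrt{V\left[\sqrt{n}\,\bar{d}_n\right]}} = \frac{\sqrt{n}\,E[d_t]}{\sqrt{V[d_t]}} \quad \text{if } d_t \text{ is serially uncorrelated}
\]
as a function of $(\omega, \alpha, \beta, n)$.
\[
\begin{aligned}
d_t &= \left(|r_t| - \sigma_t\right)^2 - \left(|r_t| - \sqrt{2/\pi}\,\sigma_t\right)^2 \\
&= \left(1 - \frac{2}{\pi}\right)\sigma_t^2 + 2\left(\sqrt{\frac{2}{\pi}} - 1\right)|\varepsilon_t|\,\sigma_t^2 \\
\text{so}\quad E[d_t] &= \left(1 - \sqrt{\frac{2}{\pi}}\right)^2 E\left[\sigma_t^2\right]
\end{aligned}
\]
and
\[
\begin{aligned}
E\left[d_t^2\right] &= E\left[\sigma_t^4\, E_{t-1}\left[\left(1 - \frac{2}{\pi} + 2\left(\sqrt{\frac{2}{\pi}} - 1\right)|\varepsilon_t|\right)^2\right]\right] \\
&= \left(5 - 12\sqrt{\frac{2}{\pi}} + \frac{12}{\pi} + \frac{8}{\pi}\sqrt{\frac{2}{\pi}} - \frac{12}{\pi^2}\right) E\left[\sigma_t^4\right]
\end{aligned}
\]
The quantities $E\left[\sigma_t^2\right]$ and $E\left[\sigma_t^4\right]$ depend on the DGP for the returns, and in this case they equal:
\[
E\left[\sigma_t^2\right] = \frac{\omega}{1-\alpha-\beta}, \quad \text{if } \alpha + \beta < 1
\]
\[
E\left[\sigma_t^4\right] = \frac{\omega^2\left(1+\alpha+\beta\right)}{\left(1 - (\alpha+\beta)^2 - 2\alpha^2\right)\left(1-\alpha-\beta\right)}, \quad \text{if } 1 - (\alpha+\beta)^2 - 2\alpha^2 > 0
\]
so
\[
\begin{aligned}
DM &= \frac{\sqrt{n}\left(1 - \sqrt{\frac{2}{\pi}}\right)^2 E\left[\sigma_t^2\right]}{\sqrt{\left(5 - 12\sqrt{\frac{2}{\pi}} + \frac{12}{\pi} + \frac{8}{\pi}\sqrt{\frac{2}{\pi}} - \frac{12}{\pi^2}\right)E\left[\sigma_t^4\right] - \left(1 - \sqrt{\frac{2}{\pi}}\right)^4 E\left[\sigma_t^2\right]^2}} \\
&= \frac{\sqrt{n}\left(1 - \sqrt{\frac{2}{\pi}}\right)^2}{\sqrt{\left(5 - 12\sqrt{\frac{2}{\pi}} + \frac{12}{\pi} + \frac{8}{\pi}\sqrt{\frac{2}{\pi}} - \frac{12}{\pi^2}\right)\frac{1-(\alpha+\beta)^2}{1-(\alpha+\beta)^2-2\alpha^2} - \left(1 - \sqrt{\frac{2}{\pi}}\right)^4}}
\end{aligned}
\]
as stated in the text. Note that the parameter $\omega$ does not affect the statistic.

⁷ In the interests of parsimony we present results under the false assumption that $d_t$ is serially uncorrelated. In unreported work we also derived the variance allowing for serial correlation in $d_t$ and found that accounting for the serial correlation does not change the conclusion significantly. The serial correlation turns out to be negative, and so the correct variance is slightly smaller than the naïve variance estimator used, which makes our point even stronger.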
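The closed-form statistic above is easy to evaluate for given GARCH(1,1) parameters. The sketch below is an illustration under the same assumptions as the derivation (conditionally normal returns, serially uncorrelated $d_t$); the function name and the parameter values in the example call are hypothetical.

```python
import math

S = math.sqrt(2.0 / math.pi)  # E|eps| for a standard normal eps


def dm_population_statistic(alpha, beta, n):
    """Population DM statistic for MSE-SD loss as a function of (alpha, beta, n).

    omega does not appear because it cancels in E[sigma^4] / E[sigma^2]^2.
    """
    persistence = alpha + beta
    fourth_moment_cond = 1.0 - persistence ** 2 - 2.0 * alpha ** 2
    if persistence >= 1.0 or fourth_moment_cond <= 0.0:
        raise ValueError("E[sigma^2] or E[sigma^4] does not exist for these parameters")
    # E[sigma^4] / E[sigma^2]^2 from the two moment expressions above.
    moment_ratio = (1.0 - persistence ** 2) / fourth_moment_cond
    coef = (5.0 - 12.0 * S + 12.0 / math.pi
            + 8.0 * S / math.pi - 12.0 / math.pi ** 2)
    variance = coef * moment_ratio - (1.0 - S) ** 4
    return math.sqrt(n) * (1.0 - S) ** 2 / math.sqrt(variance)


# Hypothetical parameter values, chosen only to illustrate the call.
print(dm_population_statistic(alpha=0.05, beta=0.90, n=1000))
```

Because the statistic grows with $\sqrt{n}$, any fixed $(\alpha, \beta)$ satisfying the moment conditions eventually produces a value above conventional critical values.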
8
Appendix 2: Calculations supporting Section 3
Feller (1951) presents the density of range:
\[
f(RG_t; \sigma_t) = 8\sum_{k=1}^{\infty} (-1)^{k-1}\frac{k^2}{\sigma_t}\,\phi\left(\frac{k \cdot RG_t}{\sigma_t}\right)
\]
where $\phi$ is the standard normal pdf. For practical purposes the sum in the above expression needs to be truncated at some finite value; we truncate at $k = 1000$. Parkinson (1980) presented the cdf of the range, and a formula for obtaining moments:
\[
F(RG_t; \sigma_t) = \sum_{k=1}^{\infty} (-1)^{k-1} k\left\{\mathrm{erfc}\left(\frac{(k+1)RG_t}{\sigma_t\sqrt{2}}\right) - 2\,\mathrm{erfc}\left(\frac{k \cdot RG_t}{\sigma_t\sqrt{2}}\right) + \mathrm{erfc}\left(\frac{(k-1)RG_t}{\sigma_t\sqrt{2}}\right)\right\}
\]
\[
E\left[RG_t^p\right] = \frac{4}{\sqrt{\pi}}\,\Gamma\left(\frac{p+1}{2}\right)\left(2^{p/2} - 2^{2-p/2}\right)\zeta(p-1)\,\sigma_t^p
\]
where $\mathrm{erfc}(x) \equiv 1 - \mathrm{erf}(x)$, $\mathrm{erf}(x)$ is the ‘error function’: $\mathrm{erf}(x) \equiv \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}dt$, and $\zeta$ is the Riemann zeta function. From this expression we can obtain the necessary moments for computing optimal forecasts when the range is used as a volatility proxy. For the first and second moments of $RG_t$ we can obtain simple expressions, but the fourth moment involves $\zeta(3) = \sum_{k=1}^{\infty} k^{-3}$, which is an irrational number, and thus only a numerical expression is available. In addition to the moments of $RG_t$, we will need the mean of $\log RG_t$ and the median of $RG_t$. We used quadrature and OLS to obtain the expression⁸:
\[
E_{t-1}[\log RG_t] = 0.4257 + \log \sigma_t \tag{33}
\]
which is consistent with the expression given in Alizadeh, et al. (2002). We numerically inverted the cdf of the range, given in Parkinson (1980), and used OLS to determine the following relation⁹:
\[
Median_{t-1}[RG_t] = 1.5145\,\sigma_t
\]
so $Median_{t-1}\left[RG_t^2\right] = 2.2938\,\sigma_t^2$, since $RG_t$ is weakly positive.

⁸ We used quadrature to estimate $E_{t-1}[\log RG_t]$ for $\sigma_t = 0.5, 1, 1.5, ..., 10$. We then regressed these estimates on a constant and $\log \sigma_t$ to obtain the parameter estimates. The $R^2$ from this regression was 1.0000.
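The quadrature step behind (33) can be sketched with Feller's density and Simpson's rule. This is an illustration rather than the author's code: the truncation at 80 terms, the integration limits of $0.2\sigma$ and $12\sigma$, and all function names are assumptions made here (the density is numerically negligible outside those limits).

```python
import math


def range_density(x, sigma=1.0, terms=80):
    """Feller's density of the range, with the infinite sum truncated at `terms`."""
    z = x / sigma
    total = 0.0
    for k in range(1, terms + 1):
        phi = math.exp(-0.5 * (k * z) ** 2) / math.sqrt(2.0 * math.pi)
        total += (-1) ** (k - 1) * k * k * phi
    return 8.0 * total / sigma


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def range_expectation(g, sigma=1.0):
    """E[g(RG)] by quadrature over an assumed effective support of the range."""
    return simpson(lambda x: g(x) * range_density(x, sigma),
                   0.2 * sigma, 12.0 * sigma)


print(range_expectation(math.log))  # intercept of (33) when sigma = 1
```

Evaluating `range_expectation(math.log, sigma)` over a grid of `sigma` values and regressing the results on a constant and `log(sigma)` would mirror the procedure described in footnote 8.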
MSE-LOG: $h_t^* = \exp\left\{E_{t-1}\left[\log \hat\sigma_t^2\right]\right\}$. As was the case when using squared returns as a volatility proxy, Taylor series approximations did not provide a good fit when considering realised variance as a proxy, and so we resorted to simulations. We simulated 50,000 “days” worth of observations, where the number of observations per day considered was $m = \{1, 3, 5, 7, 10, 13, 20, 40, 60, 79, 100\}$. The following expression yielded an $R^2$ of 0.9959: $E_{t-1}[\log RV_t(m)] \approx -1.2741/m$, so the optimal forecast under our DGP assumption is $h_t^* \approx \sigma_t^2 e^{-1.2741/m}$. For the range we find that
\[
\begin{aligned}
E_{t-1}\left[\log RG_t^{*2}\right] &= 2E_{t-1}\left[\log RG_t^*\right] \\
&= -0.1684 + \log \sigma_t^2
\end{aligned}
\]
so $h_t^* = e^{-0.1684}\sigma_t^2 \approx 0.8450\,\sigma_t^2$.

MAE-SD: The optimal forecast is $h_t^* = Median_{t-1}\left[\hat\sigma_t^2\right]$. Since $\hat\sigma_t^2$ is weakly positive we know that $Median_{t-1}\left[\hat\sigma_t^2\right] = \left(Median_{t-1}[\hat\sigma_t]\right)^2$, and so the results for this loss function are identical to those for the MAE loss function.

MSE-prop: $h_t^* = E_{t-1}\left[\hat\sigma_t^4\right]/E_{t-1}\left[\hat\sigma_t^2\right]$. When realised volatility is used as the proxy we find: $h_t^* = \left(\frac{1}{m}Kurtosis_{t-1}[r_{t,i}] + \frac{m-1}{m}\right)\sigma_t^2 = \left(1 + \frac{2}{m}\right)\sigma_t^2$. For the range we find that: $h_t^* = 10.8185/(\log 16)^2\,\sigma_t^2 \approx 1.4073\,\sigma_t^2$.
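The range figure in the MSE-prop result can be reproduced from Parkinson's moment formula, $E[RG_t^p] = \frac{4}{\sqrt{\pi}}\Gamma\left(\frac{p+1}{2}\right)\left(2^{p/2} - 2^{2-p/2}\right)\zeta(p-1)\sigma_t^p$. The following is a minimal sketch: the zeta routine (a partial sum plus an integral tail estimate) and the function names are assumptions, and the formula is only evaluated for $p > 2$, where $\zeta(p-1)$ is an ordinary convergent sum.

```python
import math


def zeta(s, terms=100000):
    """Riemann zeta for real s > 1: partial sum plus an integral estimate of the tail."""
    partial = sum(k ** -s for k in range(1, terms + 1))
    return partial + terms ** (1.0 - s) / (s - 1.0)


def range_moment(p, sigma=1.0):
    """E[RG^p] from Parkinson's (1980) formula, restricted here to p > 2."""
    if p <= 2:
        raise ValueError("this sketch only handles p > 2")
    return (4.0 / math.sqrt(math.pi)
            * math.gamma((p + 1.0) / 2.0)
            * (2.0 ** (p / 2.0) - 2.0 ** (2.0 - p / 2.0))
            * zeta(p - 1.0)
            * sigma ** p)


fourth_moment = range_moment(4)                       # equals 9 * zeta(3)
mse_prop_multiplier = fourth_moment / math.log(16.0) ** 2
print(fourth_moment, mse_prop_multiplier)
```

The second printed value is the multiple of $\sigma_t^2$ that is optimal under MSE-prop loss when the adjusted squared range $RG_t^2/\log 16$ is the proxy.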
MAE-prop: For realised variance, like the daily squared return, obtaining an analytical, even approximate, solution to this problem is difficult and so we used simulations. In the set-up given in the text it is again possible to show that the optimal forecast is of the form $h_t^* = \gamma^*\sigma_t^2$. For realised volatility we simulated 50,000 “days” worth of observations, where the number of observations per day considered was $m = \{1, 3, 5, 7, 10, 13, 20, 40, 60, 79, 100\}$, and used numerical methods to locate the optimum forecast. The following expression yielded an $R^2$ of 0.9999: $h_t^* \approx \left(1 + \frac{1.3624}{m}\right)\sigma_t^2$. For the range we again used a numerical minimisation algorithm, combined with quadrature to compute the expectation in the optimisation problem: $h_t^* \approx 0.9941\,\sigma_t^2$.

⁹ The $R^2$ from this relation for $\sigma = 0.5, 1, 1.5, ..., 10$ was 1.0000.
9
Appendix 3: Proofs of Propositions
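Proof of Proposition 1: to be inserted.

Proof of Proposition 2: In the following we drop time subscripts where they are not needed. Consider Class III first:
\[
L(\hat\sigma^2, \hat\sigma^2) = a\left(c\hat\sigma^2 - d - 1 - \log\hat\sigma^2\right) = 0, \quad \text{so } d = c\hat\sigma^2 - 1 - \log\hat\sigma^2, \text{ thus}
\]
\[
L(\hat\sigma^2, h) = a\left(1 - \frac{\hat\sigma^2}{h} + \log\frac{\hat\sigma^2}{h}\right)
\]
Thus both $c$ and $d$ drop out of the expression once we impose the normalisation that $L(\hat\sigma^2, \hat\sigma^2) = 0$. Now we use the second desired property:
\[
\begin{aligned}
L(\hat\sigma^2, h) &\geq 0 \\
\lim_{\hat\sigma^2 \to 0^+} \mathrm{sgn}\left(L(\hat\sigma^2, h)\right) &= \lim_{\hat\sigma^2 \to 0^+} \mathrm{sgn}\left(a + a\log\hat\sigma^2 - a\log h\right) \\
&= \mathrm{sgn}(-a) > 0 \text{ as } a < 0.
\end{aligned}
\]
So the Class III loss function becomes:
\[
L(\hat\sigma^2, h) = a\left(1 + \log\hat\sigma^2 - \frac{\hat\sigma^2}{h} - \log h\right), \quad a < 0
\]
The QLIKE function corresponds to a Class III loss function with $a = -1$, up to an additive constant $\left(-1 - \log\hat\sigma^2\right)$.

Now consider Class II:
\[
L(\hat\sigma^2, \hat\sigma^2) = a\left(c\hat\sigma^2 - d + \hat\sigma^2\log\hat\sigma^2 - \hat\sigma^2\right) = 0, \quad \text{so } d = c\hat\sigma^2 + \hat\sigma^2\log\hat\sigma^2 - \hat\sigma^2, \text{ thus}
\]
\[
L(\hat\sigma^2, h) = a\left(\hat\sigma^2 - h - \hat\sigma^2\log\frac{\hat\sigma^2}{h}\right)
\]
And so again the $c$ and $d$ terms drop out. Consider now
\[
\lim_{\hat\sigma^2 \to 0^+} L(\hat\sigma^2, h) = \lim_{\hat\sigma^2 \to 0^+} a\left(\hat\sigma^2 - h - \hat\sigma^2\log\frac{\hat\sigma^2}{h}\right) = -ah \geq 0 \text{ as } a < 0
\]
Thus the Class II loss functions are
\[
L(\hat\sigma^2, h) = a\left(\hat\sigma^2 - h - \hat\sigma^2\log\frac{\hat\sigma^2}{h}\right), \quad a < 0
\]
[...] $-2$, as $h > 0$ and $a < 0$. The only remaining choice is between the plus and minus in
\[
b = -\frac{3}{2} \pm \sqrt{\frac{e-4}{4e}}
\]
subject to the constraint that $b > -2$. For all $e < 0$, $-\frac{3}{2} - \sqrt{\frac{e-4}{4e}} < -2$, and so this choice is eliminated. Thus the Class I loss functions are given by:
\[
L(\hat\sigma^2, h) = a\left(e\,\hat\sigma^{2b+4} + \frac{1}{b+1}\hat\sigma^2 h^{b+1} - \frac{1}{b+2}h^{b+2}\right), \quad a < 0,\ e < 0
\]
\[
\text{where } b = -\frac{3}{2} + \sqrt{\frac{e-4}{4e}}
\]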
The MSE element of this class is where $a = -2$ and $e = -\frac{1}{2}$, so $b = 0$.

All have been normalised to satisfy the first condition. The Class II and III functions are real whenever $\hat\sigma^2$ and $h$ are real, and the Class I function parameters were chosen to ensure the function value is also always real, thus the third condition is satisfied. By construction all of the functions satisfy the fourth condition. To show the second condition it is sufficient to show that $h = \hat\sigma^2$ is the global minimum of these functions.

Class I:
\[
\begin{aligned}
\frac{\partial L(\hat\sigma^2, h)}{\partial h} &= a\hat\sigma^2 h^{b} - a h^{b+1} \\
\frac{\partial L(\hat\sigma^2, h)}{\partial h} &= 0 \Rightarrow h \in \left\{\hat\sigma^2, 0\right\} \text{ are the extrema of this function} \\
\frac{\partial^2 L(\hat\sigma^2, h)}{\partial h^2} &= a b h^{b-1}\left(\hat\sigma^2 - h\right) - a h^{b} \\
\left.\frac{\partial^2 L(\hat\sigma^2, h)}{\partial h^2}\right|_{h=\hat\sigma^2} &= -a\hat\sigma^{2b} > 0
\end{aligned}
\]
Thus $h = 0$ is an inflection point and $h = \hat\sigma^2$ is the unique minimum of the function.

Class II:
\[
\begin{aligned}
\frac{\partial L(\hat\sigma^2, h)}{\partial h} &= -a + \frac{a\hat\sigma^2}{h} \\
\frac{\partial L(\hat\sigma^2, h)}{\partial h} &= 0 \Rightarrow h = \hat\sigma^2 \text{ is the unique extremum} \\
\frac{\partial^2 L(\hat\sigma^2, h)}{\partial h^2} &= -\frac{a\hat\sigma^2}{h^2} > 0
\end{aligned}
\]
and so $h = \hat\sigma^2$ is the unique minimum of the function.

Class III:
\[
\begin{aligned}
\frac{\partial L(\hat\sigma^2, h)}{\partial h} &= \frac{a\hat\sigma^2}{h^2} - \frac{a}{h} \\
\frac{\partial L(\hat\sigma^2, h)}{\partial h} &= 0 \Rightarrow h \in \left\{\hat\sigma^2, \infty\right\} \text{ are the extrema} \\
\frac{\partial^2 L(\hat\sigma^2, h)}{\partial h^2} &= -\frac{2a\hat\sigma^2}{h^3} + \frac{a}{h^2} \\
\left.\frac{\partial^2 L(\hat\sigma^2, h)}{\partial h^2}\right|_{h=\hat\sigma^2} &= \frac{-a}{\hat\sigma^4} > 0 \text{ for } \hat\sigma^2 > 0 \\
\lim_{h \to \infty}\frac{\partial^2 L(\hat\sigma^2, h)}{\partial h^2} &= 0
\end{aligned}
\]
Thus $h = \hat\sigma^2$ is the unique minimum, whenever $\hat\sigma^2 > 0$. Thus for all three functions $h = \hat\sigma^2$ is the unique minimum, and since all have been normalised so that $L(\hat\sigma^2, \hat\sigma^2) = 0$, we have $L(\hat\sigma^2, h) \geq 0$ for all $\hat\sigma^2, h > 0$.

It is easily verified that $\partial L(\hat\sigma^2, h)/\partial h \geq (\leq)\ 0$ when $\hat\sigma^2 \leq (\geq)\ h$ for all three loss functions. It is also easily seen that allowing $a$ and $e$ to be functions of the time $t-1$ information set does not change the result of the first-order condition, which involves a conditional expectation using $\mathcal{F}_{t-1}$. By verifying that the loss functions satisfy the representation conditions in Proposition 1 it is shown that this class also satisfies the stronger condition that the ranking of any two (possibly imperfect) volatility forecasts using some conditionally unbiased volatility proxy is the same as that obtained using the true conditional variance. ∎
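The first derivatives used in the proof can be checked numerically against central finite differences. The sketch below is illustrative: the proxy value of 2, the choice $a = -1$, the step size and the helper names are all assumptions made for the check.

```python
import math

A = -1.0      # any a < 0 (assumed value for the check)
PROXY = 2.0   # assumed value of the volatility proxy


def b_of_e(e):
    return -1.5 + math.sqrt((e - 4.0) / (4.0 * e))


def class1(h, e):
    b = b_of_e(e)
    return A * (e * PROXY ** (b + 2) + PROXY * h ** (b + 1) / (b + 1)
                - h ** (b + 2) / (b + 2))


def class2(h):
    return A * (PROXY - h - PROXY * math.log(PROXY / h))


def class3(h):
    return A * (1.0 - PROXY / h + math.log(PROXY / h))


# Analytical first derivatives as stated in the proof.
def d_class1(h, e):
    b = b_of_e(e)
    return A * PROXY * h ** b - A * h ** (b + 1)


def d_class2(h):
    return -A + A * PROXY / h


def d_class3(h):
    return A * PROXY / h ** 2 - A / h


def central_difference(f, h, step=1e-6):
    """Two-sided finite-difference approximation to f'(h)."""
    return (f(h + step) - f(h - step)) / (2.0 * step)


for h in (0.5, 1.0, 3.0, 5.0):
    print(h, d_class2(h), central_difference(class2, h))
```

In each class the derivative is negative for forecasts below the proxy and positive above it, which is the second condition of Proposition 2.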
10
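Tables and Figures

Table 1: Optimal forecasts under various loss functions

| Loss function | Optimal forecast $h_t^*$: $r_t \mid \mathcal{F}_{t-1} \sim F_t\left(0, \sigma_t^2\right)$ | Optimal forecast $h_t^*$: $r_t \mid \mathcal{F}_{t-1} \sim N\left(0, \sigma_t^2\right)$ |
|---|---|---|
| MSE | $\sigma_t^2$ | $\sigma_t^2$ |
| QLIKE | $\sigma_t^2$ | $\sigma_t^2$ |
| MSE-LOG | $\exp\left\{E_{t-1}\left[\log r_t^2\right]\right\}$ | $\approx 0.2797\,\sigma_t^2$ |
| MSE-SD | $\left(E_{t-1}\left[\lvert r_t \rvert\right]\right)^2$ | $\frac{2}{\pi}\sigma_t^2 \approx 0.6366\,\sigma_t^2$ |
| MSE-prop | $Kurtosis_{t-1}[r_t]\,\sigma_t^2$ | $3\sigma_t^2$ |
| MAE | $Median_{t-1}\left[r_t^2\right]$ | $Median\left[\chi_1^2\right] \cdot \sigma_t^2 \approx 0.4549\,\sigma_t^2$ |
| MAE-SD | $Median_{t-1}\left[r_t^2\right]$ | $Median\left[\chi_1^2\right] \cdot \sigma_t^2 \approx 0.4549\,\sigma_t^2$ |
| MAE-prop | n/a | $\approx 2.3624\,\sigma_t^2$ |

Notes: This table presents the forecast which minimises the conditional expected loss when the squared return is used as a volatility proxy. That is, $h_t^*$ minimises $E_{t-1}\left[L\left(r_t^2, h\right)\right]$, for various loss functions $L$. Approximate or numerical solutions are prefixed by “≈”. The second column presents the solutions when returns have an arbitrary conditional distribution $F_t$ with mean zero and conditional variance $\sigma_t^2$; the third column presents the solutions when returns are conditionally normally distributed.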
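Several entries in the normal column of Table 1 can be recomputed without simulation. The sketch below is an independent check rather than the paper's method. In particular, the MAE-prop line rests on an identity that is an assumption here and is not stated in the paper: for $\varepsilon_t^2 \sim \chi_1^2$, $E\left[\varepsilon_t^2 1\{\varepsilon_t^2 \leq \gamma\}\right] = P\left(\chi_3^2 \leq \gamma\right)$, so the first-order condition reduces to $\gamma^*$ being the median of a $\chi_3^2$ variable.

```python
import math
from statistics import NormalDist


def chi2_3_cdf(x):
    """CDF of a chi-square variable with 3 degrees of freedom (closed form)."""
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def bisect_root(f, lo, hi, tol=1e-10):
    """Root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mse_sd = 2.0 / math.pi                       # (E|eps|)^2 for standard normal eps
mae = NormalDist().inv_cdf(0.75) ** 2        # median of chi-square(1)
mae_prop = bisect_root(lambda x: chi2_3_cdf(x) - 0.5, 0.1, 10.0)

print(mse_sd, mae, mae_prop)
```

The MAE-prop value obtained this way is close to, but not identical with, the simulation-based 2.3624 reported in the table, as would be expected when comparing an exact calculation with an estimate from 100,000 draws.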
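Table 2: Optimal forecasts under various loss functions, using realised volatility and range

| Loss function | Range | Realised volatility, arbitrary $m$ | $m = 1$ | $m = 12$ | $m = 78$ | $m \to \infty$ |
|---|---|---|---|---|---|---|
| MSE | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ |
| QLIKE | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ | $\sigma_t^2$ |
| MSE-LOG | $\approx 0.8450\,\sigma_t^2$ | $\approx e^{-1.2741/m}\sigma_t^2$ | $\approx 0.2794\,\sigma_t^2$ | $\approx 0.8993\,\sigma_t^2$ | $\approx 0.9838\,\sigma_t^2$ | $\sigma_t^2$ |
| MSE-SD | $\frac{2}{\pi \log 2}\sigma_t^2 \approx 0.9184\,\sigma_t^2$ | $\frac{1}{m}\left(E\left[\sqrt{\chi_m^2}\right]\right)^2\sigma_t^2 \approx \left(1 - \frac{1}{2m} + \frac{1}{16m^2}\right)\sigma_t^2$ | $\approx 0.5625\,\sigma_t^2$ | $\approx 0.9588\,\sigma_t^2$ | $\approx 0.9936\,\sigma_t^2$ | $\sigma_t^2$ |
| MSE-prop | $\approx 1.4073\,\sigma_t^2$ | $\left(1 + \frac{2}{m}\right)\sigma_t^2$ | $3\sigma_t^2$ | $1.1667\,\sigma_t^2$ | $1.0256\,\sigma_t^2$ | $\sigma_t^2$ |
| MAE | $\approx 0.8273\,\sigma_t^2$ | $\frac{1}{m}Median\left[\chi_m^2\right]\sigma_t^2 \approx \left(1 - \frac{2}{3m} + \frac{1}{9m^2}\right)\sigma_t^2$ | $0.4549\,\sigma_t^2$ | $0.9452\,\sigma_t^2$ | $0.9915\,\sigma_t^2$ | $\sigma_t^2$ |
| MAE-SD | $\approx 0.8273\,\sigma_t^2$ | $\frac{1}{m}Median\left[\chi_m^2\right]\sigma_t^2 \approx \left(1 - \frac{2}{3m} + \frac{1}{9m^2}\right)\sigma_t^2$ | $0.4549\,\sigma_t^2$ | $0.9452\,\sigma_t^2$ | $0.9915\,\sigma_t^2$ | $\sigma_t^2$ |
| MAE-prop | $\approx 0.9941\,\sigma_t^2$ | $\approx \left(1 + \frac{1.3624}{m}\right)\sigma_t^2$ | $\approx 2.3624\,\sigma_t^2$ | $\approx 1.1135\,\sigma_t^2$ | $\approx 1.0175\,\sigma_t^2$ | $\sigma_t^2$ |

Notes: This table presents the forecast which minimises the conditional expected loss when the range or realised volatility is used as a volatility proxy. That is, $h_t^*$ minimises $E_{t-1}\left[L\left(\hat\sigma_t^2, h\right)\right]$, for $\hat\sigma_t^2 = RG_t^{*2}$ or $\hat\sigma_t^2 = RV_t$, for various loss functions $L$. Approximate or numerical solutions are prefixed by “≈”. In all cases returns are assumed to be generated as a zero mean Brownian motion with constant volatility within each trade day and no jumps. The cases of $m = 1, 12, 78$ correspond to the use of daily squared returns, realised variance with 30-minute returns and realised variance with 5-minute returns respectively.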
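The “arbitrary $m$” approximations for realised volatility in Table 2 can be evaluated directly to reproduce the tabulated columns. The sketch below assumes the approximations read $e^{-1.2741/m}$ (MSE-LOG), $1 - \frac{1}{2m} + \frac{1}{16m^2}$ (MSE-SD), $1 + \frac{2}{m}$ (MSE-prop), $1 - \frac{2}{3m} + \frac{1}{9m^2}$ (MAE and MAE-SD) and $1 + \frac{1.3624}{m}$ (MAE-prop); the dictionary and function names are illustrative.

```python
import math

# Multiples of sigma_t^2 implied by the arbitrary-m column of Table 2.
TABLE2_RV_FORMULAS = {
    "MSE-LOG": lambda m: math.exp(-1.2741 / m),
    "MSE-SD": lambda m: 1.0 - 1.0 / (2.0 * m) + 1.0 / (16.0 * m * m),
    "MSE-prop": lambda m: 1.0 + 2.0 / m,
    "MAE": lambda m: 1.0 - 2.0 / (3.0 * m) + 1.0 / (9.0 * m * m),
    "MAE-prop": lambda m: 1.0 + 1.3624 / m,
}


def rv_multipliers(m):
    """Optimal forecast as a multiple of sigma_t^2 for m intra-day returns."""
    return {name: formula(m) for name, formula in TABLE2_RV_FORMULAS.items()}


for m in (12, 78):
    print(m, {name: round(value, 4) for name, value in rv_multipliers(m).items()})
```

All five multipliers approach one as $m$ grows, which is the sense in which a more efficient proxy reduces the distortion in the optimal forecast.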
Table 3: Comparison of the RiskMetrics and rolling window forecasts

| Loss function | Volatility proxy: daily squared return | Volatility proxy: squared adjusted range |
|---|---|---|
| MSE | 0.084 | 0.747 |
| QLIKE | -0.894 | -2.076∗ |
| Class II | -0.772 | -0.891 |
| Class I ($e = -0.25$) | 0.750 | 1.441 |
| Class I ($e = -5$) | -0.674 | -0.603 |

Notes: This table presents the t-statistics from Diebold-Mariano tests of equal predictive accuracy for a 60-day rolling window forecast and RiskMetrics forecast over the period January 1998 to December 2003. A t-statistic greater than 1.96 in absolute value indicates a rejection of the null of equal predictive accuracy at the 0.05 level. These statistics are marked with an asterisk. The sign of the t-statistics indicates which forecast performed better for each loss function: a positive t-statistic indicates that the RiskMetrics forecast produced larger loss on average than the rolling window forecast, while a negative sign indicates the opposite.
Table 4: Comparison of the EGARCH(1,1) and rolling window forecasts

| Loss function | Volatility proxy: daily squared return | Volatility proxy: squared adjusted range |
|---|---|---|
| MSE | -3.075∗ | -3.210∗ |
| QLIKE | -2.477∗ | -4.160∗ |
| Class II | -3.193∗ | -4.414∗ |
| Class I ($e = -0.25$) | -2.209∗ | -2.148∗ |
| Class I ($e = -5$) | -3.247∗ | -4.291∗ |

Notes: This table presents the t-statistics from Diebold-Mariano tests of equal predictive accuracy for an EGARCH(1,1) forecast and a 60-day rolling window forecast over the period January 1998 to December 2003. A t-statistic greater than 1.96 in absolute value indicates a rejection of the null of equal predictive accuracy at the 0.05 level. These statistics are marked with an asterisk. The sign of the t-statistics indicates which forecast performed better for each loss function: a positive t-statistic indicates that the EGARCH forecast produced larger loss on average than the rolling window forecast, while a negative sign indicates the opposite.
Table 5: Comparison of the EGARCH(1,1) and RiskMetrics forecasts

| Loss function | Volatility proxy: daily squared return | Volatility proxy: squared adjusted range |
|---|---|---|
| MSE | -3.001∗ | -3.604∗ |
| QLIKE | -2.010∗ | -2.818∗ |
| Class II | -2.917∗ | -3.836∗ |
| Class I ($e = -0.25$) | -2.788∗ | -3.291∗ |
| Class I ($e = -5$) | -2.994∗ | -3.850∗ |

Notes: This table presents the t-statistics from Diebold-Mariano tests of equal predictive accuracy for an EGARCH(1,1) forecast and a RiskMetrics forecast over the period January 1998 to December 2003. A t-statistic greater than 1.96 in absolute value indicates a rejection of the null of equal predictive accuracy at the 0.05 level. These statistics are marked with an asterisk. The sign of the t-statistics indicates which forecast performed better for each loss function: a positive t-statistic indicates that the EGARCH forecast produced larger loss on average than the RiskMetrics forecast, while a negative sign indicates the opposite.
[Figure 1 here. Top panel: “OLS slope parameter estimate (true=1)”; lower panel: “OLS intercept parameter estimate (true=0)”. Horizontal axis: kurtosis; vertical axis: OLS parameter estimate.]

Figure 1: Impact of kurtosis on OLS parameter estimates in MZ regressions using transformations of $r_t^2$. The top panel is for MZ regressions on standard deviations; the lower panel is for MZ regressions on log variances.
[Figure 2 here. Title: “Optimal forecasts under MAE loss (true=1)”. Horizontal axis: kurtosis; vertical axis: Median[e²].]

Figure 2: Optimal forecast under MAE loss when true variance is 1, for various levels of kurtosis, using the standardised Student’s t distribution.
[Figure 3 here. Title: “Optimal forecasts under MSE-SD loss (true=1)”. Horizontal axis: kurtosis; vertical axis: E[|e|]².]

Figure 3: Optimal forecast under MSE-SD loss when true variance is 1, for various levels of kurtosis, using the standardised Student’s t distribution.
[Figure 4 here. Title: “Class I loss functions, a=-2, various e”. Legend: e = -0.025, -0.1, -0.25, -0.5 (MSE), -5. Horizontal axis: hhat (r2=2); vertical axis: loss.]

Figure 4: Class I loss functions for a=-2. True $r^2 = 2$ in this example, with the volatility forecast ranging between 0 and 4. When e=-0.5 the MSE loss function is obtained.
[Figure 5 here. Title: “Class II and III loss functions, a=-1”. Legend: Class II, a=-1; Class III, a=-1 (QLIKE). Horizontal axis: hhat (r2=2); vertical axis: loss.]

Figure 5: Class II and III loss functions for a=-1. True $r^2 = 2$ in this example, with the volatility forecast ranging between 0 and 4. The Class III loss function corresponds to the QLIKE loss function.
[Figure 6 here. Title: “Conditional variance forecasts”. Legend: 60-day rolling window; RiskMetrics; EGARCH(1,1). Horizontal axis: Jan98 to Dec03; vertical axis: conditional variance.]

Figure 6: Conditional variance forecasts from the three models, January 1998 to December 2003.
References

[1] Alizadeh, Sassan, Brandt, Michael W., and Diebold, Francis X., 2002, Range-Based Estimation of Stochastic Volatility Models, Journal of Finance, 57(3), 1047-1091.
[2] Andersen, Torben G., and Bollerslev, Tim, 1998, Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts, International Economic Review, 39, 885-905.
[3] Andersen, Torben G., Bollerslev, Tim, and Lange, Steve, 1999, Forecasting Financial Market Volatility: Sample Frequency Vis-à-vis Forecast Horizon, Journal of Empirical Finance, 6, 457-477.
[4] Andersen, Torben G., Bollerslev, Tim, and Meddahi, Nour, 2002, Correcting the Errors: A Note on Volatility Forecast Evaluation based on High-Frequency Data and Realized Volatilities, Working Paper 322, Department of Finance, Kellogg School of Management, Northwestern University.
[5] Andersen, Torben G., Bollerslev, Tim, and Diebold, Francis X., 2002, Parametric and Nonparametric Volatility Measurement, forthcoming in L.P. Hansen and Y. Ait-Sahalia (eds.), Handbook of Financial Econometrics, North-Holland, Amsterdam.
[6] Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X. and Labys, Paul, 2001, The Distribution of Realized Exchange Rate Volatility, Journal of the American Statistical Association, 96, 42-55.
[7] Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X. and Labys, Paul, 2003, Modeling and Forecasting Realized Volatility, Econometrica, 71(2), 579-625.
[8] Ball, Clifford A., and Torous, Walter N., 1984, The Maximum Likelihood Estimation of Security Price Volatility: Theory, Evidence, and Application to Option Pricing, Journal of Business, 57(1), 97-112.
[9] Bai, X., Russell, Jeffrey R., and Tiao, George C., 2001, Beyond Merton’s Utopia I: Effects of Dependence and Non-Normality on Variance Estimates Using High-Frequency Data, working paper, Graduate School of Business, University of Chicago.
[10] Barndorff-Nielsen, Ole E., and Shephard, Neil, 2002a, Econometric Analysis of Realised Volatility and Its Use in Estimating Stochastic Volatility Models, Journal of the Royal Statistical Society, Series B, 64, 253-280.
[11] Barndorff-Nielsen, Ole E., and Shephard, Neil, 2002b, Estimating quadratic variation using realised variance, Journal of Applied Econometrics, 17, 457-477.
[12] Bollerslev, Tim, and Ghysels, Eric, 1994, Periodic Autoregressive Conditional Heteroscedasticity, Journal of Business and Economic Statistics, 14(2), 139-151.
[13] Bollerslev, Tim, and Wright, Jonathan H., 2001, High-Frequency Data, Frequency Domain Inference, and Volatility Forecasting, Review of Economics and Statistics, 83(5), 596-602.
[14] Christoffersen, Peter and Diebold, Francis X., 1997, Optimal prediction under asymmetric loss, Econometric Theory, 13, 808-817.
[15] Christoffersen, Peter, and Jacobs, Kris, 2002, The Importance of the Loss Function in Option Valuation, Journal of Financial Economics, forthcoming.
[16] Diebold, Francis X., 2001, Elements of Forecasting (2nd edition). Southwestern.
[17] Diebold, Francis X., and Mariano, Roberto S., 1995, Comparing Predictive Accuracy, Journal of Business and Economic Statistics, 13(3), 253-263.
[18] Diebold, Francis X., and Lopez, Jose A., 1996, Forecast Evaluation and Combination, in G.S. Maddala and C.R. Rao (eds.), Handbook of Statistics, Amsterdam: North-Holland, 241-268.
[19] Duffie, Darrell, and Pan, Jun, 1997, An Overview of Value at Risk, Journal of Derivatives, 4(3), 7-49.
[20] Engle, Robert F., 1993, A Comment on Hendry and Clements on the Limitations of Comparing Mean Square Forecast Errors, Journal of Forecasting, 12, 642-644.
[21] Feller, W., 1951, The Asymptotic Distribution of the Range of Sums of Random Variables, Annals of Mathematical Statistics, 22, 427-432.
[22] Garman, Mark B., and Klass, Michael J., 1980, On the Estimation of Security Price Volatilities from Historical Data, Journal of Business, 53(1), 67-78.
[23] Gourieroux, C., Monfort, A., and Trognon, A., 1984, Pseudo Maximum Likelihood Methods: Theory, Econometrica, 52(3), 681-700.
[24] Granger, C.W.J., 1969, Prediction with a generalized cost function, OR, 20, 199-207.
[25] Granger, C.W.J., 1999, Outline of Forecast Theory Using Generalized Cost Functions, Spanish Economic Review, 1, 161-173.
[26] Hansen, Peter Reinhard, and Lunde, Asger, 2001, A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)?, Working Paper 2001-04, Department of Economics, Brown University.
[27] Hansen, Peter R., and Lunde, Asger, 2004, Consistent Ranking of Volatility Models, Journal of Econometrics, forthcoming.
[28] Jorion, Philippe, 1995, Predicting Volatility in the Foreign Exchange Market, Journal of Finance, 50(2), 507-528.
[29] Komunjer, I., and Vuong, Q., 2004, Efficient Conditional Quantile Estimation, working paper.
[30] McNeil, Alexander J., and Frey, Rudiger, 2000, Estimation of Tail-Related Risk Measures for Heteroscedastic Financial Time Series: An Extreme Value Approach, Journal of Empirical Finance, 7, 271-300.
[31] Mincer, Jacob, and Zarnowitz, Victor, 1969, The Evaluation of Economic Forecasts, in Zarnowitz, J. (ed.) Economic Forecasts and Expectations, National Bureau of Economic Research, New York.
[32] Nelson, Daniel B., 1991, Conditional Heteroskedasticity in Asset Returns: A New Approach, Econometrica, 59(2), 347-370.
[33] Parkinson, Michael, 1980, The Extreme Value Method for Estimating the Variance of the Rate of Return, Journal of Business, 53(1), 61-65.
[34] Patton, Andrew J., and Timmermann, Allan, 2004a, Properties of Optimal Forecasts under Asymmetric Loss and Nonlinearity, Centre for Economic Policy Research Discussion Paper 4037.
[35] Patton, Andrew J., and Timmermann, Allan, 2004b, Testable Implications of Forecast Optimality, working paper.
[36] Poon, Ser-Huang and Granger, Clive W. J., 2003, Forecasting Volatility in Financial Markets, Journal of Economic Literature, 41, 478-539.
[37] Theil, H., 1958, Economic Forecasts and Policy, North-Holland, Amsterdam.
[38] West, Kenneth D., 1996, Asymptotic Inference about Predictive Ability, Econometrica, 64(5), 1067-1084.
[39] Westfall, Peter H., and Young, S. Stanley, 1993, Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment, John Wiley & Sons, USA.
[40] White, Halbert, 2000, A Reality Check for Data Snooping, Econometrica, 68, 1097-1126.