c 2010 Society for Industrial and Applied Mathematics
SIAM J. FINANCIAL MATH. Vol. 1, pp. 752–779
Optimal Allocation of a Futures Portfolio Utilizing Numerical Market Phase Detection∗ L. Putzig†, D. Becherer‡, and I. Horenko§ Abstract. This paper presents an application of the recently developed method for simultaneous dimension reduction and metastability analysis of high-dimensional time series in the context of computational finance. Further extensions are included to combine state-specific principal component analysis (PCA) and state-specific regressive trend models to handle the high-dimensional, nonstationary data. The identification of market phases allows one to control the involved phase-specific risk for futures portfolios. The numerical optimization strategy for futures portfolios based on Tikhonovtype regularization is presented. The application of proposed strategies to online detection of the market phases is exemplified first on the simulated data and then on historical futures prices for oil and wheat from 2005–2008. Numerical tests demonstrate the comparison of the presented methods with existing approaches. Key words. economic time series analysis, portfolio optimization, investment, regression, market phases, clustering AMS subject classifications. 62M10, 91B28, 91B84 DOI. 10.1137/090754029
1. Introduction. In today’s financial industry, computational finance is used for a wide field of applications. Throughout this paper, we will concentrate on the portfolio optimization problem, where, for return maximization and risk minimization problems, the evolution of security returns is estimated with a variety of different methods, in order to find an allocation that optimizes a suitable performance criterion, while limiting the risk [27, 31]. For practical reasons, computational methods have to address the following points: 1. Time series analysis. Market dynamics are most likely not reflected by a stationary process, but involve (hidden) market phases, such as economic cycles, harvest cycles (for grain, fruits, and other kinds of plants), and yearly holidays, and many more aspects influence the markets. This poses fundamental problems for estimation: To obtain sensible estimates for mean returns even in stationary linear models, hundreds of years of data would be needed. In shorter intervals, estimation risk becomes significant. ∗
Received by the editors June 15, 2009; accepted for publication (in revised form) July 3, 2010; published electronically October 21, 2010. This work was supported by the DFG reseach center Matheon “Mathematics for key technologies” in Berlin. http://www.siam.org/journals/sifin/1/75402.html † Corresponding author. Institute of Computational Science, University of Lugano, Via Giuseppe Buffi 13, CH6904 Lugano, Switzerland (
[email protected]). ‡ Institute of Mathematics, Humboldt-University Berlin, Rudower Chaussee 25, 12489 Berlin, Germany (becherer@ mathematik.hu-berlin.de). § Institute of Computational Science, University of Lugano, Via Giuseppe Buffi 13, CH-6904 Lugano, Switzerland (
[email protected]). 752
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
753
2. Sensitivity analysis. Approaches should be robust w.r.t. model parameter uncertainty. This is a challenge for identification when estimating time-varying parameters. 3. High dimensionality. Financial data are high dimensional, as thousands of securities are available from different asset classes (stocks, futures, etc.), and data are often available at high frequencies with nonequidistant time stepping (due to weekends, market closures, holidays, etc.); e.g., the Wilshire 5000 consists of 5000 stocks. Hence, computational ability to handle high numbers of assets (stocks, futures, etc.) is relevant. Aspects 2 and 3 cause a high uncertainty of estimates, whereas nonstationarity as it relates to aspect 1 (in observable dimensions, at least) means that estimates should involve timedependent functions, which could not be estimated in a nontrivial way, which is a well-known problem. In 1989 Hamilton [13] suggested a numerical method for identification of piecewise linear functions, called hidden market phases, assuming stationarity and Gaussianity of returns. While Gaussianity is still a commonly used assumption by practitioners, many modern models soften this by using, e.g., heavy tails [29]. Later this was extended to multidimensional data using linear vector autoregressive (VAR) models [23]. Still, standard phase-identification techniques based on the filtering approach (like wavelet-based spectral methods; see [1]) have in general infeasible numerical complexity in high dimensions; other methods (e.g., Kalman filter [21], VAR [23], (G)ARCH [6, 2]) require stronger assumptions about the underlying dynamics or the methods (as in [3]) are based on perfect observability of the Markov process, which cannot be taken for granted in reality. As demonstrated by the actual financial crises, nonrobust reliance on simplified assumptions and ignorance of model risk can (among other factors) lead to misperception of risk and possibly to a total loss of the investment. Therefore, there is a need for a wider class of methods that are computationally capable of dealing with realistic amounts of data, monitoring the evolution of risks, and optimizing the risk-return profile of investments. While recent publications concentrate on robustness by using worstcase scenarios (see, e.g., [10, 7, 11]) or penalty functions (see, e.g., [35]), or simply assuming vanishing mean returns, this paper aims at an estimation of nonstationary trends and volatilities directly from prices of futures contracts. Futures markets constitute broad and liquid markets for several asset classes [5]. Leverage can be quite large for futures if the amplitude of futures price variations is measured relative to the size of the margin requirement. But this also induces a high risk. The minimization of this risk is one of the central points of this paper. The recently developed framework for numerical treatment of aspect 1 of the above list of problems is presented and used for clustering and analysis of nonstationary financial data [16, 17, 14, 15]. An extension of the framework utilizing linear regression is introduced, allowing for simultaneous market phase identification, dimension reduction, and estimation of the phase-specific trends in the financial context. The ability of the extended framework to identify change points online is demonstrated, both on simulated and real market data. Finally, the numerical optimization strategy for phase-specific selection of futures portfolios is presented and exemplified on a real 8-dimensional data set of futures data for wheat and oil. This paper is organized as follows. In section 2, the FEM-based clustering framework is introduced and its extension towards the financial data context is presented. In section 3, we briefly consider the definition of returns and adapt this to futures markets, as the naive definition does not fit in this context. The numerical futures portfolio optimization approach is presented. Section 4 demonstrates the application of the proposed numerical framework
754
L. PUTZIG, D. BECHERER, AND I. HORENKO
to simulated and real data and offers a comparison to standard approaches (like the classical methods proposed by Markowitz [27], the flexible multivariate GARCH(1,1) model [24], and a simple n1 -portfolio). In the end, a practical side constraint is considered, the notion of neutrality. 2. Finding market phases. In the following, we will briefly describe the clustering framework first introduced in [16, 17, 14, 15]. Special emphasis will be placed on the numerical aspects and the interpretation of the framework in the context of high-dimensional financial applications. 2.1. The clustering problem. Let St : [0, T ] → [0, ∞)d be the observed d-dimensional series of asset prices. We look for K models characterized by K distinct sets of a priori unknown model parameters θ1 , . . . , θK ∈ Ω, where Ω is the parameter space. Let g(St , θi ) : Rd × Ω → [0, ∞) be some distance function of the observation St and the set of parameters θi . For the work done throughout this paper, the θi correspond to the changing market phases. Then the vector Γt = (γ1 (t), . . . , γK (t)), called the affiliation vector, gives the probability of being in the ith market phase at time t. For a given observation St we want to solve the clustering problem K
(2.1)
γi (t)g(St , θi ) → min Γt ,Θ
i=1
for Θ = (θ1 , . . . , θK ). Moreover, as the γi (t) shall be probabilities, they are subject to the constraint (2.2)
K
γi (t) = 1 ∀t ∈ [0, T ],
i=1
(2.3)
γi (t) ≥ 0 ∀t ∈ [0, T ], i = 1, . . . , K.
As we do not want to solve this problem for a specific time t only, but for the whole time interval [0, T ], we introduce the averaged clustering functional L: (2.4)
L(Θ, Γ) = 0
K T i=1
γi (t)g(St , θi ) → min, Γ,Θ
subject to the constraints (2.2)–(2.3).1 As was demonstrated in [16, 17, 14, 15] there are practical difficulties in solving problem (2.4) numerically. 1. The problem is infinite dimensional since γi (t) is an element of some (not yet specified) function space. 1
If the Θi were known or fixed to some value, the minimization problem (2.4) would be solved analytically w.r.t. Γ by 1, i = arg min g(St , Θi ), γi (t) = 0 else.
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
755
2. The problem is ill-posed in the definition of Hadamard [12], as neither the uniqueness of the solution nor its continuity w.r.t. the time series data is fulfilled in general. Because of the high number of parameters, additional regularity assumptions will be needed to identify the parameters, and to avoid problems like overfitting, underspecifying, and unstable estimates. 3. As long as g is nonconvex, only locally optimal solutions can be found. Problems 1 and 2 are addressed within sections 2.2 and 2.3; how to evade problem 3 is shown in sections 2.4 and 2.5. 2.2. Persistent market phases. As shown in [16, 17, 14, 15], one of the possibilities for overcoming the above-mentioned problem is to impose some additional assumptions on Γ. As we look for hidden market phases, we do assume the process switching between phases is slow: When the market switches to a new state, it will stay there for some time. This suggests a new constraint on Γ: T (∂t γi (t))2 dt ≤ γiε , i = 1, . . . , K, γiε ∈ (0, ∞). (2.5) 0
While (2.5) should reduce the transitions between different states, it is hard to handle this inside the optimization. Instead, we thus expand our problem to the regularized form as suggested by Tikhonov for linear least-squares problems [32]: (2.6)
Lε (Θ, Γ) = L(Θ, Γ) + ε2
K i=1
T 0
(∂t γi (t))2 dt.
As was shown in [16], the parameter ε2 can be used to control the persistence of the cluster states, by introducing a cost for short-term switching into another market phase. 2.3. Finite elements and discretization. Discrete time is, from a data point of view, not really an additional assumption, as market data, even if given intraday, is ticked data; thus it is discrete by nature. Time continuity of the data is a modeling idealization. This limits the function space from which Γ is chosen, addressing problems 1 and 2. Thus let T be a finite set of finite intervals on [0, T ]: T = {0 = t1 < t2 < · · · < tN = T }.
(2.7)
Then we can choose a finite element basis {v1 (t), . . . , vN (t)} with (2.8)
v1 (t) = 0 for t ∈ (t1 , t2 ),
v1 (t) = 0 for t ∈ / (t1 , t2 ),
(2.9)
vk (t) = 0 for t ∈ (tk−1 , tk+1 ),
vk (t) = 0 for t ∈ / (tk−1 , tk+1 ),
(2.10)
vN (t) = 0 for t ∈ (tN −1 , tN ),
vN (t) = 0 for t ∈ / (tN −1 , tN ).
This allows us to write γi as (2.11)
γi =
N k=1
γˆik vk + δiN ,
k = 2, . . . , N − 1,
756
L. PUTZIG, D. BECHERER, AND I. HORENKO
where δN = max{δiN , i = 1, . . . , K} is the discretization error. Now (2.6) could be written as N 2 T K T K N ε k 2 k (2.12) L (Θ, Γ) = γˆi vk (t)g(St , θi ) dt + ε γˆi ∂t vk (t) dt + O(δN ). 0
i=1 k=1
i=1
Using the local support of vk , we can define t2 v1 (t)g(St , θi ) dt, . . . , (2.13) α(θi ) = t1
and
(2.14)
⎛
t2
2 t1 (∂t v1 ) (t) dt
⎜ t2 H=⎜ ⎝ t1 ∂t v1 (t)∂t v2 (t) dt 0
t2
t1
k=1
tN tN−1
vN (t)g(St , θi ) dt
∂t v1 (t)∂t v2 (t) dt
t3
t1
0
(∂t v2 )2 (t) dt .. .
t3 t2
0
...
⎞
⎟ ∂t v2 (t)∂t v3 (t) dt . . . ⎟ ⎠ .. .. . .
to create the discrete clustering problem (see, for example, [16]) (2.15)
ε (Θ, Γ) = L
K α(θi )T γˆi + ε2 γˆiT H γˆi → min . Γ,Θ
i=1
Please note that up to now all assumptions made concern the smoothness of the clustering affiliation γi (t) (see condition (2.5)) only and no probabilistic or statistical assumptions on the data xt are made. This abdication of including a priori assumptions on probabilistic properties of the hidden and observable processes is a major feature of the presented method. 2.4. The distance function. When working with financial data, one might end up with thousands of assets; e.g., the Wilshire 5000 index consists of 5000 stocks. W.r.t. this high dimensionality, one has to think about methods able to handle these data. The popular idea is to use the principal component analysis (PCA; see [33, 20]), where only a linear lowdimensional manifold of the data space is used to represent the original. The drawback here is the linearity of the manifold, as a globally optimal linear subspace might not exist. Thus let Ti ∈ Rd×n for i = 1, . . . , K, let n < d with TiT Ti = Id be the projection matrices, and let xt ∈ Rd be some time series deduced from St . Then we can define the distance function by (2.16)
g(xt , θi ) = (xt − μi ) − Ti TiT (xt − μi ) 22 ,
θi = (μi , Ti ).
This distance function measures the reduction error resulting from the projection of the d-dimensional data on the n-dimensional linear manifold i. The xt could be the returns and Ti could be interpreted as the direction of maximal volatility of x, while μi describes the mean behavior [18]. We can now try to get better results by using the time-dependent function μi (t) instead of a constant one, e.g., approximating μi (t) with some regression model as in [17]. Therefore let ϕi (t) be some basis of the functional space (e.g., monomials); then the distance function changes to 2 ω ω j j T μi ϕj (t) −Ti Ti xt − μi ϕj (t) , θi = ((μ1i , . . . , μki ), Ti ), (2.17) g(xt , θi ) = xt − j=1
j=1
2
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
757
where ω is the order of the trend. In fact, we can see ωj=1 μji ϕj (t) as the smoothed trend, so this analysis can be interpreted as a detrending. Additionally we can estimate the covariance matrix of a clustered set (xt ) of the time series for some i by (2.18)
¯ ΣTi
=
1 t≤T¯
γi (t)
γi (t) xt −
t≤T¯
ω
μji ϕj (t)
xt −
j=1
ω
T μji ϕj (t)
.
j=1
From a practical viewpoint, one should include an additional weighting further connecting the assets to their liquidity. However, the influence of liquidity on risk is not part of our analysis. Please note that the combination of PCA, trend estimation, and clustering is not just a direct chaining of both algorithms. The estimation of trends and covariance matrices is embedded in the minimization problem (2.15). 2.5. Calculating the parameters. Given the analyzed time series xt and a cluster affiliation Γ, we can now compute the cluster parameters μji and Ti . Therefore we have to solve for each i = 1, . . . , K the problem α(θi )T γˆi + ε2 γˆiT H γˆi → min
(2.19)
θi
⇔
(2.20)
T
α(θi ) γˆi
→ min . θi
Using the definition of α and g we get 2 tk+1 N ω ω j j k T γi vk (t) xt − μi ϕj (t) − Ti Ti xt − μi ϕj (t) dt → min, (2.21) μi ,Ti tk−1 j=1
k=1
j=1
2
with t0 = t1 and tN +1 = tN . First-order conditions of optimality imply that partial derivatives w.r.t. μji and Ti vanish; hence for μ we obtain that 0=
N
γik
k=1
(2.22)
−
ω j=1
tk+1 tk−1
μji
N k=1
vk (t)(Ti TiT ϕl (t) − ϕl (t))T (xt − Ti TiT xt ) dt γik
tk+1 tk−1
vk (t)(Ti TiT ϕl (t) − ϕl (t))T (ϕj (t) − Ti TiT ϕj (t)) dt
holds, and an analogous equation is obtained for Ti . The latter yields that Ti consists of the eigenvectors corresponding to the n largest eigenvalues of the empirical covariance matrix of the values associated with state i. Now the problem (2.15) could be solved by subspace iteration; thus, starting with some arbitrary Γ0 , we solve iteratively (2.23)
ε (Θ, Γk ) Θk = arg min L Θ
and (2.24)
ε (Θk , Γ) Γk+1 = arg min L Γ
758
L. PUTZIG, D. BECHERER, AND I. HORENKO
for k = 0, 1, . . . . Although this algorithm is not guaranteed to converge to the globally optimum, at least a local minimum will be found. By performing this algorithm several times with different starting values Γ0 , we can approximate the global optimal parameters. This stepwise optimization allows us to exploit the structure of the problem. Separating these two steps allows us to split the overall optimization into two numerically conceivable steps performed subsequently until the overall optimization threshold is reached. These two steps are (i) a problem that is (in the PCA case) analytically solvable (2.23) and results in the efficient numerical solution of the eigenvalue problem; and (ii) a convex and sparse quadratic programming problem ((2.24) with (2.2)–(2.3)) [16]. Because of that, the proposed numerical scheme is numerically cheaper and more robust for high-dimensional applications than the standard “out of the box” optimization methods applied to the same problem. 3. Adapting portfolio optimization theory to futures. The usual definition of returns S −St cannot be naively applied to futures data, as the price for a contract is not St . Rt = t+Δt St Instead, no price is charged for entering a futures contract, but an initial amount is paid into the margin account that will hold the accrued gains and losses. So, given an initial margin of I ∈ Rd and a portfolio π in proportion of wealth, we define returns for futures over the period t to t + Δt as Rt = sgn(πt )
(3.1)
St+Δt − St I
(componentwise).
In contrast to portfolio optimization approaches for the stock market, it is an important feature of futures that long and short positions can be held equally well. But since the initial margin is dependent only on the absolute contract size, our portfolio constraints will change, and the absolute weights of the components should sum up to one: d
(3.2)
|πi (t)| = 1 ∀t ∈ [0, T ].
i=1
Given suitable estimates Σ, μ for covariances and means of returns, the portfolio problem is posed as π T Σπ → min,
(3.3)
π
with π 1 = 1, π T μ ≥ C,
for a chosen target mean return C. Since this is not yet (as in classical mean variance optimization) a quadratic minimization problem with linear (or quadratic) constraints, we use a splitting into long positions and short positions, similar to the idea explained in [9] in the context of image processing, to get an extended parameter set Σ −Σ long short T = ) , Σ , μ ˆ = (μ, −μ)T , (3.4) π ˆ = (π , π −Σ Σ and we apply a penalty term. We define the extended problem by (3.5)
π ˆ
T
+λ Σ
0 Id Id 0
π ˆ → min , π ˆ ∈R2d
with π ˆ μ ˆ ≥ C, T
2d i=1
π ˆi = 1, π ˆi ≥ 0 ∀i
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
759
with the regularization parameter λ > 0. Please note that for sufficiently large λ the part 0 Id T (3.6) λˆ π π ˆ = 2λ(π long )T π short Id 0 and the condition π ˆi ≥ 0 ∀i guarantee that for each asset the position is either long or short, but not both at the same time. Simultaneously, the remainder of the extended problem is just (3.7)
π = (π long − π short )T Σπ long − (π long − π short )T Σπ short = πΣπ π ˆ T Σˆ
and (3.8)
ˆ = μT π long − μT π short = μT π. μ ˆT π
4. Numerical examples. Throughout the first part of this section, we will make use of a simulated time series. These time series have a length of 1000 with a change point at time tC = 500. All of the data points are distributed independently according to N (0, Σ), where ⎧ I2 , t ≤ 500, ⎨ 0.85 0.4 α(t), α(t) = (4.1) Σ(t) = αT (t) cos(ρ) − sin(ρ) 0.4 0.2 ⎩ , t > 500. sin(ρ) cos(ρ) As one can see, after the change point at tC = 500, the covariance matrix is just rotated by some rotation angle ρ. 4.1. Covariance for small samples. from the literature, for small sample sizes, kAs known T is not robust enough; see, e.g., [30, 4]. This x x the maximum likelihood estimator k1 tt=t t t 1 will lead to numerical biassing when it comes to portfolio optimization [28, 25, 26]. We will use the Tyler M-estimator (see, e.g., [34, 8]) instead, which is defined by (4.2)
tk d xt xTt k ˆ . Σ = ˆ k )−1 xt k t=t xTt (Σ 1
ˆ k to (4.2) is unique only up to a multiplicative constant, this is not Although the solution Σ a problem for the clustering method (2.15), as neither the result of the minimization nor the eigenvectors (for PCA) is dependent on this constant. For Figure 1, the first 100 points of 100 time series were used (using the distribution stated above) and for every time step t ∈ {2, . . . , 100} the maximum likelihood estimator and the Tyler M-estimator were calculated. ˆ − Σ) was used as a distance measure, and Then the largest singular value of the difference (Σ the average and standard deviation were calculated and plotted. As one can see from this example, the standard deviation of the Tyler M-estimator is much smaller than the one for the maximum likelihood estimator (although both converge to the same limit). 4.2. Number of clusters. In realistic applications, the optimal number of metastable clusters is usually a priori unknown. Nevertheless, one can find out the number of statistically distinguishable clusters (see [16]). Therefore we start the clustering process with a rather high number of clusters, estimate the cluster parameters, and get the error bounds for these
L. PUTZIG, D. BECHERER, AND I. HORENKO
relative error (Max. Likelihood)
760
2 1.5 1 0.5 0 0
20
40 60 Length of the time series
80
100
20
40 60 Length of the time series
80
100
relative error (Tyler−M)
1 0.5 0 −0.5 0
Figure 1. Robustness of the maximum likelihood estimator and the Tyler M-estimator for short normally distributed time series. Parameters as in (4.1).
parameters by bootstrapping the elements of the cluster. If the parameter confidence sets of different clusters overlap, the statistical difference between those clusters is not high enough; thus the number of clusters should be reduced and the process should be started over again [22]. Using this technique for different ε leads to a picture like Figure 2. The decrease in the number of clusters is due to the fact that for the increasing ε the algorithm favors metastable clusters and merges the rapidly mixing ones together. Thus, while increasing ε, similar clusters will get merged into one. The plateau results from the high difference between the last two remaining clusters. For a real data set we will observe similar behavior when we look for the number of clusters. 4.3. Detecting a change point. Another important problem is to find the actual position of the change points. The number of detected change points depends on ε as one can see in Figure 3. However, a high number of change points usually indicates some rapidly mixing clusters that could be united to a metastable cluster (e.g., using half the data in Figure 3). Decreasing the number of clusters and increasing ε might help to solve this problem. On the other hand, a high value for ε will lead to a longer delay between occurrence and online detection of the change point; this can be seen in Figure 4. The same problem appears when the rotation angle becomes too small; see Figure 5.
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
761
2
Number of clusters depending on ε
Statistically different clusters
3
2
1
−20
10
−10
10
0
ε2
10
10
10
Figure 2. Number of distinguishable clusters depending on ε2 . Parameters as in (4.1); ρ = 90◦ .
4.4. Looking at real data. For the calculations in this paper, we used the futures prices from 2005–2008 for wheat and oil futures (four times to maturity each, quarterly for wheat and monthly for oil, daily closings) from http://www.kcbt.com/historical data.asp (Kansas City Board of Trade) and http://tonto.eia.doe.gov/dnav/pet/pet pri fut s1 d.htm (U.S. Department of Energy). For both commodities the four futures with the lowest time to maturity were taken into account. Applying the introduced clustering algorithm to these data gives the result shown in Figure 6. Implied by this picture, an optimal number of metastable clusters to use should be two. Then the cluster allocation for the whole time series (see Figure 7) produces a kind of seasonal cycle that could be due to a harvest or heating period. The phases differ in the following points: • The phase shown in Figure 7, and thus phase 1, is characterized by lesser overall trend in the price than phase 2. • The general direction of the maximum volatility (described by the projection operator Ti from (2.15)) in both phases is similar; the first eigenvectors of the covariance matrices are nearly equal, and the second eigenvectors span an angle of approximately 5 degrees.
762
L. PUTZIG, D. BECHERER, AND I. HORENKO Number of change points depending on ε2 150 First 500 datapoints 100 50 0 −20 10
−15
10
−10
−5
10
10 2 ε
0
10
5
10
10
10
Complete dataset 1
0 −20 10
−15
10
−10
−5
10
10
0
10
5
10
10
10
Figure 3. Number of detected change points depending on ε2 for half of the (artificial) data and the complete data set. Parameters as in (4.1); K = 2, ρ = 90◦ .
The effective investment for each commodity (as the sum of the portfolio weights over the different maturities) for the phase-dependent portfolio can be seen in Figure 8. The variance of the positions can be considered as a result of the parameter estimation error. Additionally, Figure 9 shows box-whisker plots of the relative notionals, defined by (4.3)
πi (t) , i∈Iu |πi (t)|
where Iu is the set of indices belonging to one commodity. The online version of the cluster affiliations is presented; thus the affiliations at time t are calculated “on the fly” using only the information ∀s, s < t during the portfolio optimization. The high variance in the resulting portfolio weights (resp., the relative notionals) is a direct result of nonrobust parameter estimates. For practical purposes, further steps in creating robust estimates or optimizing robustly w.r.t. uncertain estimates should be considered in further research. 4.5. Comparing portfolios. When looking at the interest rate for U.S. treasury bonds, an investment in January 2005 would have given a yearly interest rate of approximately 3.4%. Thus after four years we have a gain of 14.31% (that is a daily rate of 1.415 bps). We use this rate as a target rate C (as in (3.5)). As stated earlier, we compare our algorithm with the Markowitz approach, the FlexM algorithm from [24], and a simple n1 -portfolio, where half the assets are bought and the other half sold. For the cluster-based approach, the basis ϕ with polynomials up to order two was taken. The clustering was done w.r.t. μ and T . The FlexM algorithm provided by the authors of [24] was used utilizing the parameter demean; thus the estimation and subtraction of the mean from the data prior to the portfolio optimization is done by the FlexM function. The n1 -portfolio was built by buying for 18 of the wealth the two
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
763
Relative error in detection of change point
Detection time of change point depending on ε2 0.5
0.4
0.3
0.2
0.1
0 −20
10
−10
0
10
ε2
10
10
10
Figure 4. Relative error in detection time of the single change point depending on ε2 . Parameters as in (4.1); K = 2, ρ = 90◦ .
long maturity futures of each commodity and selling the remaining two for each commodity equally. Additionally, the Markowitz approach was used by estimating mean and covariance from the whole time series up to the actual point. All algorithms were supplied only with information available at time t, so no future knowledge is included. In each case, the portfolio was recalculated and/or rebalanced every 10 business days starting with day 10 of the time series. As one can see in Figure 10, all algorithms fall short of the target 14.31% mean return. While the n1 -portfolio is doing better (in terms of return rate) than the other algorithms, this comes with a higher risk, as one can see in Figure 11. We define the “intrinsic risk estimate” for portfolio returns from t to t + Δt by (4.4)
Rt =
πtT Cov(Rt )πt ,
where the empirical returns Rt are given by (3.1), and the Cov(Rt ) is an estimate for the covariance matrix of the data (based upon data only up to t), which is given by • a linear combination of the empirical cluster covariance matrices given by (2.18) for the cluster-based algorithm, and thus
764
L. PUTZIG, D. BECHERER, AND I. HORENKO
Relative error in detection of change point
Detection time of change point depending on ρ 0.5
0.4
0.3
0.2
0.1
0 0
20
40 60 ρ in Degree
80
100
Figure 5. Relative error in detection time of the single change point depending on ρ. Parameters as in (4.1); K = 2, ε2 = 1.
(4.5)
Cov(Rt ) =
K
γi (t)Σti ;
i=1
• the GARCH(1,1) estimate for the FlexM algorithm (see [24]); • the standard sample covariance matrix for the Markowitz approach and the n1 -portfolio. Although those intrinsic risks shown in Figure 11 are estimates of the instantaneous variance of the portfolio returns, they are not directly comparable, as the measures differ from each other and the underlying assumptions are different, so in Figure 12 risk is meant in the sense of the standard deviation of the daily returns of the portfolios from the empirical mean; the cluster-based approach still produces the strategy with the lowest risk. And even if we use the deviation from the target mean return C k−1 1 t (π T Rt − C)2 (4.6) k − 1 t=t t 1
as a risk measure (see the results in Figure 13), the cluster-based approach reaches the best result. The similarity of this risk measure and the standard deviation is easy to see: If the vari-
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
765
2
Number of clusters depending on ε
Number of clusters
5
4
3
2
1 −20
10
−10
10
0
ε2
10
10
10
Figure 6. Number of distinguishable clusters depending on ε2 . Data set: Prices of oil and wheat futures from 2005–2008. T μ ance minimal portfolio does not satisfy the condition π ˆmin ˆ ≥ C, i.e., the target mean return ˆ = C. is not reached, due to the convexity of the problem the solution of (3.5) will satisfy π ˆT μ This guarantees that for sufficiently large C the expected mean is C, and thus (4.6) would give the standard deviation. The advantage of (4.6) over the empirical standard deviation is the fact that the deviation from the target mean return does not depend on the empirical expectation of the portfolio returns. Indeed, as stated in Table 1, the empirical standard deviation of the portfolio returns and the empirical deviation from the target mean return are quite close to each other. The cluster-based approach reaches 69.85% of the performance of the n1 -portfolio, while it takes only 40.33% of the risk (in the sense of the empirical standard deviation). Here, rate of return denotes the total return minus one and average intrinsic is defined by
(4.7)
tN−1 1 Rt . N − 1 t=t 1
For different target returns the results are similar, although the cluster-based portfolio is less sensible to the target return rate (see Figures 14 and 15).
766
L. PUTZIG, D. BECHERER, AND I. HORENKO
Cluster allocation, Prices of oil and wheat futures 2005−2008 1 0.9 0.8 0.7
γ1(t)
0.6 0.5 0.4 0.3 0.2 0.1 0 2006
2007 Year
2008
2009
Figure 7. Cluster allocation for time series of futures; K = 2, ε2 = 1. Data set: Prices of oil and wheat futures from 2005–2008.
4.6. Adaption for notional neutral investment. A common relevant strategy of portfolio managers is so-called market neutral investment. The aim is to bet only from views on relative changes between asset prices but remain neutral w.r.t. overall market moves. To avoid assumptions about an appropriated market indicator or a market portfolio that characterizes the whole market, a more pragmatic approach is chosen: We define notional neutral investments as investments where long and short positions are balanced such that equal notionals are invested long and short for each security class. This is done by an additional constraint to ensure that the accrued notional of contracts within each class u (here wheat and oil) is equal to zero:
(4.8)
πi = 0
∀u,
i∈Iu
where Iu is an index set combining all assets for each selected class of futures within which accrued notionals are requested to be zero. For the extended problem, this transforms to (4.9)
i∈Iu
(ˆ πi − π ˆi+d ) = 0
∀u.
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
767
Effective Investments
Effective Investment
0.07 0.06
Wheat
0.05
Oil
0.04 0.03 0.02 0.01 0 −0.01 −0.02 −0.03 2005
2006
2007 Year
2008
2009
Figure 8. Effective investments per commodity as proportion of wealth; K = 2, ε2 = 1. Data set: Prices of oil and wheat futures from 2005–2008.
Thus the notional neutral version of the problem is (4.10) 2d 0 Id T T π ˆ → min , with π ˆ μ ˆ ≥ C, π ˆi = 1, π ˆi ≥ 0 ∀i, (ˆ πi − π ˆi+d ) = 0. Σ+λ π ˆ Id 0 π ˆ ∈R2d i=1
i∈Iu
The additional constraint was used for all algorithms (except the n1 -portfolio). As before, the realized return for each algorithm falls short of the mean target return (see Figure 16), but the presented cluster-based method produces the result with the least risk (see Figures 17, 18, and 19) and, for this data, even the best return rate, next to the n1 -portfolio, where no risk minimization is done. 5. Conclusion. We adapted and expanded a method for simultaneous clustering, dimension reduction, and metastability analysis of multidimensional time series to financial data. A main advantage of the method (compared to standard HMM/GMM methods of phase identi-
768
L. PUTZIG, D. BECHERER, AND I. HORENKO
Wheat, Phase 2 rel. Notionals
rel. Notionals
Wheat, Phase 1 0.5 0 −0.5
0 −0.5
1
2 3 Maturities Oil, Phase 1
4
1
2 3 Maturities Oil, Phase 2
4
1
2 3 Maturities
4
0.5 rel. Notionals
0.5 rel. Notionals
0.5
0
−0.5
0
−0.5 1
2 3 Maturities
4
Figure 9. Spread of the relative notionals per commodity and phase; K = 2, ε2 = 1. Data set: Prices of oil and wheat futures from 2005–2008.
fication [13, 23]) is that no a priori probabilistic assumptions about the hidden and observed market processes are necessary in the context of the presented method. Furthermore, section 4 indicates that the method is suitable for finding a low-dimensional representation of the data and estimating mean and variance adaptively while detecting market phases, delivering information that can be further used for portfolio optimization. The presented framework allows one to make use of fast and numerically robust FEM solvers (developed for numerical solution of partial differential equations) in a new context of computational time series analysis. The general feasibility of identification of market phases was demonstrated, and it was shown that an expansion of classical portfolio theory utilizing market phase detection can be used to further decrease the intrinsic risk. Additionally, the behavior of the change point detection when applied online to a data stream was demonstrated to be robust enough to qualify for practical purposes. As a standard stationary risk measure, the standard deviation of portfolio returns was also used to compare different portfolio optimization strategies on real
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
769
Total return evolution of portfolios 1.5 1.4
cluster−based FlexM Markowitz 1/n − Portfolio
Performance
1.3 1.2 1.1 1 0.9 0.8
2006
2007 Year
2008
2009
Figure 10. Value evolution of different portfolio-optimization algorithms.
financial data. It was demonstrated that application of the presented numerical scheme based on the identification of market phases can decrease the risk of resulting portfolios in terms of this risk measure. Strictly speaking, the empirical standard deviation of portfolio returns is not an equitable risk measure, in that it assumes the underlying data to be stationary. If we assume the underlying market process to be nonstationary (as is done in the context of the presented numerical approach), the evaluation and comparison of risks should be done in a measure that reflects this. Numerical studies of robust portfolio optimization methods and empirical quantification of risk by methods that do not rely on stationarity assumptions are a matter of future research. Having demonstrated the applicability of the presented computational scheme to a realistic financial data example, a more extensive online data study is clearly needed for validation. Also, transaction costs should be included and different risk measures should be taken into account. In contrast to climatological [14, 17, 15, 18] or biophysical applications [19], repeating phases seem to be rare in financial time series, and additional time weighting (e.g., exponential decay) should be further investigated.
770
L. PUTZIG, D. BECHERER, AND I. HORENKO
Risk (intrinsic) evolution of portfolios
0
10
−2
Risk (intrinsic)
10
−4
10
−6
10
−8
10
Cluster−based FlexM Markowitz 1/n−Portfolio
−10
10
2006
2007 Year
2008
2009
Figure 11. Intrinsic risk (see (4.4)) evolution of different portfolio-optimization algorithms.
Risk (standard deviation) evolution of portfolios
standard deviation of portfolio returns
0.025
0.02
cluster−based
0.015
FlexM Markowitz 1/n − Portfolio
0.01
0.005
0
2006
2007 Year
2008
2009
Figure 12. Risk (empirical standard deviation) evolution of different portfolio-optimization algorithms.
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
771
Average deviation from target return rate 0.025 cluster−based FlexM Markowitz 1/n−Portfolio
Deviation
0.02
0.015
0.01
0.005
0
2006
2007 Year
2008
2009
Figure 13. Risk (deviation from target mean return) evolution of different portfolio-optimization algorithms.
Table 1 Rate of return and different risk measures. Algorithm Cluster-based FlexM Markowitz 1 -portfolio n
Rate of return 4.01% −0.55% 0.30% 5.74%
Averaged risk 1.4164 7.5113 1.5180 3.7022
intrinsic 10−3 10−3 10−3 10−3
Standard of returns 2.4613 7.8968 3.9324 6.1033
deviation 10−3 10−3 10−3 10−3
Deviation from target mean return 2.4619 10−3 7.8935 10−3 3.9325 10−3 6.1003 10−3
772
L. PUTZIG, D. BECHERER, AND I. HORENKO
Total return of cluster−based and FlexM portfolio 1.1
Realized total return
1.08 1.06 cluster−based 1.04
FlexM
1.02 1 0.98 0.96 0
0.5
1 1.5 2 2.5 3 Target Return Rate in % per year
3.5
Figure 14. Total return depending on the yearly target return rate C in (3.5).
4
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
−3
standard deviation of Portfolio returns
9
x 10
8
773
Risk (standard deviation) of different portfolios
cluster−based FlexM
7 6 5 4 3 2 1 0 0
0.5
1 1.5 2 2.5 3 Target Return Rate in % per year
3.5
4
Figure 15. Risk (empirical standard deviation) depending on the yearly target return rate C in (3.5).
774
L. PUTZIG, D. BECHERER, AND I. HORENKO
Value evolution of notional neutral portfolios 1.1 1.05 1
Total return
0.95 0.9 0.85 0.8 0.75
cluster−based FlexM
0.7
Markowitz 0.65
1/n−Portfolio 2006
2007 Year
2008
2009
Figure 16. Value evolution of different notional neutral portfolio-optimization algorithms.
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
0
775
Risk (intrinsic) evolution of portfolios
10
−2
10
Risk (intrinsic)
−4
10
−6
10
−8
10
Cluster−based FlexM Markowitz 1/n−Portfolio
−10
10
−12
10
2006
2007 Year
2008
2009
Figure 17. Intrinsic risk (4.4) evolution of different notional neutral portfolio-optimization algorithms.
776
L. PUTZIG, D. BECHERER, AND I. HORENKO
Risk (standard deviation) evolution of notional neutral portfolios 0.01
standard deviation of portfolio returns
0.009 0.008
cluster−based
0.007
FlexM Markowitz
0.006
1/n−Portfolio
0.005 0.004 0.003 0.002 0.001 0
2006
2007 Year
2008
2009
Figure 18. Risk (empirical standard deviation) evolution of different notional neutral portfolio-optimization algorithms.
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
777
average deviation from target mean return of notional neutral portfolios 0.01
deviation from target mean returns
0.009 0.008
cluster−based
0.007
FlexM Markowitz
0.006
1/n−Portfolio
0.005 0.004 0.003 0.002 0.001 0
2006
2007 Year
2008
2009
Figure 19. Risk (deviation from target mean return) evolution of different notional neutral portfoliooptimization algorithms.
778
L. PUTZIG, D. BECHERER, AND I. HORENKO REFERENCES
[1] A. Akansu and R. Haddad, Multiresolution Signal Decomposition: Transforms, Subbands, and Wavelets, Academic Press, San Diego, CA, 1992. [2] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econometrics, 31 (1986), pp. 307–327. ¨ [3] U. C ¸ elikyurt and S. Ozekici, Multiperiod portfolio optimization models in stochastic markets using the mean-variance approach, European J. Oper. Res., 179 (2007), pp. 186–202. [4] V. DeMiguel and F. Nogales, Portfolio selection with robust estimation, Oper. Res., 57 (2009), pp. 560–577. [5] D. Duffie, Futures Markets, Prentice–Hall, Englewood Cliffs, NJ, 1989. [6] R. Engle, Autoregressive conditional heteroscedasticity with estimates of variance of United Kingdom inflation, Econometrica, 50 (1982), pp. 987–1008. [7] F. Fabozzi, P. Kolm, D. Pachamanova, and S. Focardi, Robust Portfolio Optimization and Management, John Wiley & Sons, New York, 2007. [8] G. Frahm and U. Jaekel, Tyler’s M-estimator, random matrix theory, and generalized elliptical distributions with applications to finance, Phys. A, submitted. [9] H. Fu, M. K. Ng, M. Nikolova, and J. L. Barlow, Efficient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration, SIAM J. Sci. Comput., 27 (2006), pp. 1881–1902. [10] L. Garlappi, R. Uppal, and T. Wang, Portfolio selection with parameter and model uncertainty: A multi-prior approach, Rev. Financ. Stud., 20 (2007), pp. 41–81. [11] D. Goldfarb and G. Iyengar, Robust portfolio selection problems, Math. Oper. Res., 28 (2003), pp. 1–38. [12] J. Hadamard, Sur les probl`emes aux d´eriv´ees partielles et leur signification physique, Bull. Univ. Princeton 13, Princeton University, Princeton, NJ, 1902, pp. 49–52. [13] J. Hamilton, A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica, 57 (1989), pp. 357–384. [14] I. Horenko, On simultaneous data-based dimension reduction and hidden phase identification, J. Atmospheric Sci., 65 (2008), pp. 1941–1954. [15] I. Horenko, On robust estimation of low-frequency variability trends in discrete Markovian sequences of atmospheric circulation patterns, J. Atmospheric Sci., 66 (2009), pp. 2059–2072. [16] I. Horenko, Finite element approach to clustering of multidimensional time series, SIAM J. Sci. Comput., 32 (2010), pp. 62–83. [17] I. Horenko, On clustering of non-stationary meteorological time series, Dyn. Atmos. Oceans, 49 (2010), pp. 164–187. ¨ tte, Automated generation of reduced stochastic [18] I. Horenko, R. Klein, S. Dolaptchiev, and Ch. Schu weather models I: Simultaneous dimension and model reduction for time series analysis, Multiscale Model. Simul., 6 (2008), pp. 1125–1145. ¨ tte, Likelihood-based estimation of multidimensional Langevin models and [19] I. Horenko and Ch. Schu its application to biomolecular dynamics, Multiscale Model. Simul., 7 (2008), pp. 731–773. [20] I. Jolliffe, Principal Component Analysis, Springer, New York, 2002. [21] R. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME J. Basic Eng., 82 (1960), pp. 35–45. [22] B. Kedem and K. Fokianos, Regression Models for Time Series Analysis, Wiley Ser. Probab. Stat., John Wiley & Sons, New York, 2002. [23] H. Krolzig, Predicting Markov-switching vector autoregressive processes, J. Forecasting, to appear. [24] O. Ledoit, P. Santa-Clara, and M. Wolf, Flexible multivariate GARCH modeling with an application to international stock markets, Rev. Econ. Statist., 3 (2003), pp. 735–747. [25] O. Ledoit and M. Wolf, Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, J. Empirical Finance, 10 (2003), pp. 603–621. [26] O. Ledoit and M. Wolf, Honey, I shrunk the sample covariance matrix, J. Portfolio Management, 30 (2004), pp. 110–119. [27] H. Markowitz, Portfolio selection, J. Finance, 7 (1952), pp. 77–91. [28] R. Michaud, The Markowitz optimization enigma: Is optimized optimal?, Financial Analysts Journal, 45 (1989), pp. 31–42.
FUTURES PORTFOLIO AND MARKET PHASE DETECTION
779
[29] T. Rolski, H. Schmidli, V. Schmidt, and J. Teugels, Stochastic Processes for Insurance and Finance, Wiley Ser. Probab. Stat., John Wiley & Sons, Chichester, 1999. ¨ fer and K. Strimmer, A shrinkage approach to large-scale covariance matrix estimation and [30] J. Scha implications for functional genomics, Stat. Appl. Genet. Mol. Biol., 4 (2005), article 32. [31] W. Sharpe, Capital asset prices: A theory of market equilibrium under conditions of risk, J. Finance, 9 (1964), pp. 425–442. [32] A. Tikhonov, On the stability of inverse problems, Dokl. Akad. Nauk SSSR, 39 (1943), pp. 195–198. [33] R. S. Tsay, Analysis of Financial Time Series, John Wiley & Sons, New York, 2005. [34] D. E. Tyler, A distribution-free M-estimator of multivariate scatter, Ann. Statist., 15 (1987), pp. 223– 251. [35] R. Welsch and X. Zhou, Application of robust statistics to asset allocation models, REVSTAT, 5 (2007), pp. 97–114.