Optimal Allocation of a Futures Portfolio Utilizing ... - Semantic Scholar

Comment

Report 2 Downloads 43 Views

c 2010 Society for Industrial and Applied Mathematics

SIAM J. FINANCIAL MATH. Vol. 1, pp. 752–779

Optimal Allocation of a Futures Portfolio Utilizing Numerical Market Phase Detection∗ L. Putzig†, D. Becherer‡, and I. Horenko§ Abstract. This paper presents an application of the recently developed method for simultaneous dimension reduction and metastability analysis of high-dimensional time series in the context of computational ﬁnance. Further extensions are included to combine state-speciﬁc principal component analysis (PCA) and state-speciﬁc regressive trend models to handle the high-dimensional, nonstationary data. The identiﬁcation of market phases allows one to control the involved phase-speciﬁc risk for futures portfolios. The numerical optimization strategy for futures portfolios based on Tikhonovtype regularization is presented. The application of proposed strategies to online detection of the market phases is exempliﬁed ﬁrst on the simulated data and then on historical futures prices for oil and wheat from 2005–2008. Numerical tests demonstrate the comparison of the presented methods with existing approaches. Key words. economic time series analysis, portfolio optimization, investment, regression, market phases, clustering AMS subject classifications. 62M10, 91B28, 91B84 DOI. 10.1137/090754029

1. Introduction. In today’s ﬁnancial industry, computational ﬁnance is used for a wide ﬁeld of applications. Throughout this paper, we will concentrate on the portfolio optimization problem, where, for return maximization and risk minimization problems, the evolution of security returns is estimated with a variety of diﬀerent methods, in order to ﬁnd an allocation that optimizes a suitable performance criterion, while limiting the risk [27, 31]. For practical reasons, computational methods have to address the following points: 1. Time series analysis. Market dynamics are most likely not reﬂected by a stationary process, but involve (hidden) market phases, such as economic cycles, harvest cycles (for grain, fruits, and other kinds of plants), and yearly holidays, and many more aspects inﬂuence the markets. This poses fundamental problems for estimation: To obtain sensible estimates for mean returns even in stationary linear models, hundreds of years of data would be needed. In shorter intervals, estimation risk becomes signiﬁcant. ∗

Received by the editors June 15, 2009; accepted for publication (in revised form) July 3, 2010; published electronically October 21, 2010. This work was supported by the DFG reseach center Matheon “Mathematics for key technologies” in Berlin. http://www.siam.org/journals/siﬁn/1/75402.html † Corresponding author. Institute of Computational Science, University of Lugano, Via Giuseppe Buﬃ 13, CH6904 Lugano, Switzerland ([email protected]). ‡ Institute of Mathematics, Humboldt-University Berlin, Rudower Chaussee 25, 12489 Berlin, Germany (becherer@ mathematik.hu-berlin.de). § Institute of Computational Science, University of Lugano, Via Giuseppe Buﬃ 13, CH-6904 Lugano, Switzerland ([email protected]). 752

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

753

2. Sensitivity analysis. Approaches should be robust w.r.t. model parameter uncertainty. This is a challenge for identiﬁcation when estimating time-varying parameters. 3. High dimensionality. Financial data are high dimensional, as thousands of securities are available from diﬀerent asset classes (stocks, futures, etc.), and data are often available at high frequencies with nonequidistant time stepping (due to weekends, market closures, holidays, etc.); e.g., the Wilshire 5000 consists of 5000 stocks. Hence, computational ability to handle high numbers of assets (stocks, futures, etc.) is relevant. Aspects 2 and 3 cause a high uncertainty of estimates, whereas nonstationarity as it relates to aspect 1 (in observable dimensions, at least) means that estimates should involve timedependent functions, which could not be estimated in a nontrivial way, which is a well-known problem. In 1989 Hamilton [13] suggested a numerical method for identiﬁcation of piecewise linear functions, called hidden market phases, assuming stationarity and Gaussianity of returns. While Gaussianity is still a commonly used assumption by practitioners, many modern models soften this by using, e.g., heavy tails [29]. Later this was extended to multidimensional data using linear vector autoregressive (VAR) models [23]. Still, standard phase-identiﬁcation techniques based on the ﬁltering approach (like wavelet-based spectral methods; see [1]) have in general infeasible numerical complexity in high dimensions; other methods (e.g., Kalman ﬁlter [21], VAR [23], (G)ARCH [6, 2]) require stronger assumptions about the underlying dynamics or the methods (as in [3]) are based on perfect observability of the Markov process, which cannot be taken for granted in reality. As demonstrated by the actual ﬁnancial crises, nonrobust reliance on simpliﬁed assumptions and ignorance of model risk can (among other factors) lead to misperception of risk and possibly to a total loss of the investment. Therefore, there is a need for a wider class of methods that are computationally capable of dealing with realistic amounts of data, monitoring the evolution of risks, and optimizing the risk-return proﬁle of investments. While recent publications concentrate on robustness by using worstcase scenarios (see, e.g., [10, 7, 11]) or penalty functions (see, e.g., [35]), or simply assuming vanishing mean returns, this paper aims at an estimation of nonstationary trends and volatilities directly from prices of futures contracts. Futures markets constitute broad and liquid markets for several asset classes [5]. Leverage can be quite large for futures if the amplitude of futures price variations is measured relative to the size of the margin requirement. But this also induces a high risk. The minimization of this risk is one of the central points of this paper. The recently developed framework for numerical treatment of aspect 1 of the above list of problems is presented and used for clustering and analysis of nonstationary ﬁnancial data [16, 17, 14, 15]. An extension of the framework utilizing linear regression is introduced, allowing for simultaneous market phase identiﬁcation, dimension reduction, and estimation of the phase-speciﬁc trends in the ﬁnancial context. The ability of the extended framework to identify change points online is demonstrated, both on simulated and real market data. Finally, the numerical optimization strategy for phase-speciﬁc selection of futures portfolios is presented and exempliﬁed on a real 8-dimensional data set of futures data for wheat and oil. This paper is organized as follows. In section 2, the FEM-based clustering framework is introduced and its extension towards the ﬁnancial data context is presented. In section 3, we brieﬂy consider the deﬁnition of returns and adapt this to futures markets, as the naive deﬁnition does not ﬁt in this context. The numerical futures portfolio optimization approach is presented. Section 4 demonstrates the application of the proposed numerical framework

754

L. PUTZIG, D. BECHERER, AND I. HORENKO

to simulated and real data and oﬀers a comparison to standard approaches (like the classical methods proposed by Markowitz [27], the ﬂexible multivariate GARCH(1,1) model [24], and a simple n1 -portfolio). In the end, a practical side constraint is considered, the notion of neutrality. 2. Finding market phases. In the following, we will brieﬂy describe the clustering framework ﬁrst introduced in [16, 17, 14, 15]. Special emphasis will be placed on the numerical aspects and the interpretation of the framework in the context of high-dimensional ﬁnancial applications. 2.1. The clustering problem. Let St : [0, T ] → [0, ∞)d be the observed d-dimensional series of asset prices. We look for K models characterized by K distinct sets of a priori unknown model parameters θ1 , . . . , θK ∈ Ω, where Ω is the parameter space. Let g(St , θi ) : Rd × Ω → [0, ∞) be some distance function of the observation St and the set of parameters θi . For the work done throughout this paper, the θi correspond to the changing market phases. Then the vector Γt = (γ1 (t), . . . , γK (t)), called the aﬃliation vector, gives the probability of being in the ith market phase at time t. For a given observation St we want to solve the clustering problem K

(2.1)

γi (t)g(St , θi ) → min Γt ,Θ

i=1

for Θ = (θ1 , . . . , θK ). Moreover, as the γi (t) shall be probabilities, they are subject to the constraint (2.2)

K

γi (t) = 1 ∀t ∈ [0, T ],

i=1

(2.3)

γi (t) ≥ 0 ∀t ∈ [0, T ], i = 1, . . . , K.

As we do not want to solve this problem for a speciﬁc time t only, but for the whole time interval [0, T ], we introduce the averaged clustering functional L: (2.4)

L(Θ, Γ) = 0

K T i=1

γi (t)g(St , θi ) → min, Γ,Θ

subject to the constraints (2.2)–(2.3).1 As was demonstrated in [16, 17, 14, 15] there are practical diﬃculties in solving problem (2.4) numerically. 1. The problem is inﬁnite dimensional since γi (t) is an element of some (not yet speciﬁed) function space. 1

If the Θi were known or ﬁxed to some value, the minimization problem (2.4) would be solved analytically w.r.t. Γ by 1, i = arg min g(St , Θi ), γi (t) = 0 else.

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

755

2. The problem is ill-posed in the deﬁnition of Hadamard [12], as neither the uniqueness of the solution nor its continuity w.r.t. the time series data is fulﬁlled in general. Because of the high number of parameters, additional regularity assumptions will be needed to identify the parameters, and to avoid problems like overﬁtting, underspecifying, and unstable estimates. 3. As long as g is nonconvex, only locally optimal solutions can be found. Problems 1 and 2 are addressed within sections 2.2 and 2.3; how to evade problem 3 is shown in sections 2.4 and 2.5. 2.2. Persistent market phases. As shown in [16, 17, 14, 15], one of the possibilities for overcoming the above-mentioned problem is to impose some additional assumptions on Γ. As we look for hidden market phases, we do assume the process switching between phases is slow: When the market switches to a new state, it will stay there for some time. This suggests a new constraint on Γ: T (∂t γi (t))2 dt ≤ γiε , i = 1, . . . , K, γiε ∈ (0, ∞). (2.5) 0

While (2.5) should reduce the transitions between diﬀerent states, it is hard to handle this inside the optimization. Instead, we thus expand our problem to the regularized form as suggested by Tikhonov for linear least-squares problems [32]: (2.6)

Lε (Θ, Γ) = L(Θ, Γ) + ε2

K i=1

T 0

(∂t γi (t))2 dt.

As was shown in [16], the parameter ε2 can be used to control the persistence of the cluster states, by introducing a cost for short-term switching into another market phase. 2.3. Finite elements and discretization. Discrete time is, from a data point of view, not really an additional assumption, as market data, even if given intraday, is ticked data; thus it is discrete by nature. Time continuity of the data is a modeling idealization. This limits the function space from which Γ is chosen, addressing problems 1 and 2. Thus let T be a ﬁnite set of ﬁnite intervals on [0, T ]: T = {0 = t1 < t2 < · · · < tN = T }.

(2.7)

Then we can choose a ﬁnite element basis {v1 (t), . . . , vN (t)} with (2.8)

v1 (t) = 0 for t ∈ (t1 , t2 ),

v1 (t) = 0 for t ∈ / (t1 , t2 ),

(2.9)

vk (t) = 0 for t ∈ (tk−1 , tk+1 ),

vk (t) = 0 for t ∈ / (tk−1 , tk+1 ),

(2.10)

vN (t) = 0 for t ∈ (tN −1 , tN ),

vN (t) = 0 for t ∈ / (tN −1 , tN ).

This allows us to write γi as (2.11)

γi =

N k=1

γˆik vk + δiN ,

k = 2, . . . , N − 1,

756

L. PUTZIG, D. BECHERER, AND I. HORENKO

where δN = max{δiN , i = 1, . . . , K} is the discretization error. Now (2.6) could be written as N 2 T K T K N ε k 2 k (2.12) L (Θ, Γ) = γˆi vk (t)g(St , θi ) dt + ε γˆi ∂t vk (t) dt + O(δN ). 0

i=1 k=1

i=1

Using the local support of vk , we can deﬁne t2 v1 (t)g(St , θi ) dt, . . . , (2.13) α(θi ) = t1

and

(2.14)

⎛

t2

2 t1 (∂t v1 ) (t) dt

⎜ t2 H=⎜ ⎝ t1 ∂t v1 (t)∂t v2 (t) dt 0

t2

t1

k=1

tN tN−1

vN (t)g(St , θi ) dt

∂t v1 (t)∂t v2 (t) dt

t3

t1

0

(∂t v2 )2 (t) dt .. .

t3 t2

0

...

⎞

⎟ ∂t v2 (t)∂t v3 (t) dt . . . ⎟ ⎠ .. .. . .

to create the discrete clustering problem (see, for example, [16]) (2.15)

ε (Θ, Γ) = L

K α(θi )T γˆi + ε2 γˆiT H γˆi → min . Γ,Θ

i=1

Please note that up to now all assumptions made concern the smoothness of the clustering aﬃliation γi (t) (see condition (2.5)) only and no probabilistic or statistical assumptions on the data xt are made. This abdication of including a priori assumptions on probabilistic properties of the hidden and observable processes is a major feature of the presented method. 2.4. The distance function. When working with ﬁnancial data, one might end up with thousands of assets; e.g., the Wilshire 5000 index consists of 5000 stocks. W.r.t. this high dimensionality, one has to think about methods able to handle these data. The popular idea is to use the principal component analysis (PCA; see [33, 20]), where only a linear lowdimensional manifold of the data space is used to represent the original. The drawback here is the linearity of the manifold, as a globally optimal linear subspace might not exist. Thus let Ti ∈ Rd×n for i = 1, . . . , K, let n < d with TiT Ti = Id be the projection matrices, and let xt ∈ Rd be some time series deduced from St . Then we can deﬁne the distance function by (2.16)

g(xt , θi ) = (xt − μi ) − Ti TiT (xt − μi ) 22 ,

θi = (μi , Ti ).

This distance function measures the reduction error resulting from the projection of the d-dimensional data on the n-dimensional linear manifold i. The xt could be the returns and Ti could be interpreted as the direction of maximal volatility of x, while μi describes the mean behavior [18]. We can now try to get better results by using the time-dependent function μi (t) instead of a constant one, e.g., approximating μi (t) with some regression model as in [17]. Therefore let ϕi (t) be some basis of the functional space (e.g., monomials); then the distance function changes to 2 ω ω j j T μi ϕj (t) −Ti Ti xt − μi ϕj (t) , θi = ((μ1i , . . . , μki ), Ti ), (2.17) g(xt , θi ) = xt − j=1

j=1

2

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

757

where ω is the order of the trend. In fact, we can see ωj=1 μji ϕj (t) as the smoothed trend, so this analysis can be interpreted as a detrending. Additionally we can estimate the covariance matrix of a clustered set (xt ) of the time series for some i by (2.18)

¯ ΣTi

=

1 t≤T¯

γi (t)

γi (t) xt −

t≤T¯

ω

μji ϕj (t)

xt −

j=1

ω

T μji ϕj (t)

.

j=1

From a practical viewpoint, one should include an additional weighting further connecting the assets to their liquidity. However, the inﬂuence of liquidity on risk is not part of our analysis. Please note that the combination of PCA, trend estimation, and clustering is not just a direct chaining of both algorithms. The estimation of trends and covariance matrices is embedded in the minimization problem (2.15). 2.5. Calculating the parameters. Given the analyzed time series xt and a cluster aﬃliation Γ, we can now compute the cluster parameters μji and Ti . Therefore we have to solve for each i = 1, . . . , K the problem α(θi )T γˆi + ε2 γˆiT H γˆi → min

(2.19)

θi

⇔

(2.20)

T

α(θi ) γˆi

→ min . θi

Using the deﬁnition of α and g we get 2 tk+1 N ω ω j j k T γi vk (t) xt − μi ϕj (t) − Ti Ti xt − μi ϕj (t) dt → min, (2.21) μi ,Ti tk−1 j=1

k=1

j=1

2

with t0 = t1 and tN +1 = tN . First-order conditions of optimality imply that partial derivatives w.r.t. μji and Ti vanish; hence for μ we obtain that 0=

N

γik

k=1

(2.22)

−

ω j=1

tk+1 tk−1

μji

N k=1

vk (t)(Ti TiT ϕl (t) − ϕl (t))T (xt − Ti TiT xt ) dt γik

tk+1 tk−1

vk (t)(Ti TiT ϕl (t) − ϕl (t))T (ϕj (t) − Ti TiT ϕj (t)) dt

holds, and an analogous equation is obtained for Ti . The latter yields that Ti consists of the eigenvectors corresponding to the n largest eigenvalues of the empirical covariance matrix of the values associated with state i. Now the problem (2.15) could be solved by subspace iteration; thus, starting with some arbitrary Γ0 , we solve iteratively (2.23)

ε (Θ, Γk ) Θk = arg min L Θ

and (2.24)

ε (Θk , Γ) Γk+1 = arg min L Γ

758

L. PUTZIG, D. BECHERER, AND I. HORENKO

for k = 0, 1, . . . . Although this algorithm is not guaranteed to converge to the globally optimum, at least a local minimum will be found. By performing this algorithm several times with diﬀerent starting values Γ0 , we can approximate the global optimal parameters. This stepwise optimization allows us to exploit the structure of the problem. Separating these two steps allows us to split the overall optimization into two numerically conceivable steps performed subsequently until the overall optimization threshold is reached. These two steps are (i) a problem that is (in the PCA case) analytically solvable (2.23) and results in the eﬃcient numerical solution of the eigenvalue problem; and (ii) a convex and sparse quadratic programming problem ((2.24) with (2.2)–(2.3)) [16]. Because of that, the proposed numerical scheme is numerically cheaper and more robust for high-dimensional applications than the standard “out of the box” optimization methods applied to the same problem. 3. Adapting portfolio optimization theory to futures. The usual deﬁnition of returns S −St cannot be naively applied to futures data, as the price for a contract is not St . Rt = t+Δt St Instead, no price is charged for entering a futures contract, but an initial amount is paid into the margin account that will hold the accrued gains and losses. So, given an initial margin of I ∈ Rd and a portfolio π in proportion of wealth, we deﬁne returns for futures over the period t to t + Δt as Rt = sgn(πt )

(3.1)

St+Δt − St I

(componentwise).

In contrast to portfolio optimization approaches for the stock market, it is an important feature of futures that long and short positions can be held equally well. But since the initial margin is dependent only on the absolute contract size, our portfolio constraints will change, and the absolute weights of the components should sum up to one: d

(3.2)

|πi (t)| = 1 ∀t ∈ [0, T ].

i=1

Given suitable estimates Σ, μ for covariances and means of returns, the portfolio problem is posed as π T Σπ → min,

(3.3)

π

with π 1 = 1, π T μ ≥ C,

for a chosen target mean return C. Since this is not yet (as in classical mean variance optimization) a quadratic minimization problem with linear (or quadratic) constraints, we use a splitting into long positions and short positions, similar to the idea explained in [9] in the context of image processing, to get an extended parameter set Σ −Σ long short T = ) , Σ , μ ˆ = (μ, −μ)T , (3.4) π ˆ = (π , π −Σ Σ and we apply a penalty term. We deﬁne the extended problem by (3.5)

π ˆ

T

+λ Σ

0 Id Id 0

π ˆ → min , π ˆ ∈R2d

with π ˆ μ ˆ ≥ C, T

2d i=1

π ˆi = 1, π ˆi ≥ 0 ∀i

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

759

with the regularization parameter λ > 0. Please note that for suﬃciently large λ the part 0 Id T (3.6) λˆ π π ˆ = 2λ(π long )T π short Id 0 and the condition π ˆi ≥ 0 ∀i guarantee that for each asset the position is either long or short, but not both at the same time. Simultaneously, the remainder of the extended problem is just (3.7)

π = (π long − π short )T Σπ long − (π long − π short )T Σπ short = πΣπ π ˆ T Σˆ

and (3.8)

ˆ = μT π long − μT π short = μT π. μ ˆT π

4. Numerical examples. Throughout the ﬁrst part of this section, we will make use of a simulated time series. These time series have a length of 1000 with a change point at time tC = 500. All of the data points are distributed independently according to N (0, Σ), where ⎧ I2 , t ≤ 500, ⎨ 0.85 0.4 α(t), α(t) = (4.1) Σ(t) = αT (t) cos(ρ) − sin(ρ) 0.4 0.2 ⎩ , t > 500. sin(ρ) cos(ρ) As one can see, after the change point at tC = 500, the covariance matrix is just rotated by some rotation angle ρ. 4.1. Covariance for small samples. from the literature, for small sample sizes, kAs known T is not robust enough; see, e.g., [30, 4]. This x x the maximum likelihood estimator k1 tt=t t t 1 will lead to numerical biassing when it comes to portfolio optimization [28, 25, 26]. We will use the Tyler M-estimator (see, e.g., [34, 8]) instead, which is deﬁned by (4.2)

tk d xt xTt k ˆ . Σ = ˆ k )−1 xt k t=t xTt (Σ 1

ˆ k to (4.2) is unique only up to a multiplicative constant, this is not Although the solution Σ a problem for the clustering method (2.15), as neither the result of the minimization nor the eigenvectors (for PCA) is dependent on this constant. For Figure 1, the ﬁrst 100 points of 100 time series were used (using the distribution stated above) and for every time step t ∈ {2, . . . , 100} the maximum likelihood estimator and the Tyler M-estimator were calculated. ˆ − Σ) was used as a distance measure, and Then the largest singular value of the diﬀerence (Σ the average and standard deviation were calculated and plotted. As one can see from this example, the standard deviation of the Tyler M-estimator is much smaller than the one for the maximum likelihood estimator (although both converge to the same limit). 4.2. Number of clusters. In realistic applications, the optimal number of metastable clusters is usually a priori unknown. Nevertheless, one can ﬁnd out the number of statistically distinguishable clusters (see [16]). Therefore we start the clustering process with a rather high number of clusters, estimate the cluster parameters, and get the error bounds for these

L. PUTZIG, D. BECHERER, AND I. HORENKO

relative error (Max. Likelihood)

760

2 1.5 1 0.5 0 0

20

40 60 Length of the time series

80

100

20

40 60 Length of the time series

80

100

relative error (Tyler−M)

1 0.5 0 −0.5 0

Figure 1. Robustness of the maximum likelihood estimator and the Tyler M-estimator for short normally distributed time series. Parameters as in (4.1).

parameters by bootstrapping the elements of the cluster. If the parameter conﬁdence sets of diﬀerent clusters overlap, the statistical diﬀerence between those clusters is not high enough; thus the number of clusters should be reduced and the process should be started over again [22]. Using this technique for diﬀerent ε leads to a picture like Figure 2. The decrease in the number of clusters is due to the fact that for the increasing ε the algorithm favors metastable clusters and merges the rapidly mixing ones together. Thus, while increasing ε, similar clusters will get merged into one. The plateau results from the high diﬀerence between the last two remaining clusters. For a real data set we will observe similar behavior when we look for the number of clusters. 4.3. Detecting a change point. Another important problem is to ﬁnd the actual position of the change points. The number of detected change points depends on ε as one can see in Figure 3. However, a high number of change points usually indicates some rapidly mixing clusters that could be united to a metastable cluster (e.g., using half the data in Figure 3). Decreasing the number of clusters and increasing ε might help to solve this problem. On the other hand, a high value for ε will lead to a longer delay between occurrence and online detection of the change point; this can be seen in Figure 4. The same problem appears when the rotation angle becomes too small; see Figure 5.

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

761

2

Number of clusters depending on ε

Statistically different clusters

3

2

1

−20

10

−10

10

0

ε2

10

10

10

Figure 2. Number of distinguishable clusters depending on ε2 . Parameters as in (4.1); ρ = 90◦ .

4.4. Looking at real data. For the calculations in this paper, we used the futures prices from 2005–2008 for wheat and oil futures (four times to maturity each, quarterly for wheat and monthly for oil, daily closings) from http://www.kcbt.com/historical data.asp (Kansas City Board of Trade) and http://tonto.eia.doe.gov/dnav/pet/pet pri fut s1 d.htm (U.S. Department of Energy). For both commodities the four futures with the lowest time to maturity were taken into account. Applying the introduced clustering algorithm to these data gives the result shown in Figure 6. Implied by this picture, an optimal number of metastable clusters to use should be two. Then the cluster allocation for the whole time series (see Figure 7) produces a kind of seasonal cycle that could be due to a harvest or heating period. The phases diﬀer in the following points: • The phase shown in Figure 7, and thus phase 1, is characterized by lesser overall trend in the price than phase 2. • The general direction of the maximum volatility (described by the projection operator Ti from (2.15)) in both phases is similar; the ﬁrst eigenvectors of the covariance matrices are nearly equal, and the second eigenvectors span an angle of approximately 5 degrees.

762

L. PUTZIG, D. BECHERER, AND I. HORENKO Number of change points depending on ε2 150 First 500 datapoints 100 50 0 −20 10

−15

10

−10

−5

10

10 2 ε

0

10

5

10

10

10

Complete dataset 1

0 −20 10

−15

10

−10

−5

10

10

0

10

5

10

10

10

Figure 3. Number of detected change points depending on ε2 for half of the (artiﬁcial) data and the complete data set. Parameters as in (4.1); K = 2, ρ = 90◦ .

The eﬀective investment for each commodity (as the sum of the portfolio weights over the diﬀerent maturities) for the phase-dependent portfolio can be seen in Figure 8. The variance of the positions can be considered as a result of the parameter estimation error. Additionally, Figure 9 shows box-whisker plots of the relative notionals, deﬁned by (4.3)

πi (t) , i∈Iu |πi (t)|

where Iu is the set of indices belonging to one commodity. The online version of the cluster aﬃliations is presented; thus the aﬃliations at time t are calculated “on the ﬂy” using only the information ∀s, s < t during the portfolio optimization. The high variance in the resulting portfolio weights (resp., the relative notionals) is a direct result of nonrobust parameter estimates. For practical purposes, further steps in creating robust estimates or optimizing robustly w.r.t. uncertain estimates should be considered in further research. 4.5. Comparing portfolios. When looking at the interest rate for U.S. treasury bonds, an investment in January 2005 would have given a yearly interest rate of approximately 3.4%. Thus after four years we have a gain of 14.31% (that is a daily rate of 1.415 bps). We use this rate as a target rate C (as in (3.5)). As stated earlier, we compare our algorithm with the Markowitz approach, the FlexM algorithm from [24], and a simple n1 -portfolio, where half the assets are bought and the other half sold. For the cluster-based approach, the basis ϕ with polynomials up to order two was taken. The clustering was done w.r.t. μ and T . The FlexM algorithm provided by the authors of [24] was used utilizing the parameter demean; thus the estimation and subtraction of the mean from the data prior to the portfolio optimization is done by the FlexM function. The n1 -portfolio was built by buying for 18 of the wealth the two

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

763

Relative error in detection of change point

Detection time of change point depending on ε2 0.5

0.4

0.3

0.2

0.1

0 −20

10

−10

0

10

ε2

10

10

10

Figure 4. Relative error in detection time of the single change point depending on ε2 . Parameters as in (4.1); K = 2, ρ = 90◦ .

long maturity futures of each commodity and selling the remaining two for each commodity equally. Additionally, the Markowitz approach was used by estimating mean and covariance from the whole time series up to the actual point. All algorithms were supplied only with information available at time t, so no future knowledge is included. In each case, the portfolio was recalculated and/or rebalanced every 10 business days starting with day 10 of the time series. As one can see in Figure 10, all algorithms fall short of the target 14.31% mean return. While the n1 -portfolio is doing better (in terms of return rate) than the other algorithms, this comes with a higher risk, as one can see in Figure 11. We deﬁne the “intrinsic risk estimate” for portfolio returns from t to t + Δt by (4.4)

Rt =

πtT Cov(Rt )πt ,

where the empirical returns Rt are given by (3.1), and the Cov(Rt ) is an estimate for the covariance matrix of the data (based upon data only up to t), which is given by • a linear combination of the empirical cluster covariance matrices given by (2.18) for the cluster-based algorithm, and thus

764

L. PUTZIG, D. BECHERER, AND I. HORENKO

Relative error in detection of change point

Detection time of change point depending on ρ 0.5

0.4

0.3

0.2

0.1

0 0

20

40 60 ρ in Degree

80

100

Figure 5. Relative error in detection time of the single change point depending on ρ. Parameters as in (4.1); K = 2, ε2 = 1.

(4.5)

Cov(Rt ) =

K

γi (t)Σti ;

i=1

• the GARCH(1,1) estimate for the FlexM algorithm (see [24]); • the standard sample covariance matrix for the Markowitz approach and the n1 -portfolio. Although those intrinsic risks shown in Figure 11 are estimates of the instantaneous variance of the portfolio returns, they are not directly comparable, as the measures diﬀer from each other and the underlying assumptions are diﬀerent, so in Figure 12 risk is meant in the sense of the standard deviation of the daily returns of the portfolios from the empirical mean; the cluster-based approach still produces the strategy with the lowest risk. And even if we use the deviation from the target mean return C k−1 1 t (π T Rt − C)2 (4.6) k − 1 t=t t 1

as a risk measure (see the results in Figure 13), the cluster-based approach reaches the best result. The similarity of this risk measure and the standard deviation is easy to see: If the vari-

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

765

2

Number of clusters depending on ε

Number of clusters

5

4

3

2

1 −20

10

−10

10

0

ε2

10

10

10

Figure 6. Number of distinguishable clusters depending on ε2 . Data set: Prices of oil and wheat futures from 2005–2008. T μ ance minimal portfolio does not satisfy the condition π ˆmin ˆ ≥ C, i.e., the target mean return ˆ = C. is not reached, due to the convexity of the problem the solution of (3.5) will satisfy π ˆT μ This guarantees that for suﬃciently large C the expected mean is C, and thus (4.6) would give the standard deviation. The advantage of (4.6) over the empirical standard deviation is the fact that the deviation from the target mean return does not depend on the empirical expectation of the portfolio returns. Indeed, as stated in Table 1, the empirical standard deviation of the portfolio returns and the empirical deviation from the target mean return are quite close to each other. The cluster-based approach reaches 69.85% of the performance of the n1 -portfolio, while it takes only 40.33% of the risk (in the sense of the empirical standard deviation). Here, rate of return denotes the total return minus one and average intrinsic is deﬁned by

(4.7)

tN−1 1 Rt . N − 1 t=t 1

For diﬀerent target returns the results are similar, although the cluster-based portfolio is less sensible to the target return rate (see Figures 14 and 15).

766

L. PUTZIG, D. BECHERER, AND I. HORENKO

Cluster allocation, Prices of oil and wheat futures 2005−2008 1 0.9 0.8 0.7

γ1(t)

0.6 0.5 0.4 0.3 0.2 0.1 0 2006

2007 Year

2008

2009

Figure 7. Cluster allocation for time series of futures; K = 2, ε2 = 1. Data set: Prices of oil and wheat futures from 2005–2008.

4.6. Adaption for notional neutral investment. A common relevant strategy of portfolio managers is so-called market neutral investment. The aim is to bet only from views on relative changes between asset prices but remain neutral w.r.t. overall market moves. To avoid assumptions about an appropriated market indicator or a market portfolio that characterizes the whole market, a more pragmatic approach is chosen: We deﬁne notional neutral investments as investments where long and short positions are balanced such that equal notionals are invested long and short for each security class. This is done by an additional constraint to ensure that the accrued notional of contracts within each class u (here wheat and oil) is equal to zero:

(4.8)

πi = 0

∀u,

i∈Iu

where Iu is an index set combining all assets for each selected class of futures within which accrued notionals are requested to be zero. For the extended problem, this transforms to (4.9)

i∈Iu

(ˆ πi − π ˆi+d ) = 0

∀u.

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

767

Effective Investments

Effective Investment

0.07 0.06

Wheat

0.05

Oil

0.04 0.03 0.02 0.01 0 −0.01 −0.02 −0.03 2005

2006

2007 Year

2008

2009

Figure 8. Eﬀective investments per commodity as proportion of wealth; K = 2, ε2 = 1. Data set: Prices of oil and wheat futures from 2005–2008.

Thus the notional neutral version of the problem is (4.10) 2d 0 Id T T π ˆ → min , with π ˆ μ ˆ ≥ C, π ˆi = 1, π ˆi ≥ 0 ∀i, (ˆ πi − π ˆi+d ) = 0. Σ+λ π ˆ Id 0 π ˆ ∈R2d i=1

i∈Iu

The additional constraint was used for all algorithms (except the n1 -portfolio). As before, the realized return for each algorithm falls short of the mean target return (see Figure 16), but the presented cluster-based method produces the result with the least risk (see Figures 17, 18, and 19) and, for this data, even the best return rate, next to the n1 -portfolio, where no risk minimization is done. 5. Conclusion. We adapted and expanded a method for simultaneous clustering, dimension reduction, and metastability analysis of multidimensional time series to ﬁnancial data. A main advantage of the method (compared to standard HMM/GMM methods of phase identi-

768

L. PUTZIG, D. BECHERER, AND I. HORENKO

Wheat, Phase 2 rel. Notionals

rel. Notionals

Wheat, Phase 1 0.5 0 −0.5

0 −0.5

1

2 3 Maturities Oil, Phase 1

4

1

2 3 Maturities Oil, Phase 2

4

1

2 3 Maturities

4

0.5 rel. Notionals

0.5 rel. Notionals

0.5

0

−0.5

0

−0.5 1

2 3 Maturities

4

Figure 9. Spread of the relative notionals per commodity and phase; K = 2, ε2 = 1. Data set: Prices of oil and wheat futures from 2005–2008.

ﬁcation [13, 23]) is that no a priori probabilistic assumptions about the hidden and observed market processes are necessary in the context of the presented method. Furthermore, section 4 indicates that the method is suitable for ﬁnding a low-dimensional representation of the data and estimating mean and variance adaptively while detecting market phases, delivering information that can be further used for portfolio optimization. The presented framework allows one to make use of fast and numerically robust FEM solvers (developed for numerical solution of partial diﬀerential equations) in a new context of computational time series analysis. The general feasibility of identiﬁcation of market phases was demonstrated, and it was shown that an expansion of classical portfolio theory utilizing market phase detection can be used to further decrease the intrinsic risk. Additionally, the behavior of the change point detection when applied online to a data stream was demonstrated to be robust enough to qualify for practical purposes. As a standard stationary risk measure, the standard deviation of portfolio returns was also used to compare diﬀerent portfolio optimization strategies on real

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

769

Total return evolution of portfolios 1.5 1.4

cluster−based FlexM Markowitz 1/n − Portfolio

Performance

1.3 1.2 1.1 1 0.9 0.8

2006

2007 Year

2008

2009

Figure 10. Value evolution of diﬀerent portfolio-optimization algorithms.

ﬁnancial data. It was demonstrated that application of the presented numerical scheme based on the identiﬁcation of market phases can decrease the risk of resulting portfolios in terms of this risk measure. Strictly speaking, the empirical standard deviation of portfolio returns is not an equitable risk measure, in that it assumes the underlying data to be stationary. If we assume the underlying market process to be nonstationary (as is done in the context of the presented numerical approach), the evaluation and comparison of risks should be done in a measure that reﬂects this. Numerical studies of robust portfolio optimization methods and empirical quantiﬁcation of risk by methods that do not rely on stationarity assumptions are a matter of future research. Having demonstrated the applicability of the presented computational scheme to a realistic ﬁnancial data example, a more extensive online data study is clearly needed for validation. Also, transaction costs should be included and diﬀerent risk measures should be taken into account. In contrast to climatological [14, 17, 15, 18] or biophysical applications [19], repeating phases seem to be rare in ﬁnancial time series, and additional time weighting (e.g., exponential decay) should be further investigated.

770

L. PUTZIG, D. BECHERER, AND I. HORENKO

Risk (intrinsic) evolution of portfolios

0

10

−2

Risk (intrinsic)

10

−4

10

−6

10

−8

10

Cluster−based FlexM Markowitz 1/n−Portfolio

−10

10

2006

2007 Year

2008

2009

Figure 11. Intrinsic risk (see (4.4)) evolution of diﬀerent portfolio-optimization algorithms.

Risk (standard deviation) evolution of portfolios

standard deviation of portfolio returns

0.025

0.02

cluster−based

0.015

FlexM Markowitz 1/n − Portfolio

0.01

0.005

0

2006

2007 Year

2008

2009

Figure 12. Risk (empirical standard deviation) evolution of diﬀerent portfolio-optimization algorithms.

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

771

Average deviation from target return rate 0.025 cluster−based FlexM Markowitz 1/n−Portfolio

Deviation

0.02

0.015

0.01

0.005

0

2006

2007 Year

2008

2009

Figure 13. Risk (deviation from target mean return) evolution of diﬀerent portfolio-optimization algorithms.

Table 1 Rate of return and diﬀerent risk measures. Algorithm Cluster-based FlexM Markowitz 1 -portfolio n

Rate of return 4.01% −0.55% 0.30% 5.74%

Averaged risk 1.4164 7.5113 1.5180 3.7022

intrinsic 10−3 10−3 10−3 10−3

Standard of returns 2.4613 7.8968 3.9324 6.1033

deviation 10−3 10−3 10−3 10−3

Deviation from target mean return 2.4619 10−3 7.8935 10−3 3.9325 10−3 6.1003 10−3

772

L. PUTZIG, D. BECHERER, AND I. HORENKO

Total return of cluster−based and FlexM portfolio 1.1

Realized total return

1.08 1.06 cluster−based 1.04

FlexM

1.02 1 0.98 0.96 0

0.5

1 1.5 2 2.5 3 Target Return Rate in % per year

3.5

Figure 14. Total return depending on the yearly target return rate C in (3.5).

4

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

−3

standard deviation of Portfolio returns

9

x 10

8

773

Risk (standard deviation) of different portfolios

cluster−based FlexM

7 6 5 4 3 2 1 0 0

0.5

1 1.5 2 2.5 3 Target Return Rate in % per year

3.5

4

Figure 15. Risk (empirical standard deviation) depending on the yearly target return rate C in (3.5).

774

L. PUTZIG, D. BECHERER, AND I. HORENKO

Value evolution of notional neutral portfolios 1.1 1.05 1

Total return

0.95 0.9 0.85 0.8 0.75

cluster−based FlexM

0.7

Markowitz 0.65

1/n−Portfolio 2006

2007 Year

2008

2009

Figure 16. Value evolution of diﬀerent notional neutral portfolio-optimization algorithms.

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

0

775

Risk (intrinsic) evolution of portfolios

10

−2

10

Risk (intrinsic)

−4

10

−6

10

−8

10

Cluster−based FlexM Markowitz 1/n−Portfolio

−10

10

−12

10

2006

2007 Year

2008

2009

Figure 17. Intrinsic risk (4.4) evolution of diﬀerent notional neutral portfolio-optimization algorithms.

776

L. PUTZIG, D. BECHERER, AND I. HORENKO

Risk (standard deviation) evolution of notional neutral portfolios 0.01

standard deviation of portfolio returns

0.009 0.008

cluster−based

0.007

FlexM Markowitz

0.006

1/n−Portfolio

0.005 0.004 0.003 0.002 0.001 0

2006

2007 Year

2008

2009

Figure 18. Risk (empirical standard deviation) evolution of diﬀerent notional neutral portfolio-optimization algorithms.

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

777

average deviation from target mean return of notional neutral portfolios 0.01

deviation from target mean returns

0.009 0.008

cluster−based

0.007

FlexM Markowitz

0.006

1/n−Portfolio

0.005 0.004 0.003 0.002 0.001 0

2006

2007 Year

2008

2009

Figure 19. Risk (deviation from target mean return) evolution of diﬀerent notional neutral portfoliooptimization algorithms.

778

L. PUTZIG, D. BECHERER, AND I. HORENKO REFERENCES

[1] A. Akansu and R. Haddad, Multiresolution Signal Decomposition: Transforms, Subbands, and Wavelets, Academic Press, San Diego, CA, 1992. [2] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econometrics, 31 (1986), pp. 307–327. ¨ [3] U. C ¸ elikyurt and S. Ozekici, Multiperiod portfolio optimization models in stochastic markets using the mean-variance approach, European J. Oper. Res., 179 (2007), pp. 186–202. [4] V. DeMiguel and F. Nogales, Portfolio selection with robust estimation, Oper. Res., 57 (2009), pp. 560–577. [5] D. Duffie, Futures Markets, Prentice–Hall, Englewood Cliﬀs, NJ, 1989. [6] R. Engle, Autoregressive conditional heteroscedasticity with estimates of variance of United Kingdom inﬂation, Econometrica, 50 (1982), pp. 987–1008. [7] F. Fabozzi, P. Kolm, D. Pachamanova, and S. Focardi, Robust Portfolio Optimization and Management, John Wiley & Sons, New York, 2007. [8] G. Frahm and U. Jaekel, Tyler’s M-estimator, random matrix theory, and generalized elliptical distributions with applications to ﬁnance, Phys. A, submitted. [9] H. Fu, M. K. Ng, M. Nikolova, and J. L. Barlow, Eﬃcient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration, SIAM J. Sci. Comput., 27 (2006), pp. 1881–1902. [10] L. Garlappi, R. Uppal, and T. Wang, Portfolio selection with parameter and model uncertainty: A multi-prior approach, Rev. Financ. Stud., 20 (2007), pp. 41–81. [11] D. Goldfarb and G. Iyengar, Robust portfolio selection problems, Math. Oper. Res., 28 (2003), pp. 1–38. [12] J. Hadamard, Sur les probl`emes aux d´eriv´ees partielles et leur signiﬁcation physique, Bull. Univ. Princeton 13, Princeton University, Princeton, NJ, 1902, pp. 49–52. [13] J. Hamilton, A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica, 57 (1989), pp. 357–384. [14] I. Horenko, On simultaneous data-based dimension reduction and hidden phase identiﬁcation, J. Atmospheric Sci., 65 (2008), pp. 1941–1954. [15] I. Horenko, On robust estimation of low-frequency variability trends in discrete Markovian sequences of atmospheric circulation patterns, J. Atmospheric Sci., 66 (2009), pp. 2059–2072. [16] I. Horenko, Finite element approach to clustering of multidimensional time series, SIAM J. Sci. Comput., 32 (2010), pp. 62–83. [17] I. Horenko, On clustering of non-stationary meteorological time series, Dyn. Atmos. Oceans, 49 (2010), pp. 164–187. ¨ tte, Automated generation of reduced stochastic [18] I. Horenko, R. Klein, S. Dolaptchiev, and Ch. Schu weather models I: Simultaneous dimension and model reduction for time series analysis, Multiscale Model. Simul., 6 (2008), pp. 1125–1145. ¨ tte, Likelihood-based estimation of multidimensional Langevin models and [19] I. Horenko and Ch. Schu its application to biomolecular dynamics, Multiscale Model. Simul., 7 (2008), pp. 731–773. [20] I. Jolliffe, Principal Component Analysis, Springer, New York, 2002. [21] R. Kalman, A new approach to linear ﬁltering and prediction problems, Trans. ASME J. Basic Eng., 82 (1960), pp. 35–45. [22] B. Kedem and K. Fokianos, Regression Models for Time Series Analysis, Wiley Ser. Probab. Stat., John Wiley & Sons, New York, 2002. [23] H. Krolzig, Predicting Markov-switching vector autoregressive processes, J. Forecasting, to appear. [24] O. Ledoit, P. Santa-Clara, and M. Wolf, Flexible multivariate GARCH modeling with an application to international stock markets, Rev. Econ. Statist., 3 (2003), pp. 735–747. [25] O. Ledoit and M. Wolf, Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, J. Empirical Finance, 10 (2003), pp. 603–621. [26] O. Ledoit and M. Wolf, Honey, I shrunk the sample covariance matrix, J. Portfolio Management, 30 (2004), pp. 110–119. [27] H. Markowitz, Portfolio selection, J. Finance, 7 (1952), pp. 77–91. [28] R. Michaud, The Markowitz optimization enigma: Is optimized optimal?, Financial Analysts Journal, 45 (1989), pp. 31–42.

FUTURES PORTFOLIO AND MARKET PHASE DETECTION

779

[29] T. Rolski, H. Schmidli, V. Schmidt, and J. Teugels, Stochastic Processes for Insurance and Finance, Wiley Ser. Probab. Stat., John Wiley & Sons, Chichester, 1999. ¨ fer and K. Strimmer, A shrinkage approach to large-scale covariance matrix estimation and [30] J. Scha implications for functional genomics, Stat. Appl. Genet. Mol. Biol., 4 (2005), article 32. [31] W. Sharpe, Capital asset prices: A theory of market equilibrium under conditions of risk, J. Finance, 9 (1964), pp. 425–442. [32] A. Tikhonov, On the stability of inverse problems, Dokl. Akad. Nauk SSSR, 39 (1943), pp. 195–198. [33] R. S. Tsay, Analysis of Financial Time Series, John Wiley & Sons, New York, 2005. [34] D. E. Tyler, A distribution-free M-estimator of multivariate scatter, Ann. Statist., 15 (1987), pp. 223– 251. [35] R. Welsch and X. Zhou, Application of robust statistics to asset allocation models, REVSTAT, 5 (2007), pp. 97–114.

Recommend Documents

OPTIMAL CONSUMPTION AND PORTFOLIO ... - Semantic Scholar