The Matrix-Restricted Total Least Squares Problem

Amir Beck∗

November 12, 2006
Abstract

We present and study the matrix-restricted total least squares (MRTLS) problem, devised to solve linear systems of the form Ax ≈ b in which both A and b are subject to noise and the errors in A have the form DEC; here D and C are known matrices and E is unknown. We show that the MRTLS problem amounts to solving a problem of minimizing a sum of fractional quadratic terms and a quadratic function, and we compare it to the related restricted TLS problem of Van Huffel and Zha [10]. Finally, we present an algorithm for solving the MRTLS problem which is based on a reduction to a single-variable minimization problem. This reduction is shown to have the ability to eliminate local optimum points.
1 Introduction
Consider an uncertain linear system

(A + Ẽ)x = b + w,   (1.1)

where A ∈ R^{m×n} and b ∈ R^m are known and Ẽ ∈ R^{m×n}, w ∈ R^m are unknown. In the case in which Ẽ and w are not assumed to have any underlying structure, the total least squares (TLS) approach to this problem [6, 7, 9] is to seek a perturbation matrix Ẽ and a perturbation vector w that minimize ‖Ẽ‖² + ‖w‖² subject to the consistency equation (A + Ẽ)x = b + w. It is well known [6, 9] that the TLS problem can be reduced to the following problem in the x variables:

min_{x∈R^n}  ‖Ax − b‖² / (‖x‖² + 1),   (1.2)

and that its solution can be expressed via the singular value decomposition of the augmented matrix (A, b).
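As an aside (this snippet is our illustration, not part of the original development), the SVD-based solution of (1.2) can be computed in a few MATLAB lines; the sketch assumes the generic case in which the last component of the right singular vector associated with the smallest singular value of (A, b) is nonzero:

% Sketch (ours): classical TLS solution of (1.2) via the SVD of (A,b).
% Assumes the generic case in which V(n+1,end) is nonzero.
[m,n] = size(A);
[U,S,V] = svd([A b]);
v = V(:,end);            % right singular vector of the smallest singular value
x_tls = -v(1:n)/v(n+1);  % TLS solution of (1.2)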
∗ Department of Industrial Engineering and Management, Technion—Israel Institute of Technology, Haifa 32000, Israel. E-mail: [email protected].
When the perturbation matrix Ẽ possesses some linear structure, i.e., L(Ẽ) = 0 for some linear operator L, the corresponding problem can be expressed as

min_{Ẽ,w,x} { ‖Ẽ‖² + ‖w‖² : (A + Ẽ)x = b + w, L(Ẽ) = 0 },

which is known as the structured total least squares (STLS) problem. In contrast to TLS problems, for which efficient solution procedures are known, STLS problems often give rise to hard nonconvex problems, and current algorithms for solving these types of problems are not guaranteed to find a global solution but rather a local solution or even just a stationary point [18, 15, 1, 14, 12, 11].

There are several exceptions to this state of affairs. If A (and therefore also Ẽ) has a block circulant structure, then the corresponding STLS problem can be solved by decomposing the problem into several smaller TLS problems using the discrete Fourier transform [2]. Another tractable STLS problem arises when some of the columns of A are error free while the others are subject to noise. This problem is called the generalized TLS (GTLS) problem or mixed LS-TLS problem, and its solution can be obtained by computing a QR factorization of A and then solving a TLS problem of reduced dimension [8]. A more general problem is the restricted TLS problem introduced in [10]. Here it is assumed that (Ẽ, w) = D_1EC_1, where D_1 ∈ R^{m×p} and C_1 ∈ R^{l×(n+1)} are known matrices and E ∈ R^{p×l} is unknown. As was mentioned in [10], by choosing the matrices D_1 and C_1 appropriately, the restricted TLS problem can handle any weighted least squares (LS), generalized LS, total LS, and generalized TLS problem. The restricted TLS problem can be solved using the restricted singular value decomposition [19].

In this paper we introduce and analyze the matrix-restricted TLS (MRTLS) problem, in which the unknown matrix perturbation Ẽ has the form Ẽ = DEC, where D ∈ R^{m×p}, E ∈ R^{p×l} and C ∈ R^{l×n}. The STLS problem for this structure can therefore be expressed as

(MRTLS):  min_{E,w,x} { ‖E‖² + ‖w‖² : (A + DEC)x = b + w }.   (1.3)
This problem is of course related to the restricted TLS problem; however, while it is assumed in the restricted TLS problem that the augmented perturbation matrix (Ẽ, w) has the "DEC" structure, in the MRTLS problem only the perturbation matrix Ẽ is of this structure. This allows us to model different situations than those handled by the restricted TLS. For example, the choice

D = [ I_{m1} ; 0_{(m−m1)×m1} ],   C = I_n,   (1.4)

corresponds to the situation in which the first m1 rows of A are contaminated by noise while the remaining m − m1 rows are not subject to noise; all the components of the right-hand side vector b are assumed to be noisy. In other words, A and b can be decomposed as

A = [ A_1 ; A_2 ],   b = [ b_1 ; b_2 ],   (1.5)

with A_1 ∈ R^{m1×n}, A_2 ∈ R^{(m−m1)×n}, b_1 ∈ R^{m1} and b_2 ∈ R^{m−m1}, so that the linear problem suitable for the choice (1.4) is: A_1x ≈ b_1, A_2x ≈ b_2, where A_1, b_1 and b_2 are subject to noise and A_2 is error free. This is evidently a mixture of LS and TLS problems. It is different from the "mixed LS-TLS" problem introduced in [8], in which part of the columns of A are subject to noise. This model will be called the horizontal mixed LS-TLS problem. We also note that for D = I_m, C = I_n the MRTLS problem reduces to the standard TLS problem, and that by choosing D = I_m, C = (0_{n1}, I_{n−n1}), we recover the generalized TLS problem in which the first n1 columns of A are error free while the remaining n − n1 are noisy. Another scenario in which the MRTLS structure is suitable is when the components of the perturbation matrix Ẽ are correlated and there exist nonsingular (square) matrices D and C for which the components of D^{-1}ẼC^{-1} are uncorrelated with equal variance.

The paper is organized as follows. In Section 2 we derive a simplified form of the MRTLS problem (1.3). We show that the MRTLS problem amounts to solving a problem of minimizing a sum of fractional quadratic terms and a single quadratic term. We then compare the derived MRTLS problem to the restricted TLS problem of [10], which is a simpler problem consisting of minimizing only a single fractional expression. Sufficient conditions for the existence of the MRTLS solution are derived in Section 3 for the case when DD^T is a projection; this case includes the horizontal mixed LS-TLS model. Finally, in Section 4 we describe an algorithm for solving the MRTLS problem. The procedure is based on a reduction of the multivariate problem to a one-dimensional optimization problem. We prove and illustrate that the process of passing to the one-dimensional problem can eliminate local optimum points but cannot add any new ones. For the convenience of the reader, a detailed MATLAB implementation of our code is given.

Notation. For simplicity, instead of inf/sup we use min/max; however, this does not mean that we assume that the optimum is attained and/or finite. Vectors are denoted by boldface lowercase letters, e.g., y, and matrices are denoted by boldface uppercase letters, e.g., A. For any symmetric matrix A and positive definite matrix B, we denote the corresponding generalized minimum eigenvalue by λ_min(A, B); the generalized minimum eigenvalue has several equivalent formulations:

λ_min(A, B) = max{ λ : A − λB ⪰ 0 } = min_{x≠0} (x^T Ax)/(x^T Bx) = λ_min(B^{-1/2}AB^{-1/2}),
where we use the notation A ⪰ 0 (A ≻ 0) for a positive semidefinite (positive definite) matrix A. We follow the MATLAB convention and use ";" for adjoining scalars, vectors or matrices in a column. The identity matrix of size m × m is denoted by I_m and 0_{k×l} stands for the zero matrix of size k × l. For a linear space S, we denote by P_S the orthogonal projection onto the space S.
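To make the structural choices discussed in this introduction concrete, the following MATLAB lines (our illustration; m, n, m1 and n1 are assumed to be given problem dimensions) build the matrices D and C of (1.4) for the horizontal mixed LS-TLS model, as well as for the generalized TLS model in which the first n1 columns of A are error free:

% Illustration (ours): D and C for two MRTLS models; m, n, m1, n1 given.
% Horizontal mixed LS-TLS, choice (1.4): first m1 rows of A are noisy.
D_horiz = [eye(m1); zeros(m-m1,m1)];
C_horiz = eye(n);
% GTLS model: first n1 columns of A are error free, the rest are noisy.
D_gtls = eye(m);
C_gtls = [zeros(n-n1,n1) eye(n-n1)];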
2 The MRTLS problem
In order to analyze and solve the MRTLS problem, we find in Section 2.1 a simplified form of the MRTLS problem, expressed in the x variables only. We then compare, in Section 2.2, the structure of the derived form of the MRTLS problem to that of the restricted TLS problem.
2.1 A (more) Explicit Form of the MRTLS Problem
The next lemma simplifies the problem by eliminating the E and w variables; this is done by minimizing first with respect to the variables E and w.

Lemma 2.1 Let A ∈ R^{m×n}, b ∈ R^m, D ∈ R^{m×p} and C ∈ R^{l×n}. Then (E, w, x) ∈ R^{p×l} × R^m × R^n is an optimal solution of (1.3) if and only if x is an optimal solution of

min_{x∈R^n}  (Ax − b)^T (I_m + (x^T C^T Cx)DD^T)^{-1} (Ax − b).   (2.1)
Proof: Note that for a fixed x, problem (1.3) is a linearly constrained convex problem, and therefore the KKT conditions are in this case necessary and sufficient [4, Proposition 3.4.1]. We conclude that (E, w) is an optimal solution of (1.3) if and only if there exists λ ∈ R^m such that

E + D^T λx^T C^T = 0,   (2.2)
w − λ = 0,   (2.3)
(A + DEC)x = b + λ.   (2.4)

Substituting (2.2) and (2.3) into (2.4), we obtain

Ax − b = (I_m + (x^T C^T Cx)DD^T)λ,

which yields

λ = (I_m + (x^T C^T Cx)DD^T)^{-1}(Ax − b).   (2.5)
Therefore,

‖E‖² + ‖w‖² = ‖E‖² + ‖λ‖²                               (by (2.3))
            = (x^T C^T Cx)λ^T DD^T λ + λ^T λ             (by (2.2))
            = λ^T (I_m + (x^T C^T Cx)DD^T)λ
            = (Ax − b)^T (I_m + (x^T C^T Cx)DD^T)^{-1}(Ax − b),   (by (2.5))
so that problem (1.3) reduces to (2.1). □

In order to analyze our main problem (2.1), we will occasionally use an even more explicit expression for its objective function. To do so, consider the spectral decomposition of DD^T:

DD^T = UΛU^T,   (2.6)
where U ∈ R^{m×m} is an orthogonal matrix, Λ = diag(λ_1, λ_2, ..., λ_k, 0, ..., 0) with λ_i > 0, and k = rank(DD^T). Such a decomposition exists by the symmetry of DD^T. Using the decomposition (2.6), problem (2.1) becomes

min_{x∈R^n}  (Ãx − b̃)^T (I_m + (x^T C^T Cx)Λ)^{-1} (Ãx − b̃),

where à = U^T A and b̃ = U^T b. The latter problem can also be written as follows:

min_{x∈R^n}  { (Fx − g)^T (I_k + (x^T C^T Cx)Λ_1)^{-1}(Fx − g) + ‖Px − q‖² }.   (2.7)
Here F ∈ R^{k×n}, P ∈ R^{(m−k)×n}, g ∈ R^k, q ∈ R^{m−k} and Λ_1 ∈ R^{k×k} are defined by the following identities:

à = [ F ; P ],   b̃ = [ g ; q ],   Λ = [ Λ_1, 0_{k×(m−k)} ; 0_{(m−k)×k}, 0_{(m−k)×(m−k)} ].   (2.8)
Problem (2.7) can also be written as a problem of minimizing a sum of fractional quadratic terms and a single quadratic term:

min_{x∈R^n}  { Σ_{i=1}^{k} (f_i^T x − g_i)² / (1 + λ_i x^T C^T Cx) + ‖Px − q‖² },   (2.9)
where f1T , . . . , fkT are the rows of F and g1 , . . . , gk are the components of g.
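As a sanity check of the reduction, the objective of (2.9) is straightforward to evaluate directly; the following MATLAB sketch is our illustration (the function name mrtls_obj is ours) and assumes F, P, g, q, the vector lambda of positive eigenvalues λ_1, ..., λ_k, and the matrix C^T C have been formed as in (2.8):

function val = mrtls_obj(x,F,P,g,q,lambda,CTC)
% Illustration (ours): evaluate the MRTLS objective (2.9), i.e., a sum of
% fractional quadratic terms plus a single quadratic term.
s = x'*CTC*x;                          % the common quadratic form x^T*C^T*C*x
val = sum((F*x-g).^2./(1+lambda*s));   % sum_i (f_i^T*x - g_i)^2/(1+lambda_i*s)
val = val + norm(P*x-q)^2;             % the quadratic term ||P*x-q||^2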
Remark 2.1 A problem with a structure related to (2.9) was addressed in [3], where the Tikhonov regularization of the TLS (TRTLS) was considered. The TRTLS problem consists of minimizing the sum of a fractional quadratic function and a pure quadratic term:

(TRTLS):  min_{x∈R^n}  { ‖Ax − b‖²/(‖x‖² + 1) + ‖Lx‖² }.   (2.10)

While (2.9) is reminiscent of the TRTLS problem, there are several major differences. First, problem (2.9) consists of minimizing a sum of fractional terms rather than a single fractional expression. Second, the denominators in (2.9) are not strictly convex functions, since C is not necessarily of full column rank. Finally, the quadratic term in (2.9) is not homogeneous, unlike the one in the TRTLS problem. All these differences complicate the problem considerably.
2.2 Connection to the Restricted TLS
As was mentioned in the introduction, in the restricted TLS problem [10] both the matrix and the right-hand side vector have the "restricted" structure. Namely, the corresponding optimization problem is

min_{E,x}  { ‖E‖² : (Â + D_1EC_1)(x; −1) = 0 },   (2.11)
where D_1 ∈ R^{m×p} and C_1 ∈ R^{l×(n+1)} (and therefore E is a p × l matrix), and Â denotes the augmented matrix (A, b). For the sake of simplicity we assume that D_1 has full row rank; this assumption can be removed at the price of a more subtle analysis. We also assume that b ∉ R(A); otherwise any solution of problem (2.11) is of the form (0, x_0), where x_0 is an arbitrary solution of the system Ax = b. Similarly to Lemma 2.1, we now derive a simpler optimization problem than (2.11) by eliminating the E variables.

Lemma 2.2 Let A ∈ R^{m×n}, b ∈ R^m, D_1 ∈ R^{m×p} and C_1 ∈ R^{l×(n+1)}. Assume that D_1 has full row rank. Then (E, x) ∈ R^{p×l} × R^n is an optimal solution of (2.11) if and only if x is an optimal solution of

min_{x∈R^n}  { (x̃^T Â^T (D_1D_1^T)^{-1} Âx̃) / (x̃^T C_1^T C_1x̃) : x̃ = (x; −1) },   (2.12)

where Â = (A, b).
Proof: By denoting x̃ = (x; −1), problem (2.11) becomes

min_{E,x}  { ‖E‖² : (Â + D_1EC_1)x̃ = 0 }.   (2.13)

We now proceed as in the proof of Lemma 2.1. By fixing x, the problem becomes a linearly constrained problem in the variables E, and thus E is an optimal solution of (2.13) if and only if there exists λ ∈ R^m such that

E + D_1^T λx̃^T C_1^T = 0,
Âx̃ + D_1EC_1x̃ = 0.

Substituting the first equation into the second we obtain

Âx̃ − (x̃^T C_1^T C_1x̃)D_1D_1^T λ = 0.

Since Âx̃ ≠ 0 (by the assumption that b ∉ R(A)), we conclude that x̃^T C_1^T C_1x̃ ≠ 0 and, by using the assumption that D_1 has full row rank, we obtain

λ = (1/(x̃^T C_1^T C_1x̃)) (D_1D_1^T)^{-1} Âx̃,

so that the optimal value of problem (2.13) is equal to

‖E‖² = (λ^T D_1D_1^T λ)(x̃^T C_1^T C_1x̃) = (x̃^T Â^T (D_1D_1^T)^{-1} Âx̃) / (x̃^T C_1^T C_1x̃),
and the result follows. □

Note that problem (2.12) is much simpler than the MRTLS problem (2.9). Indeed, the restricted TLS problem (2.12) has a form similar to that of the TLS problem (1.2), i.e., it consists of minimizing a single fractional quadratic function. Therefore, it is not surprising that the solution of this problem, similarly to the solution of the TLS problem, can be computed by using some kind of generalization of the SVD [10]. In contrast, the MRTLS problem amounts to minimizing a sum of fractional quadratic functions plus a quadratic term (2.9), which is a much more complicated structure. It seems impossible to solve this problem by simple means such as SVD-type methods.
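For comparison purposes only, the value of (2.12) can be approached numerically without the restricted SVD machinery of [10] by relaxing the constraint x̃ = (x; −1) to x̃ ≠ 0, which turns (2.12) into a generalized eigenvalue problem; the MATLAB sketch below is ours and recovers the restricted TLS solution only under the additional assumptions that C_1^T C_1 is positive definite and that the minimizing eigenvector has a nonzero last component:

% Sketch (ours): restricted TLS value via a generalized eigenvalue problem,
% assuming C1'*C1 > 0 and a minimizing eigenvector with nonzero last entry.
Ahat = [A b];
M = Ahat'*((D1*D1')\Ahat);       % Ahat^T*(D1*D1^T)^{-1}*Ahat
[V,E] = eig(M,C1'*C1);
[val,idx] = min(diag(E));        % value of the relaxed problem
xt = V(:,idx);
x_rtls = -xt(1:end-1)/xt(end);   % rescale so the last component equals -1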
3 Existence of the MRTLS Solution when DD^T is a Projection
Problem (2.7) is bounded below by zero, so it has an infimum; however, the infimum is not necessarily attained. An example with one fractional term can be found in Section 3 of [3]. In this section we derive a sufficient condition under which the minimum is guaranteed to exist. We restrict the discussion to the case when DD^T is a projection, i.e., a matrix whose eigenvalues are zero or one [20, Theorem 4.1]. In this case Λ_1 = I_k and problem (2.7) reduces to

min_{x∈R^n}  { ‖Fx − g‖² / (1 + x^T C^T Cx) + ‖Px − q‖² }.   (3.1)
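Whether a given DD^T is a projection can be verified numerically: since DD^T is symmetric, its eigenvalues are zero or one exactly when it is idempotent. A one-line MATLAB check (our illustration, with an arbitrary tolerance):

% Illustration (ours): check that DDT is a projection (symmetric idempotent).
is_proj = norm(DDT*DDT - DDT) < 1e-10;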
We note that in the horizontal mixed LS-TLS model, DD^T is a projection by the definition of D in (1.4). Therefore, the results of this section can be applied to the horizontal mixed LS-TLS problem, which takes the form

min_{x∈R^n}  { ‖A_1x − b_1‖² / (‖x‖² + 1) + ‖A_2x − b_2‖² }   (3.2)

(the first term being the TLS term and the second the LS term),
where A1 , A2 , b1 and b2 are defined by (1.5).
Theorem 3.1 Consider problem (3.1) with F ∈ R^{k×n}, P ∈ R^{(m−k)×n}, g ∈ R^k and q ∈ R^{m−k}. Assume that Null(C) ∩ Null(P) = {0}. Let N be a matrix whose columns form an orthonormal basis for the null space of P and let x_0 be any solution to the system Px = P_{Im(P)}(q). Suppose that the following condition is satisfied:

λ_min(M_1, M_2) < λ_min(N^T F^T FN, N^T C^T CN),   (3.3)

where

M_1 = [ N^T F^T FN,  N^T F^T (Fx_0 − g) ; (Fx_0 − g)^T FN,  ‖Fx_0 − g‖² ],
M_2 = [ N^T C^T CN,  N^T C^T Cx_0 ; x_0^T C^T CN,  1 + x_0^T C^T Cx_0 ].

Then the minimum of (3.1) is attained.

Remark 3.1 Before proceeding with the proof of the theorem, it is important to establish the positive definiteness of the matrices N^T C^T CN and M_2, since otherwise the generalized minimum eigenvalues in (3.3) would not be well defined. The matrix N^T C^T CN is positive definite by the assumption that Null(C) ∩ Null(P) = {0}. Moreover, for every w ∈ R^r, t ∈ R such that (w; t) ≠ 0_{r+1} (here r is the dimension of the null space of P), we have
(w; t)^T M_2 (w; t) = w^T N^T C^T CNw + 2t·w^T N^T C^T Cx_0 + t²·x_0^T C^T Cx_0 + t² = ‖C(Nw + tx_0)‖² + t².   (3.4)

If t ≠ 0 then (3.4) is positive. On the other hand, if t = 0 then w is nonzero and the expression ‖C(Nw + tx_0)‖² + t² = w^T N^T C^T CNw is positive by the positive definiteness of N^T C^T CN. We thus conclude that M_2 is positive definite.
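Condition (3.3) is easy to test numerically. The following MATLAB sketch (our illustration) builds N, x_0, M_1 and M_2 from given F, P, g, q and C and compares the two generalized minimum eigenvalues:

% Illustration (ours): numerical test of the existence condition (3.3).
N = null(P);                            % orthonormal basis of Null(P)
x0 = pinv(P)*q;                         % a solution of P*x = P_Im(P)(q)
CTC = C'*C; r = F*x0-g;
M1 = [N'*(F'*F)*N, N'*F'*r; r'*F*N, r'*r];
M2 = [N'*CTC*N, N'*CTC*x0; x0'*CTC*N, 1+x0'*CTC*x0];
lhs = min(eig(M1,M2));                  % lambda_min(M1,M2)
rhs = min(eig(N'*(F'*F)*N,N'*CTC*N));   % lambda_min(N^T F^T F N, N^T C^T C N)
attained = (lhs < rhs);                 % strict inequality => minimum attained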
Proof of Theorem 3.1: Consider the decomposition q = P_{Im(P)}(q) + P_{Null(P^T)}(q). Then, by the orthogonality of P_{Im(P)}(q) and P_{Null(P^T)}(q), we have

‖Px − q‖² = ‖Px − P_{Im(P)}(q)‖² + ‖P_{Null(P^T)}(q)‖².

Let x_0 ∈ R^n be, as defined in the premise of the theorem, an arbitrary solution of the consistent system Px = P_{Im(P)}(q). By making the change of variables y = x − x_0 and omitting the constant term ‖P_{Null(P^T)}(q)‖², problem (3.1) transforms to

min_{y∈R^n}  { f(y) ≡ ‖Fy + Fx_0 − g‖² / (1 + (y + x_0)^T C^T C(y + x_0)) + ‖Py‖² }.   (3.5)

The problem is bounded below by zero and thus has an infimum. In order to show the attainment of the infimum (i.e., existence of a minimum), let us assume by contradiction that the minimum is not attained. In that case there must exist a sequence {y_n} such that ‖y_n‖ → ∞ and f(y_n) → f*, where f* is the infimum of problem (3.5). The sequence {y_n/‖y_n‖} is comprised of unit-norm vectors and therefore has a subsequence {y_{n_k}/‖y_{n_k}‖} that converges to a unit-norm vector d ∈ R^n. Since {f(y_{n_k})} is a convergent sequence, we have

f(y_{n_k})/‖y_{n_k}‖² → 0.   (3.6)

On the other hand, f(y) = f_1(y) + f_2(y), where

f_1(y) = ‖Fy + Fx_0 − g‖² / (1 + (y + x_0)^T C^T C(y + x_0)),   f_2(y) = ‖Py‖².

Combining (3.6) with the nonnegativity of the functions f_1, f_2, we conclude that

f_1(y_{n_k})/‖y_{n_k}‖² → 0,   (3.7)
f_2(y_{n_k})/‖y_{n_k}‖² → 0.   (3.8)

The second limit (3.8), together with y_{n_k}/‖y_{n_k}‖ → d, implies that ‖d‖² = 1 and ‖Pd‖² = 0, so that d ∈ Null(P); a direct consequence of the latter inclusion is that d = Nv, where N is defined in the premise of the theorem and v ∈ R^r is nonzero (r being the dimension of the null space of P). Therefore,

f* = lim_{k→∞} f(y_{n_k})
   ≥ lim_{k→∞} f_1(y_{n_k})
   = lim_{k→∞} ‖Fy_{n_k} + Fx_0 − g‖² / (1 + (y_{n_k} + x_0)^T C^T C(y_{n_k} + x_0))
   = lim_{k→∞} ‖F(y_{n_k}/‖y_{n_k}‖) + (Fx_0 − g)/‖y_{n_k}‖‖² / (1/‖y_{n_k}‖² + ((y_{n_k} + x_0)/‖y_{n_k}‖)^T C^T C((y_{n_k} + x_0)/‖y_{n_k}‖))
   = (d^T F^T Fd)/(d^T C^T Cd) = (v^T N^T F^T FNv)/(v^T N^T C^T CNv)
   ≥ λ_min(N^T F^T FN, N^T C^T CN).
Now, combining the derived inequality f* ≥ λ_min(N^T F^T FN, N^T C^T CN) with condition (3.3), we obtain

f* > λ_min(M_1, M_2).   (3.9)

On the other hand,

f* = min_{y∈R^n} f(y)
   ≤ min { f(y) : y ∈ Null(P) } = min_{v∈R^r} f(Nv)
   = min_{v∈R^r} ‖FNv + Fx_0 − g‖² / (1 + (Nv + x_0)^T C^T C(Nv + x_0))
   = min_{v∈R^r, t∈R} { ‖FNv + tFx_0 − tg‖² / (t² + (Nv + tx_0)^T C^T C(Nv + tx_0)) : t ≠ 0 }.   (3.10)

We now claim that the value of problem (3.10) is equal to the value of

min_{v∈R^r, t∈R} { ‖FNv + tFx_0 − tg‖² / (t² + (Nv + tx_0)^T C^T C(Nv + tx_0)) : (v; t) ≠ 0_{r+1} },   (3.11)

which, by the definition of generalized minimum eigenvalues, is equal to λ_min(M_1, M_2). To show the equality between the values of (3.10) and (3.11), suppose on the contrary that the value of the minimization problem (3.11) is strictly less than the value of (3.10). Then the minimum of (3.11) must be attained at a point with t = 0, so that λ_min(M_1, M_2), which is the optimal value of problem (3.11), is equal to

min_{v≠0_r} ‖FNv‖² / ((Nv)^T C^T C(Nv)) = λ_min(N^T F^T FN, N^T C^T CN),

which is a contradiction to (3.3). Therefore, we have shown that f* ≤ val(3.11) = λ_min(M_1, M_2). However, this contradicts inequality (3.9), and we thus conclude that the minimum of (3.1) is attained. □
Remark 3.2 Weak inequality is always satisfied in (3.3): the matrices on the right-hand side of (3.3) are principal submatrices of the corresponding matrices on the left-hand side. Hence, by the interlacing theorem for eigenvalues [20, Theorem 7.8], weak inequality holds.

Remark 3.3 For the unstructured TLS problem (D = I_m, C = I_n), condition (3.3) reduces to the well-known attainability condition for the TLS problem: σ_min(Â) < σ_min(A), where Â is the augmented matrix (A, b).
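In the unstructured case, the condition of Remark 3.3 can be checked in one MATLAB line (our illustration):

% Illustration (ours): TLS attainability check sigma_min([A b]) < sigma_min(A).
attainable = min(svd([A b])) < min(svd(A));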
4 Solving the MRTLS Problem via 1-D Solvers
In this section we present an algorithm for solving the MRTLS problem. The algorithm is based on converting the problem into a single-variable minimization problem and then invoking a 1-D solver. We show that the multivariate problem has at least as many local optima points as the one-dimensional problem and demonstrate that local optima points tend to vanish in the passage to the one-dimensional problem. The section ends with the presentation of a detailed MATLAB code for solving the MRTLS problem. We assume in this section that the matrix A is of full column rank.
4.1 Reduction to a Single-Variable Optimization Problem
In order to solve problem (2.1), we use a methodology similar to the one used in [3] and convert the problem into a problem of minimizing a single-variable function:

min_{α≥0} G(α),   (4.1)

where G(α) is defined as

G(α) = min_{x∈R^n} { (Ax − b)^T (I_m + αDD^T)^{-1}(Ax − b) : x^T C^T Cx = α }.   (4.2)

Calculating function values of G requires solving a minimization problem with a quadratic objective function and a quadratic equality constraint. This problem is nonconvex due to the nonconvexity of its equality constraint but can nonetheless be solved efficiently; for details see Section 4.2. It can be shown that the function G is in fact continuous on [0, ∞); the proof of this result, which relies on sensitivity analysis for optimization problems, is beyond the scope of this paper and is thus omitted. The function G might have several local optima, but in practice we observed, through numerous random problems, that it is almost always a unimodal function, namely a function with a single local optimum (which is also the global optimum). A theoretical justification for this empirical observation is given in Theorem 4.1 below, which shows that each local optimum of the one-dimensional problem (4.1) corresponds to at least one local optimum of the multivariate problem (2.1). Interestingly, the reverse claim does not hold in general. Therefore, local optimum points of the multivariate problem (2.1) might vanish in the transition to the one-dimensional problem (4.1).
Theorem 4.1 Suppose that α_0 is a local optimum of the single-variable problem (4.1) and let x_0 be an optimal solution of (4.2) with α = α_0. Then x_0 is a local optimum solution of the MRTLS problem (2.1).

Proof: Since α_0 is a local optimum solution of (4.1), there exists an interval I = (α_0 − δ, α_0 + δ) such that G(α_0) ≤ G(α) for every α ∈ I ∩ [0, ∞). Now, as stated in the premise of the theorem, let x_0 be an optimal solution of problem (4.2) with α = α_0. We will show that x_0 is a local optimum solution of (2.1). Indeed, let x ∈ R^n satisfy ‖x − x_0‖ ≤ ρ, where

ρ = min{ 1, δ / (2λ_max(C^T C)(‖x_0‖ + 1)) }.

Then, by the mean value theorem, there exists λ ∈ [0, 1] such that

x^T C^T Cx − x_0^T C^T Cx_0 = 2(x_0 + λ(x − x_0))^T (C^T C)(x − x_0),

so that

|x^T C^T Cx − x_0^T C^T Cx_0| ≤ 2λ_max(C^T C)‖x_0 + λ(x − x_0)‖·‖x − x_0‖
                              ≤ 2λ_max(C^T C)(‖x_0‖ + ‖x − x_0‖)·‖x − x_0‖
                              ≤ 2λ_max(C^T C)(‖x_0‖ + 1)·ρ ≤ δ.

Therefore, since x_0^T C^T Cx_0 = α_0, it follows that x^T C^T Cx ∈ I ∩ [0, ∞) and, as a result, G(x^T C^T Cx) ≥ G(α_0). Finally, denoting the objective function of (2.1) by

g(y) ≡ (Ay − b)^T (I_m + (y^T C^T Cy)DD^T)^{-1}(Ay − b),

we obtain that

g(x) ≥ G(x^T C^T Cx) ≥ G(α_0) = G(x_0^T C^T Cx_0) = g(x_0)

for every x satisfying ‖x − x_0‖ ≤ ρ, proving the local optimality of x_0. □

The result of Theorem 4.1 implies that the transition from the multivariate problem to a one-dimensional problem can be viewed as a process of eliminating local optimum points. We illustrate this attractive property by an example.

Example. We consider the horizontal mixed LS-TLS problem (3.2) with randomly chosen A_1, A_2 ∈ R^{3×2}, b_1, b_2 ∈ R^3. Mesh and contour plots of the objective function of (3.2) are shown in Figure 1. The global optimum of this two-dimensional function is attained at v = (0.443, 1.173) (marked by a triangle) and there is an additional local optimum at w = (−0.214, −2.983) (marked by a square). However, the corresponding one-dimensional function G, plotted in Figure 2, has only one local optimum point¹, attained at ‖v‖² = 1.572. Note that the function G does not have a local optimum at ‖w‖² = 8.943, so that the local optimum point w vanishes in the process of passing to the one-dimensional problem.

¹ Which is also the global optimum.
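A plot of the one-dimensional function G for any given instance can be produced with a few lines (our sketch; it uses the function gfun presented in Section 4.3 together with the model matrices D and C of the instance, and the grid bounds are an arbitrary illustrative choice):

% Sketch (ours): plot G(alpha) on a grid, using gfun from Section 4.3.
alphas = linspace(0.01,20,200);                       % grid of alpha values
Gvals = arrayfun(@(a)gfun(a,A,b,D*D',C'*C),alphas);   % G(alpha) on the grid
plot(alphas,Gvals), xlabel('alpha'), ylabel('G(alpha)')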
Figure 1: Mesh and contour plots of the function ‖A_1x − b_1‖²/(‖x‖² + 1) + ‖A_2x − b_2‖².
4.2 Evaluating Function Values of G
Calculating function values of G amounts to solving a minimization problem of the form

min_{x∈R^n} { x^T Bx − 2c^T x + d : x^T Gx = h },   (4.3)
where B ∈ R^{n×n} is a symmetric positive definite matrix², G is a positive semidefinite matrix, c ∈ R^n, d ∈ R, and h is a positive number; specifically, B = A^T(I_m + αDD^T)^{-1}A, c = A^T(I_m + αDD^T)^{-1}b, d = b^T(I_m + αDD^T)^{-1}b, G = C^TC and h = α. This is a special case of the generalized trust region subproblem (GTRS) [16], which consists of minimizing a general quadratic function (possibly indefinite) subject to a general quadratic constraint. Under some mild conditions, it is known that this class of problems possesses necessary and sufficient optimality conditions and that, as a result, the problem can be solved efficiently [16].

² B is positive definite (and not only positive semidefinite) since A is assumed to have full column rank.
Figure 2: A plot of the function G(α) = min { ‖A_1x − b_1‖²/(α + 1) + ‖A_2x − b_2‖² : ‖x‖² = α }.
In particular, by Theorem 3.2 of [16], x is an optimal solution of (4.3) if and only if there exists λ ∈ R such that

(B − λG)x = c,
x^T Gx = h,
B − λG ⪰ 0.

We will make a standard assumption that in fact B − λG ≻ 0 at the optimal λ. It follows directly from the optimality conditions that the optimal solution of (4.3) is given by x = (B − λG)^{-1}c, where λ is a solution of the scalar equation

φ(λ) = h,

with φ(λ) ≡ c^T(B − λG)^{-1}G(B − λG)^{-1}c, and λ is restricted to the interval on which B − λG ≻ 0, namely λ < 1/ρ, where ρ denotes the maximum generalized eigenvalue of the pair (G, B). This equation can be solved by the following Newton-type procedure (cf. [13, 17]):

Input: B ∈ R^{n×n} symmetric positive definite, G ∈ R^{n×n} positive semidefinite, c ∈ R^n, d ∈ R, h > 0, and tolerance parameters ε_1, ε_2 > 0.
Output: v, the optimal value of (4.3) (up to some tolerance ε_2).
Step 1: Calculate ρ, the generalized maximum eigenvalue of the matrix pair (G, B).
Step 2: Set η_0 = 1/ρ − ε_1 and k = 0.
Step 3: Repeat the following steps until |φ(η_k) − h| < ε_2:
  Step 3.a: Calculate a Cholesky factorization B − η_kG = L^T L.
  Step 3.b: Set y = L^{-1}L^{-T}c and calculate φ(η_k) = y^T Gy.
  Step 3.c: Set z = L^{-T}Gy and calculate φ'(η_k) = 2z^T z.
  Step 3.d: Set
    η_{k+1} = η_k + 2(φ^{-1/2}(η_k) − h^{-1/2}) / (φ^{-3/2}(η_k)φ'(η_k)).
Step 4: Set v = y^T By − 2c^T y + d.

In our implementation the tolerance parameters ε_1 and ε_2 take the values ε_1 = 10^{-5}, ε_2 = 10^{-10}. The Newton steps in the above algorithm converge to the optimal λ in very few iterations (usually no more than 6).
4.3 MATLAB Implementation
We now present a MATLAB implementation of the function gfun that calculates function values of G.

function val=gfun(alpha,A,b,DDT,CTC)
% solves: min (Ax-b)^T*(I+alpha*DDT)^{-1}*(Ax-b) s.t. x^T*CTC*x=alpha
% input:  alpha ... real number
%         A ....... m*n matrix, assumed to have full column rank
%         b ....... m vector
%         DDT ..... an m*m matrix that stands for D*D^T
%         CTC ..... an n*n matrix that stands for C^T*C
% output: val ..... the optimal value of the problem
[m,n]=size(A);
Dinv=inv(eye(m)+alpha*DDT);
B=A'*Dinv*A;                 % data of the GTRS formulation (4.3)
c=A'*Dinv*b;
d=b'*Dinv*b;
G=CTC;
G=(G+G')/2;                  % symmetrize to avoid rounding asymmetries
B=(B+B')/2;
opts.tol=1e-6;
opts.disp=0;
epsilon1=1e-5;
epsilon2=1e-10;
eta=1/eigs(G,B,1,'la',opts)-epsilon1;   % eta_0 = 1/rho - epsilon1 (Step 2)
y=zeros(length(B),1);
while (abs(y'*G*y-alpha)>epsilon2)
    R=chol(B-eta*G);         % B - eta*G = R'*R (Step 3.a)
    y=R\(R'\c);              % y = (B - eta*G)^{-1}*c
    z=R'\(G*y);
    phi=y'*G*y;              % phi(eta)
    phid=2*z'*z;             % phi'(eta)
    eta=eta+2*(phi-phi^(1.5)/sqrt(alpha))/phid;   % Newton step (Step 3.d)
end
val=y'*B*y-2*c'*y+d;

Now, in order to solve the MRTLS problem we can use any one-dimensional solver. In the example of Section 4.1 we used the MATLAB function fminbnd, which finds a local optimum of a one-dimensional function over an interval. This MATLAB function is based on golden section search and parabolic interpolation and is guaranteed to find the global optimum when the function is unimodal. The MATLAB command is:

fminbnd(@(alpha)gfun(alpha,A,b,D*D',C'*C),l,u,optimset('Display','iter'))

where l and u are lower and upper bounds on the value of x^T C^T Cx at an optimal solution x.
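Putting the pieces together, here is a minimal end-to-end sketch (ours, not from the paper) that solves a random instance of the horizontal mixed LS-TLS model (3.2) via gfun and fminbnd; the bracketing interval [l, u] below is an arbitrary illustrative choice:

% Sketch (ours): end-to-end solution of a horizontal mixed LS-TLS instance.
m1 = 3; m2 = 3; n = 2;
A1 = randn(m1,n); A2 = randn(m2,n);
b1 = randn(m1,1); b2 = randn(m2,1);
A = [A1; A2]; b = [b1; b2];
D = [eye(m1); zeros(m2,m1)];   % choice (1.4): only the first m1 rows of A are noisy
C = eye(n);
l = 0; u = 20;                 % illustrative bounds on x^T*C^T*C*x at the optimum
alphaopt = fminbnd(@(alpha)gfun(alpha,A,b,D*D',C'*C),l,u);
val = gfun(alphaopt,A,b,D*D',C'*C);   % optimal MRTLS value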
References

[1] T. J. Abatzoglou, J. M. Mendel, and G. A. Harada. The constrained total least squares technique and its applications to harmonic superresolution. IEEE Trans. Signal Processing, 39(5):1070–1087, May 1991.

[2] A. Beck and A. Ben-Tal. A global solution for the structured total least squares problem with block circulant matrices. SIAM J. Matrix Anal. Appl., 27(1):238–255, 2005.

[3] A. Beck and A. Ben-Tal. On the solution of the Tikhonov regularization of the total least squares problem. SIAM J. Optim., 17(1):98–118, 2006.

[4] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, second edition, 1999.

[5] C. Fortin and H. Wolkowicz. The trust region subproblem and semidefinite programming. Optim. Methods Softw., 19(1):41–67, 2004.

[6] G. H. Golub and C. F. Van Loan. An analysis of the total least-squares problem. SIAM J. Numer. Anal., 17(6):883–893, Dec. 1980.

[7] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Univ. Press, Baltimore, MD, third edition, 1996.

[8] S. Van Huffel and J. Vandewalle. Analysis and properties of the generalized total least squares problem AX ≈ B when some or all columns in A are subject to error. SIAM J. Matrix Anal. Appl., 10(3):294–315, 1989.

[9] S. Van Huffel and J. Vandewalle. The Total Least Squares Problem: Computational Aspects and Analysis, volume 9 of Frontiers in Applied Mathematics. SIAM, Philadelphia, PA, 1991.

[10] S. Van Huffel and H. Y. Zha. The restricted total least squares problem: formulation, algorithm, and properties. SIAM J. Matrix Anal. Appl., 12(2):292–309, 1991.

[11] P. Lemmerling and S. Van Huffel. Analysis of the structured total least squares problem for Hankel/Toeplitz matrices. Numerical Algorithms, 27:89–114, 2001.

[12] N. Mastronardi, P. Lemmerling, and S. Van Huffel. Fast structured total least squares algorithm for solving the basic deconvolution problem. SIAM J. Matrix Anal. Appl., 22(2):533–553, 2000.

[13] A. Melman. A unifying convergence analysis of second-order methods for secular equations. Math. Comp., 66(217):333–344, 1997.

[14] B. De Moor. Structured total least squares and L2 approximation problems. Linear Algebra Appl., 188–189:163–207, 1993.

[15] B. De Moor. Total least squares for affinely structured matrices and the noisy realization problem. IEEE Trans. Signal Processing, 42(11):3104–3113, Nov. 1994.

[16] J. J. Moré. Generalizations of the trust region problem. Optim. Methods Softw., 2:189–209, 1993.

[17] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM J. Sci. Statist. Comput., 4(3):553–572, 1983.

[18] J. B. Rosen, H. Park, and J. Glick. Total least norm formulation and solution for structured problems. SIAM J. Matrix Anal. Appl., 17(1):110–126, 1996.

[19] H. Y. Zha. The restricted singular value decomposition of matrix triplets. SIAM J. Matrix Anal. Appl., 12(1):172–194, 1991.

[20] F. Zhang. Matrix Theory: Basic Results and Techniques. Universitext. Springer-Verlag, New York, 1999.