LIDS REPORT 2822                                               November 2009

Approximate Solution of Large-Scale Linear Inverse Problems with Monte Carlo Simulation*

Nick Polydorides, Mengdi Wang, Dimitri P. Bertsekas†

November 19, 2009
Abstract

We consider the approximate solution of linear ill-posed inverse problems of high dimension with a simulation-based algorithm that approximates the solution within a low-dimensional subspace. The algorithm uses Tikhonov regularization, regression, and low-dimensional linear algebra calculations and storage. For sampling efficiency, we use variance reduction/importance sampling schemes, specially tailored to the structure of inverse problems. We demonstrate the implementation of our algorithm in a series of practical large-scale examples arising from Fredholm integral equations of the first kind.
1 Introduction
We consider linear inverse problems of the form

$$Ax = b, \tag{1}$$

where $A$ is an $m \times n$ real matrix, $b$ is a vector in $\Re^m$, and $x$ is the unknown vector in $\Re^n$. Such problems typically arise from discretized Fredholm integral equations of the first kind, such as those encountered in image processing, geophysical prospecting, and astronomy [Gro07], [BB98].

* Research supported by the LANL Information Science and Technology Institute, by the Cyprus Program of the MIT Energy Initiative, and by NSF Grant ECCS-0801549. Thanks are due to Janey Yu for helpful discussions.
† Laboratory for Information and Decision Systems, M.I.T., Cambridge, Mass. 02139.
The system (1) may be either underdetermined or overdetermined. We consider a least-squares formulation

$$\min_{x \in \Re^n} \|Ax - b\|_\zeta^2, \tag{2}$$

where $\zeta$ is a known probability distribution vector with positive components. When $m$ or $n$ is very large, the exact optimal solution $x^*$ of problem (2) becomes computationally formidable. In this paper, we propose to approximate $x^*$ within a low-dimensional subspace $S = \{\Phi r \mid r \in \Re^s\}$, where $\Phi$ is an $n \times s$ matrix whose columns represent basis functions of $S$. Our methodology involves subspace approximation, Monte Carlo simulation, regression, and most significantly, only low-dimensional vector operations (of order $s$, the number of basis functions). We thus focus on the following approximation to problem (2):

$$\min_{r \in \Re^s} \|A\Phi r - b\|_\zeta^2.$$

The optimal solution is

$$r^* = G^{-1} c, \tag{3}$$

where

$$G = \Phi' A' Z A \Phi, \qquad c = \Phi' A' Z b, \tag{4}$$
and $Z$ is the $m \times m$ diagonal matrix with the components of $\zeta$ along its diagonal (we assume throughout this paper that $G$ is invertible). Since the direct calculation of $G$ and $c$ may be prohibitively expensive, we propose to estimate their values by simulation, as suggested in [BY09]. For example, we may generate a sequence $\{(i_0, j_0, \bar{j}_0), \ldots, (i_t, j_t, \bar{j}_t)\}$ by sampling independently according to some distribution $p$ from the set of index triples $(i, j, \bar{j}) \in \{1, \ldots, n\}^3$. Then we may estimate $G$ and $c$ with $\hat{G}$ and $\hat{c}$ given by

$$\hat{G} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{\zeta_{i_k} a_{i_k j_k} a_{i_k \bar{j}_k}}{p_{i_k j_k \bar{j}_k}} \, \phi_{j_k} \phi'_{\bar{j}_k}, \qquad \hat{c} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{\zeta_{i_k} a_{i_k j_k} b_{i_k}}{p_{i_k j_k}} \, \phi_{j_k}, \tag{5}$$

where we denote by $a_{ij}$ the $(i,j)$th component of $A$, by $p_{ij}$ the marginal probability of $(i,j)$, by $p_{ij\bar{j}}$ the marginal probability of $(i,j,\bar{j})$, and by $\phi'_j$ the $j$th row of $\Phi$. One natural approximation of $r^*$ is $\hat{r} = \hat{G}^{-1}\hat{c}$: as shown in [BY09], when $t \to \infty$ we have $\hat{G} \to G$ and $\hat{c} \to c$ with probability 1, so $\hat{G}^{-1}\hat{c} \to r^*$ with probability 1.
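To make the estimator (5) concrete, here is a minimal NumPy sketch, assuming the simplest sampling mechanism of Section 2.1 below (indices $i$, $j$, $\bar{j}$ drawn independently from distributions `xi1`, `xi2`, `xi3`). The function `phi_row(j)` stands for the $j$th row of $\Phi$, and $A$ is passed as an array only for brevity; in the intended large-scale setting its entries would be computed on demand.

```python
import numpy as np

def estimate_G_c(A, b, zeta, phi_row, xi1, xi2, xi3, t, seed=0):
    """Monte Carlo estimates of G = Phi'A'ZAPhi and c = Phi'A'Zb, Eq. (5),
    with i, j, jbar drawn independently from xi1, xi2, xi3."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s = len(phi_row(0))
    G_hat, c_hat = np.zeros((s, s)), np.zeros(s)
    I = rng.choice(m, size=t + 1, p=xi1)
    J = rng.choice(n, size=t + 1, p=xi2)
    Jb = rng.choice(n, size=t + 1, p=xi3)
    for i, j, jb in zip(I, J, Jb):
        p_ij = xi1[i] * xi2[j]       # marginal probability of the pair (i, j)
        p_ijjb = p_ij * xi3[jb]      # marginal probability of the triple (i, j, jbar)
        G_hat += zeta[i] * A[i, j] * A[i, jb] / p_ijjb * np.outer(phi_row(j), phi_row(jb))
        c_hat += zeta[i] * A[i, j] * b[i] / p_ij * phi_row(j)
    return G_hat / (t + 1), c_hat / (t + 1)
```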
There have been several proposals in the literature relating to the exact or approximate solution of large-scale inverse problems. One of the earliest is by Lanczos [Lan58], which successively approximates the solution without explicit matrix inversion. Since then a number of iterative methods have been studied, such as the Landweber iteration, the conjugate gradient method, and the LSQR algorithm (see the survey [HH93] for a comprehensive review of these methods). A projection-regularization approach, proposed by O'Leary and Simmons [OS81], approximates the solution within a subspace in which the projected coefficient matrix is bidiagonalizable. A related approach proposed by Calvetti, Reichel, and Zhang [DCZ99] suggests the use of Lanczos bidiagonalization with Gauss quadrature. Later, a trust-region formulation was proposed by Rojas and Sorensen [RS02], which poses the regularized problem as an inequality constrained least-squares problem. Our work differs from those mentioned above in that it involves both subspace approximation and simulation, and relies exclusively on low-dimensional vector operations.

The origins of our approach can be traced to projected equation methods for approximate dynamic programming, which aim to solve forms of Bellman's equation of very large dimension by using simulation (see the books by Bertsekas and Tsitsiklis [BT96], Sutton and Barto [SB98], and Bertsekas [Ber07]). These methods were recently extended to apply to general square systems of linear equations and regression problems in the paper by Bertsekas and Yu [BY09], which was the starting point for the present paper. The companion paper [WPB09] emphasizes generic methodological aspects of regression and variance analysis for importance sampling schemes, and may serve as a theoretical basis for the present work, which emphasizes algorithmic approaches for the solution of practical inverse problems.

The paper is organized as follows. In Section 2, we present general aspects of subspace approximation, regression, and our simulation framework. In Section 3, we discuss alternative methods for designing importance sampling distributions for simulation, in the context of our algorithmic methodology. In Section 4, we apply our methodology to a number of practical inverse problems of large dimension, and we present the computational results.
2 Approximation Methodology Based on Simulation and Regression

2.1 Simulation Framework
We want to estimate the matrix $G$ and vector $c$ of Eq. (4), which define the optimal low-dimensional solution $r^*$ [cf. Eq. (3)]. Equation (5) provides one such approach. In this section we will present a few alternative approaches. One possibility, proposed in [WPB09], is to estimate each component of $G$ and $c$ using a separate sequence $\{(i_k, j_k, \bar{j}_k)\}_{k=0}^{t}$. Then we may estimate the $(\ell, q)$th component of $G$ or the $\ell$th component of $c$ with

$$\hat{G}_{\ell q} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{\zeta_{i_k} a_{i_k j_k} a_{i_k \bar{j}_k}}{p_{i_k j_k \bar{j}_k}} \, \phi_{j_k \ell} \, \phi_{\bar{j}_k q}, \qquad \hat{c}_\ell = \frac{1}{t+1} \sum_{k=0}^{t} \frac{\zeta_{i_k} a_{i_k j_k} b_{i_k}}{p_{i_k j_k}} \, \phi_{j_k \ell}, \tag{6}$$
where we denote by $\phi_{j\ell}$ the $(j,\ell)$th component of $\Phi$. This component-by-component approach requires $(s^2 + 3s)/2$ separate sample sequences (since $G$ is symmetric), which increases the time complexity of the computation. Nonetheless, as we will discuss later, it allows the sampling distribution to be customized to the particular component, according to principles of importance sampling, so fewer samples per component may be required for the same solution accuracy. Another possibility is to generate one sequence per column or row of $G$ and one sequence for $c$, which requires $s + 1$ separate sample sequences. More generally, we may partition the set of components of $G$ and $c$, and then generate one sample sequence per block. With a judicious partitioning strategy, the potential advantage is twofold: first, grouping together components that can be estimated using similar distributions improves the efficiency of the sampling process, and second, estimating "almost independent" components independently reduces the bias induced by correlation among the components of the estimates.

We now briefly discuss alternative mechanisms for generating sample triples $(i, j, \bar{j})$. The simplest scheme is to sample $i_k$, $j_k$, and $\bar{j}_k$ independently of one another, according to distributions $\xi_1$, $\xi_2$, and $\xi_3$, respectively. Then the marginal probabilities for pairs $(i_k, j_k)$ and triples $(i_k, j_k, \bar{j}_k)$ are

$$p_{i_k j_k} = \xi_{1, i_k} \xi_{2, j_k}, \qquad p_{i_k j_k \bar{j}_k} = \xi_{1, i_k} \xi_{2, j_k} \xi_{3, \bar{j}_k}.$$
An alternative is to generate an independent sequence of row indices $\{i_0, i_1, \ldots\}$ according to a distribution $\xi$, and then generate $j_k$ and $\bar{j}_k$ conditioned on each $i_k$, according to transition probabilities $\xi_{i_k j_k}$ and $\hat{\xi}_{i_k \bar{j}_k}$. In this case, the marginal probabilities are

$$p_{i_k j_k} = \xi_{i_k} \xi_{i_k j_k}, \qquad p_{i_k j_k \bar{j}_k} = \xi_{i_k} \xi_{i_k j_k} \hat{\xi}_{i_k \bar{j}_k}.$$
A somewhat more complex scheme is to generate a sequence of state transitions $\{i_0, i_1, \ldots\}$ using an irreducible Markov chain with transition probability matrix $P$ and initial distribution $\xi_0$. Sampling $j_k$ and $\bar{j}_k$ according to some transition probabilities $\xi_{i_k j_k}$ and $\hat{\xi}_{i_k \bar{j}_k}$ yields the marginal probabilities for pairs $(i_k, j_k)$ and triples $(i_k, j_k, \bar{j}_k)$:

$$p_{i_k j_k} = (\xi_0' P^k)_{i_k} \xi_{i_k j_k}, \qquad p_{i_k j_k \bar{j}_k} = (\xi_0' P^k)_{i_k} \xi_{i_k j_k} \hat{\xi}_{i_k \bar{j}_k}.$$

Here the choice of $\xi$ should ensure that all row indices are sampled infinitely often, so that $\hat{G} \to G$ and $\hat{c} \to c$ (and hence also $\hat{G}^{-1}\hat{c} \to r^*$) as $t \to \infty$, with probability 1. In particular, if $P$ is an irreducible Markov chain, we can use as $\xi$ the distribution of long-term frequencies of state visits corresponding to $P$.
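The Markov chain mechanism is the least standard of the three; the sketch below, a hypothetical NumPy illustration, generates triples together with the exact marginal weights $(\xi_0' P^k)_{i_k} \xi_{i_k j_k}$ and $(\xi_0' P^k)_{i_k} \xi_{i_k j_k} \hat{\xi}_{i_k \bar{j}_k}$ needed in Eqs. (5)-(6). The arrays `cond_j` and `cond_jbar` are assumed to hold the transition probabilities $\xi_{ij}$ and $\hat{\xi}_{i\bar{j}}$, one row per state $i$.

```python
import numpy as np

def sample_triples_markov(P, xi0, cond_j, cond_jbar, t, seed=0):
    """Generate (i_k, j_k, jbar_k) with i_k following a Markov chain with
    transition matrix P and initial distribution xi0; j_k and jbar_k are
    drawn conditioned on i_k.  Returns the triples together with the exact
    marginal probabilities p_{ij} and p_{ij jbar} used as importance weights."""
    rng = np.random.default_rng(seed)
    m, n = P.shape[0], cond_j.shape[1]
    dist = xi0.copy()                  # distribution of i_k, i.e. xi0' P^k
    i = rng.choice(m, p=dist)
    out = []
    for k in range(t + 1):
        j = rng.choice(n, p=cond_j[i])
        jb = rng.choice(n, p=cond_jbar[i])
        p_ij = dist[i] * cond_j[i, j]
        p_ijjb = p_ij * cond_jbar[i, jb]
        out.append((i, j, jb, p_ij, p_ijjb))
        dist = dist @ P                # advance xi0' P^k to xi0' P^{k+1}
        i = rng.choice(m, p=P[i])      # next state of the chain
    return out
```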
2.2 Regression Methods
Given $\hat{G}$ and $\hat{c}$, we may estimate $r^*$ [cf. Eq. (3)] with $\hat{G}^{-1}\hat{c}$, but this estimate may be highly susceptible to simulation noise, particularly if $G$ is nearly singular. As a more reliable alternative, we consider the estimation of $r^*$ using a form of regression and the model

$$\hat{c} = \hat{G} r + e,$$

where $e$ is the vector representing simulation error,

$$e = (G - \hat{G}) r + \hat{c} - c.$$

The standard least squares/regression/Tikhonov regularization approach yields the estimate

$$\hat{r} = \arg\min_{r \in \Re^s} \left\{ (\hat{G} r - \hat{c})' \Sigma^{-1} (\hat{G} r - \hat{c}) + (r - \bar{r})' \Gamma^{-1} (r - \bar{r}) \right\},$$

where $\bar{r}$ is an a priori estimate (for example $\bar{r} = \hat{G}^{-1}\hat{c}$ or a singular value-based estimate of $\hat{G}^{-1}\hat{c}$), and $\Sigma$ and $\Gamma$ are some positive definite symmetric matrices. Equivalently,

$$\hat{r} = (\hat{G}' \Sigma^{-1} \hat{G} + \Gamma^{-1})^{-1} (\hat{G}' \Sigma^{-1} \hat{c} + \Gamma^{-1} \bar{r}). \tag{7}$$
An effective choice is to use as $\Sigma$ an estimate of the covariance of the error $e = \hat{c} - \hat{G} r^*$. Such an estimate can be obtained from the simulation using a nominal guess $\tilde{r}$ of $r^*$, i.e., the matrix

$$\hat{\Sigma}(\tilde{r}) = \frac{1}{t+1} \sum_{k=0}^{t} e_k e_k' = \frac{1}{t+1} \sum_{k=0}^{t} \big( (G_k - \hat{G}) \tilde{r} + (\hat{c} - c_k) \big) \big( (G_k - \hat{G}) \tilde{r} + (\hat{c} - c_k) \big)', \tag{8}$$

where each $e_k$ can be viewed as a sample of $e$, and $G_k$ and $c_k$ are the corresponding sample terms that are averaged to form the estimates $\hat{G}$ and $\hat{c}$. We refer to the companion paper [WPB09] for further discussion and a derivation of an associated confidence interval for the estimate $\hat{r}$ of Eq. (7). For the experimental results reported in this paper, we have used the preceding regression procedure with $\hat{\Sigma}(\tilde{r})$ as given above, and with a nominal guess $\tilde{r}$ based on $\bar{r}$. Another possibility is to use a form of iterative regression, whereby we estimate $r^*$ by repeatedly using Eq. (7) with intermediate correction of the matrix $\Sigma$ using Eq. (8), which is the iteration

$$r_{k+1} = \big( \hat{G}' \hat{\Sigma}(r_k)^{-1} \hat{G} + \Gamma^{-1} \big)^{-1} \big( \hat{G}' \hat{\Sigma}(r_k)^{-1} \hat{c} + \Gamma^{-1} \bar{r} \big). \tag{9}$$

This iteration has been shown to converge locally [WPB09] to a fixed point of Eq. (9), provided that the covariance of $e$ is sufficiently small.

Under certain circumstances, we may have prior knowledge about the high-dimensional solution $x^*$ of problem (2), which suggests a natural type of regression. For example, in some inverse problems arising in the physical sciences, it is desired that the regularization term be proportional to $(x - \bar{x})' L' L (x - \bar{x})$ for some $K \times n$ matrix $L$ and an arbitrary prior guess $\bar{x}$. Thus, we may take $\Gamma^{-1} = \beta \Phi' L' L \Phi$ for some $\beta > 0$. If the matrix $\Phi' L' L \Phi$ is not available in analytical form, we may estimate it by simulation and take $\Gamma^{-1}$ to be
This iteration has been shown to converge locally [WPB09] to a ο¬xed point of Eq. (9), provided that the covariance of π is suο¬ciently small. Under certain circumstances, we may have prior knowledge about the high-dimensional solution π₯β of problem (2), which may suggest a natural type of regression. For example, in some inverse problems arising in physical science, it is desired that the regularization term be proportional to (π₯ β π₯ Β―)β² πΏβ² πΏ(π₯ β π₯ Β―) for some π Γ π matrix πΏ and an arbitrary prior guess π₯ Β―. Thus, we may take Ξβ1 = π½Ξ¦β² πΏβ² πΏΞ¦, for some π½ > 0. If the matrix Ξ¦β² πΏβ² πΏΞ¦ is not available in analytical form, we may estimate Ξ¦β² πΏβ² πΏΞ¦ based on simulation and take Ξβ1 to be Ξβ1 =
π‘ π½ β πππ ππ πππ Β―ππ πππ πΒ―β²ππ , π‘+1 πππ ππ Β―ππ π=0
where we denote by πππ the (π, π)th component of πΏ.
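In sketch form, the regression estimate (7) and the iteration (9) look as follows, assuming the per-sample terms $G_k$, $c_k$ from the simulation have been retained; the small ridge added to $\hat{\Sigma}$ is our own safeguard against a singular sample covariance, not part of the method.

```python
import numpy as np

def regression_estimate(G_hat, c_hat, Sigma, Gamma_inv, r_bar):
    """Tikhonov-regularized regression estimate of r*, Eq. (7)."""
    S_inv = np.linalg.inv(Sigma)
    M = G_hat.T @ S_inv @ G_hat + Gamma_inv
    return np.linalg.solve(M, G_hat.T @ S_inv @ c_hat + Gamma_inv @ r_bar)

def iterative_regression(G_samples, c_samples, Gamma_inv, r_bar,
                         num_iters=10, ridge=1e-12):
    """Iteration (9): alternate the regression step with re-estimation of
    the error covariance Sigma(r_k) from the sample terms G_k, c_k, Eq. (8)."""
    G_hat = np.mean(G_samples, axis=0)
    c_hat = np.mean(c_samples, axis=0)
    r = r_bar.copy()
    for _ in range(num_iters):
        # sample errors e_k = (G_k - G_hat) r + (c_hat - c_k), cf. Eq. (8)
        E = np.array([(Gk - G_hat) @ r + (c_hat - ck)
                      for Gk, ck in zip(G_samples, c_samples)])
        Sigma = E.T @ E / len(E) + ridge * np.eye(len(r))
        r = regression_estimate(G_hat, c_hat, Sigma, Gamma_inv, r_bar)
    return r
```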
2.3 Extensions

2.3.1 Special Case 1: Underdetermined Problems
In dealing with severely underdetermined problems (see [KS04] for examples of inverse problems of this type), we can estimate the components of the high-dimensional solution $x^*$ of problem (2) directly, without subspace approximation. Assuming that $m$ is reasonably small, we may take $\Phi = I$ and adapt the preceding methodology as follows. Let $\Sigma^{-1} = I$, $\Gamma^{-1} = \beta I$, and $Z = I$ in Eq. (7) for some $\beta > 0$, and let $\hat{x} = \Phi \hat{r} = \hat{r}$ and $\bar{x} = \Phi \bar{r} = \bar{r}$. Equation (7) can then be rewritten as

$$\hat{x} = \bar{x} + (A'A + \beta I)^{-1} A' (b - A\bar{x}). \tag{10}$$
We now note that

$$A'(AA' + \beta I) = A'AA' + \beta A' = (A'A + \beta I) A',$$

and that both matrices $AA' + \beta I$ and $A'A + \beta I$ are positive definite and thus invertible. Hence we have $(A'A + \beta I)^{-1} A' = A' (AA' + \beta I)^{-1}$, so Eq. (10) is equivalent to

$$\hat{x} = \bar{x} + A'(F + \beta I)^{-1} d, \tag{11}$$

where we define the $m \times m$ matrix $F$ and the $m$-dimensional vector $d$ by

$$F = AA', \qquad d = b - A\bar{x}.$$
In analogy with the estimation of $G$ and $c$ by using Eq. (5), we may generate one sample sequence of column indices $\{j_0, \ldots, j_t\}$ according to a distribution $p$, and estimate $F$ and $d$ with

$$\hat{F} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{1}{p_{j_k}} \, a_{j_k} a'_{j_k}, \qquad \hat{d} = b - \frac{1}{t+1} \sum_{k=0}^{t} \frac{\bar{x}_{j_k}}{p_{j_k}} \, a_{j_k},$$

where we denote by $a_{j_k}$ the $j_k$th column of $A$. Alternatively, in analogy with Eq. (6), we may use one sample sequence per component of $F$ and $d$, and estimate $F_{\ell q}$ and $d_\ell$ respectively with

$$\hat{F}_{\ell q} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{a_{\ell j_k} a_{q j_k}}{p_{j_k}}, \qquad \hat{d}_\ell = b_\ell - \frac{1}{t+1} \sum_{k=0}^{t} \frac{a_{\ell j_k} \bar{x}_{j_k}}{p_{j_k}}.$$

We then obtain the approximate solution $\hat{x}$ whose $q$th entry is computed as

$$\hat{x}_q = \bar{x}_q + a'_q (\hat{F} + \beta I)^{-1} \hat{d}.$$

In this way, we are able to estimate components of $x^*$ directly, using only low-dimensional vector operations.
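A compact sketch of this special case, assuming a callable `A_col(j)` that produces the $j$th column of $A$ as an $m$-vector:

```python
import numpy as np

def estimate_x_entries(A_col, b, x_bar, p, q_indices, beta, t, seed=0):
    """Section 2.3.1: estimate F = AA' and d = b - A x_bar by sampling
    column indices j_k ~ p, then recover selected entries of x, Eq. (11)."""
    rng = np.random.default_rng(seed)
    n, m = len(x_bar), len(b)
    F_hat, acc = np.zeros((m, m)), np.zeros(m)
    for j in rng.choice(n, size=t + 1, p=p):
        a_j = A_col(j)
        F_hat += np.outer(a_j, a_j) / p[j]
        acc += (x_bar[j] / p[j]) * a_j
    F_hat /= t + 1
    d_hat = b - acc / (t + 1)
    w = np.linalg.solve(F_hat + beta * np.eye(m), d_hat)  # (F + beta I)^{-1} d
    return np.array([x_bar[q] + A_col(q) @ w for q in q_indices])
```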
2.3.2 Special Case 2: Equality Constrained Problems
As a variation of problem (2), consider the following equality constrained least-squares problem:

$$\min_{x \in \Re^n} \|Ax - b\|_\zeta^2 \quad \text{s.t.} \quad Lx = 0, \tag{12}$$
where $L$ is a $K \times n$ matrix. Following a similar approximation approach, we restrict problem (12) to the subspace $S$. The constraint $Lx = 0$ becomes $L\Phi r = 0$, or equivalently $r' \Phi' L' L \Phi r = 0$, which is in turn equivalent to $\Phi' L' L \Phi r = 0$. Thus, we may write the approximate problem as

$$\min_{r \in \Re^s} \|A\Phi r - b\|_\zeta^2 \quad \text{s.t.} \quad \Phi' L' L \Phi r = 0. \tag{13}$$
We assume that there exists at least one feasible solution for this problem. Introducing a Lagrange multiplier vector $\mu \in \Re^s$ and using standard duality arguments, we obtain the following necessary and sufficient condition for $(r^*, \mu^*)$ to be an optimal solution-Lagrange multiplier pair for problem (13):

$$H \begin{pmatrix} r^* \\ \mu^* \end{pmatrix} = f, \tag{14}$$

where we define the $2s \times 2s$ matrix $H$ and the $2s$-vector $f$ as

$$H = \begin{pmatrix} \Phi' A' Z A \Phi & \Phi' L' L \Phi \\ \Phi' L' L \Phi & 0 \end{pmatrix}, \qquad f = \begin{pmatrix} \Phi' A' Z b \\ 0 \end{pmatrix}.$$

We may now apply our simulation-based approximation approach of the preceding section to the system (14) (which is always low-dimensional, even if $L$ has a large row dimension). In particular, similar to Eq. (5), we may generate a sample sequence $\{(i_0, j_0, \bar{j}_0), \ldots, (i_t, j_t, \bar{j}_t)\}$ according to a distribution $p$, and estimate $H$ and $f$ with $\hat{H}$ and $\hat{f}$ given by

$$\hat{H} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{1}{p_{i_k j_k \bar{j}_k}} \begin{pmatrix} \zeta_{i_k} a_{i_k j_k} a_{i_k \bar{j}_k} \phi_{j_k} \phi'_{\bar{j}_k} & l_{i_k j_k} l_{i_k \bar{j}_k} \phi_{j_k} \phi'_{\bar{j}_k} \\ l_{i_k j_k} l_{i_k \bar{j}_k} \phi_{j_k} \phi'_{\bar{j}_k} & 0 \end{pmatrix},$$
and

$$\hat{f} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{\zeta_{i_k} a_{i_k j_k} b_{i_k}}{p_{i_k j_k}} \begin{pmatrix} \phi_{j_k} \\ 0 \end{pmatrix}.$$
Alternatively, we may generate one sample sequence per component of $H$ and $f$, and estimate the components with formulas similar to Eq. (6).
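Once $\hat{H}$ and $\hat{f}$ are in hand, recovering the constrained solution is a $2s$-dimensional linear solve; a minimal sketch follows (the least-squares solve is our own precaution, since $\hat{H}$ contains a zero block and may be ill-conditioned).

```python
import numpy as np

def solve_equality_constrained(H_hat, f_hat, s):
    """Solve the 2s x 2s optimality system (14): the first s components of
    the solution approximate r*, the last s are the Lagrange multipliers."""
    sol, *_ = np.linalg.lstsq(H_hat, f_hat, rcond=None)
    return sol[:s], sol[s:]
```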
2.3.3 Special Case 3: Inequality Constrained Problems
Another variation of problem (2) is the inequality constrained least-squares problem

$$\min_{x \in \Re^n} \|Ax - b\|_\zeta^2 \quad \text{s.t.} \quad Lx \le g, \tag{15}$$
where $L$ is a $K \times n$ matrix and the row dimension $K$ is assumed to be small. We consider a restriction of this problem to the subspace $S$, given by

$$\min_{r \in \Re^s} \|A\Phi r - b\|_\zeta^2 \quad \text{s.t.} \quad L\Phi r \le g,$$

or equivalently

$$\min_{r \in \Re^s} r' G r - 2 c' r \quad \text{s.t.} \quad W r \le g,$$

where $G$ and $c$ are defined in Eq. (4), and $W = L\Phi$. We may now apply the simulation approach of Section 2.1. For example, we may generate one single sample sequence, estimate $G$ and $c$ with $\hat{G}$ and $\hat{c}$ using Eq. (5), and estimate $W$ with $\hat{W}$ given by

$$\hat{W} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{1}{p_{j_k}} \, l_{j_k} \phi'_{j_k},$$
where we denote by $l_j$ the $j$th column of $L$. Alternatively, we may generate one sample sequence per component of $W$, and estimate $W_{\ell q}$ with

$$\hat{W}_{\ell q} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{l_{\ell j_k} \phi_{j_k q}}{p_{j_k}}.$$
The resulting approximate problem takes the form

$$\min_{r \in \Re^s} r' \hat{G} r - 2 \hat{c}' r \quad \text{s.t.} \quad \hat{W} r \le g,$$

which is low-dimensional in both the cost function and the inequality constraints, so standard quadratic programming techniques can be applied to solve it. Note that it is essential to assume that $L$ has a small row dimension, so that $\hat{W}$ has low dimension.
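For instance, the low-dimensional problem can be passed to SciPy's general-purpose SLSQP solver (a dedicated QP solver would do equally well); this is a sketch, with `G_hat`, `c_hat`, `W_hat`, and `g` denoting the quantities above.

```python
import numpy as np
from scipy.optimize import minimize

def solve_low_dim_qp(G_hat, c_hat, W_hat, g, r0=None):
    """Solve min_r r'G_hat r - 2 c_hat'r  subject to  W_hat r <= g."""
    s = G_hat.shape[0]
    r0 = np.zeros(s) if r0 is None else r0
    obj = lambda r: r @ G_hat @ r - 2 * c_hat @ r
    jac = lambda r: (G_hat + G_hat.T) @ r - 2 * c_hat
    cons = {"type": "ineq", "fun": lambda r: g - W_hat @ r,
            "jac": lambda r: -W_hat}
    res = minimize(obj, r0, jac=jac, constraints=[cons], method="SLSQP")
    return res.x
```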
3 Variance Reduction by Importance Sampling

3.1 Variance Analysis
The central idea of our simulation method is to evaluate $G$ and $c$ of Eq. (4) with a weighted average of samples generated by some probabilistic mechanism [cf. Eq. (5) and Eq. (6)]. A critical issue is the reduction of the variances of the simulation errors $\hat{G} - G$ and $\hat{c} - c$. To this end, we consider the use of importance sampling, which aims at variance reduction in estimating large sums and integrals by choosing an "appropriate" probability distribution for generating samples.

Let $\Omega$ be the sample space, $\nu : \Omega \mapsto \Re^d$ be a function, and $\{\omega_0, \ldots, \omega_t\}$ be samples generated from $\Omega$ according to some process with invariant distribution $\xi$. We may estimate the large sum $z = \sum_{\omega \in \Omega} \nu_\omega$ with

$$\hat{z} = \frac{1}{t+1} \sum_{k=0}^{t} \frac{\nu_{\omega_k}}{\xi_{\omega_k}},$$

and we would like to choose $\xi$ so that $\hat{z}$ has a small variance. If $d = 1$ and $\nu$ is a nonnegative function,¹ this variance is

$$\mathrm{var}\{\hat{z}\} = \frac{z^2}{t+1} \left( \sum_{\omega \in \Omega} \frac{(\nu_\omega / z)^2}{\xi_\omega} - 1 \right),$$

and is minimized when the sampling distribution is $\xi^* = \nu / z$. Calculating $\xi^*$ is impossible because it requires knowledge of $z$, but by designing the distribution $\xi$ to be close to $\nu / z$, we can reduce the variance of $\hat{z}$ (see the companion paper [WPB09] for further analysis). In the remainder of this section, we discuss a few schemes for designing importance sampling distributions, which are tailored to the data of the problem.

¹ The nonnegativity of $\nu$ may be assumed without essential loss of generality. If $\nu$ takes negative values, we may decompose $\nu$ as $\nu = \nu^+ - \nu^-$, so that both $\nu^+$ and $\nu^-$ are nonnegative functions, and then estimate separately $z_1 = \sum_{\omega \in \Omega} \nu^+_\omega$ and $z_2 = \sum_{\omega \in \Omega} \nu^-_\omega$.
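The effect of choosing $\xi$ close to $\nu / z$ can be seen in a tiny self-contained experiment: with $\xi^* = \nu / z$, every sample of $\nu_\omega / \xi_\omega$ equals $z$ exactly, so the estimator has zero variance, whereas uniform sampling does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
nu = np.exp(-np.linspace(0.0, 10.0, n))     # a nonnegative function on Omega
z = nu.sum()                                # the large sum to be estimated

def mc_estimate(xi, t=1000):
    idx = rng.choice(n, size=t, p=xi)
    return np.mean(nu[idx] / xi[idx])

uniform = np.full(n, 1.0 / n)
optimal = nu / z                            # xi* = nu / z
print(abs(mc_estimate(uniform) - z) / z)    # noticeable relative error
print(abs(mc_estimate(optimal) - z) / z)    # zero up to rounding
```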
3.2 Designing Importance Sampling Distributions

3.2.1 An Importance Sampling Scheme for Estimating $G_{\ell q}$
We focus on estimating the component $G_{\ell q}$ by generating a sequence of index triples and using Eq. (6). In this case the sample space $\Omega$ and the function $\nu$ are

$$\Omega = \{1, \ldots, n\}^3, \qquad \nu(i, j, \bar{j}) = \zeta_i a_{ij} a_{i\bar{j}} \phi_{j\ell} \phi_{\bar{j}q}.$$

We want to design the sampling distribution $\xi$ so that it is close to $\xi^*$ and belongs to some family of relatively simple distribution functions. We have used a scheme that generates the indices $j$, $\bar{j}$, and $i$ sequentially. The optimal distribution satisfies

$$\xi^*_{ij\bar{j}} \propto \big( \phi_{j\ell} \|a_j\|_1 \big) \big( \phi_{\bar{j}q} \|a_{\bar{j}}\|_1 \big) \frac{\zeta_i a_{ij} a_{i\bar{j}}}{\|a_j\|_1 \|a_{\bar{j}}\|_1},$$

where we denote by $a_j$ the $j$th column of $A$, and by $\|a_j\|_1$ its $\ell_1$ norm (i.e., $\|a_j\|_1 = \sum_{i=1}^{m} |a_{ij}|$). We approximate $\xi^*$ by approximating the functions

$$\phi_{j\ell} \|a_j\|_1, \qquad \phi_{\bar{j}q} \|a_{\bar{j}}\|_1, \qquad \frac{\zeta_i a_{ij} a_{i\bar{j}}}{\|a_j\|_1 \|a_{\bar{j}}\|_1}$$

with distributions $\xi_j$, $\xi_{\bar{j}}$, and $\xi(i \mid j, \bar{j})$, respectively, so $\xi^*_{ij\bar{j}}$ is approximated with $\xi_{ij\bar{j}} = \xi_j \, \xi_{\bar{j}} \, \xi(i \mid j, \bar{j})$.

Let us denote by $\mathcal{T}$ the approximation operator that maps a set of trial values of $\nu$ to an approximation of the entire function $\nu$. For instance, we can take $\mathcal{T}$ to be a piecewise constant approximation: for any $y \in \Re^n$ and $I = \{n_1, \ldots, n_K\} \subset \{1, \ldots, n\}$,

$$\mathcal{T}\big([y_i]_{i \in I}\big) = \sum_{k=1}^{K} y_{n_k} \mathbf{1}\big( \big[ (n_{k-1} + n_k)/2, \; (n_k + n_{k+1})/2 \big] \big),$$

where $\mathbf{1}$ denotes the function that is equal to 1 within the corresponding interval and 0 otherwise, and we define $n_0 = -n_1$ and $n_{K+1} = 2n - n_K$. As another example, we may take $\mathcal{T}$ to be the piecewise linear approximation

$$\mathcal{T}\big([y_i]_{i \in I}\big) = \sum_{k=1}^{K-1} \mathcal{L}\big( (n_k, y_{n_k}), (n_{k+1}, y_{n_{k+1}}) \big) \mathbf{1}\big( [n_k, n_{k+1}] \big),$$

where we denote by $\mathcal{L}$ the linear function that takes value $y_{n_k}$ at $n_k$ and value $y_{n_{k+1}}$ at $n_{k+1}$, and assume without loss of generality that $n_1 = 1$ and $n_K = n$. The resulting importance sampling algorithm is summarized as follows.

An Importance Sampling Scheme for Estimating $G_{\ell q}$:

1. Select a small set $[a_{ij}]_{i,j \in I}$ of components of $A$, and a corresponding small set of rows $[\phi'_j]_{j \in I}$ of $\Phi$.

2. Generate the sample triple $(i_k, j_k, \bar{j}_k)$ by

   (a) sampling $j_k$ according to $\xi_{j_k} \propto \mathcal{T}_{j_k}\big( \big[ \phi_{j\ell} \sum_{i \in I} a_{ij} \big]_{j \in I} \big)$,

   (b) sampling $\bar{j}_k$ according to $\xi_{\bar{j}_k} \propto \mathcal{T}_{\bar{j}_k}\big( \big[ \phi_{jq} \sum_{i \in I} a_{ij} \big]_{j \in I} \big)$,

   (c) sampling $i_k$ conditioned on $j_k$ and $\bar{j}_k$ according to $\xi(i_k \mid j_k, \bar{j}_k) \propto \mathcal{T}_{i_k}\big( [\zeta_i a_{i j_k} a_{i \bar{j}_k}]_{i \in I} \big)$,

   where we denote by $\mathcal{T}_i(\cdot)$ the $i$th component of $\mathcal{T}(\cdot)$.

Figure 1 illustrates the last step of this importance sampling scheme, where $A$ is a $1000 \times 1000$ matrix, $Z = I$, and $\mathcal{T}$ is taken to be the operator of piecewise constant/linear approximation. We start with a low-dimensional representation of $A$, namely $[a_{ij}]_{i,j \in I}$, which can be implemented using a uniformly spaced discretization as illustrated in Fig. 1(a). The resulting distributions $\xi(i_k \mid j_k, \bar{j}_k)$ are plotted in Fig. 1(b)-(c), and compared with the exact optimal conditional distribution.

Figure 1: Illustration of step 2(c) of the proposed importance sampling scheme. In (a), the color field represents the $1000 \times 1000$ matrix $A$; the two vertical lines represent the columns $a_{j_k}$ and $a_{\bar{j}_k}$; and the grid represents $[a_{ij}]_{i,j \in I}$, an $8 \times 8$ discretization of $A$. In (b)/(c) the conditional distribution $\xi(i_k \mid j_k, \bar{j}_k)$ (obtained by piecewise constant/linear approximation using $[a_{ij}]_{i,j \in I}$) is plotted against the optimal distribution $\xi^*(i_k \mid j_k, \bar{j}_k)$. In (d)-(f) the same process is repeated with a finer $20 \times 20$ discretization of $A$.
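To illustrate, a sketch of the piecewise linear operator $\mathcal{T}$ and of step 2(c) follows. The matching of the sampled columns $j_k$, $\bar{j}_k$ to the nearest trial columns is one possible reading of the coarse-grid construction of Fig. 1, so this should be taken as an assumption rather than the exact construction used in our experiments.

```python
import numpy as np

def T_piecewise_linear(trial_idx, trial_vals, n):
    """Piecewise linear approximation operator T: interpolate the trial
    values [y_i]_{i in I} over {0, ..., n-1} (trial_idx assumed sorted)."""
    return np.interp(np.arange(n), trial_idx, trial_vals)

def xi_i_given_jjb(A_trial, I, zeta_trial, j, jb, m):
    """Step 2(c): approximate xi(i | j, jb), proportional to
    zeta_i a_{i j} a_{i jb}, from the trial components [a_{ij}]_{i,j in I}."""
    I = np.asarray(I)
    cj = np.abs(I - j).argmin()       # nearest trial column to j
    cjb = np.abs(I - jb).argmin()     # nearest trial column to jbar
    vals = zeta_trial * A_trial[:, cj] * A_trial[:, cjb]
    f = np.maximum(T_piecewise_linear(I, vals, m), 0.0)
    return f / f.sum()                # normalize to a distribution over rows
```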
3.2.2 Variations of the Importance Sampling Scheme
The importance sampling scheme given in the preceding section is only one possibility for generating samples to estimate $G_{\ell q}$. An alternative is to replace the distributions in steps 2(a) and 2(b) with

$$\xi_{j_k} \propto \phi_{j_k \ell}, \qquad \xi_{\bar{j}_k} \propto \phi_{\bar{j}_k q},$$

or with approximations of these functions. This simplified version is easier to implement, and may reduce the computational complexity greatly if $\Phi$ is known to have a simple analytical form.

We may also change the order of generating $i_k$, $j_k$, and $\bar{j}_k$. For instance, we can generate $i_k$ first, and then $j_k$ and $\bar{j}_k$ conditioned on $i_k$, according to the distributions

$$\xi_{i_k} \propto \|a'_{i_k}\|_1, \qquad \xi(j_k \mid i_k) \propto \phi_{j_k \ell} \, a_{i_k j_k}, \qquad \xi(\bar{j}_k \mid i_k) \propto \phi_{\bar{j}_k q} \, a_{i_k \bar{j}_k},$$

where $a'_i$ denotes the $i$th row of $A$.
If $A$ and $\Phi$ have complicated forms, we may first replace them with coarse approximations, and then introduce a step of function approximation when computing the distributions. When $A$ has relatively sparse rows, sampling the row index first may greatly improve the efficiency of sampling.

The most straightforward scheme is to approximate the three-dimensional function $\nu = \zeta_i a_{ij} a_{i\bar{j}} \phi_{j\ell} \phi_{\bar{j}q}$ directly: first take trial samples from the sample space $\Omega = \{1, \ldots, n\}^3$ and approximate $\nu$ by fitting some function (e.g., a piecewise constant/linear function) based on the trial samples. More specifically, we may take $I \subset \{1, \ldots, n\}$ and obtain $[\nu(i, j, \bar{j})]_{i,j,\bar{j} \in I}$. Then we can compute the approximate function

$$\hat{\nu} = \mathcal{T}\big( [\nu(i, j, \bar{j})]_{i,j,\bar{j} \in I} \big),$$

where we maintain the notation $\mathcal{T}$ for the operator of function approximation, and finally normalize $\hat{\nu}$ to obtain a distribution function. However, this scheme may be computationally expensive, because it involves selecting trial samples from a three-dimensional space and then sampling according to a three-dimensional distribution.

A critical choice in all the schemes mentioned above is the function approximation operator $\mathcal{T}$, and a good choice may depend on the characteristics of the problem at hand, i.e., $A$ and $\Phi$. As an alternative to the piecewise constant/linear approximation used in Figure 1, we may consider an approximation based on a least-squares fit from some family of parameterized functions. In particular, we may approximate the function $\nu : \Omega \mapsto \Re$ by introducing a parametric family

$$\mathcal{F} = \{ \nu_\theta \mid \theta \in \Re^d \},$$

where $\nu_\theta : \Omega \mapsto \Re$ is a function parameterized by $\theta$; or we may consider the family of finite sums of parameterized functions, in which case

$$\mathcal{F} = \Big\{ \sum_{p=1}^{P} \theta_p \nu_p \;\Big|\; \theta_p \in \Re, \; p = 1, \ldots, P \Big\},$$

where $P$ is a positive integer. Given the trial samples and corresponding function values $\{\nu(i, j, \bar{j})\}_{i,j,\bar{j} \in I}$, we can approximate $\nu$ with $\hat{\nu}$ by minimizing the total squared error over the trial samples, i.e.,

$$\mathcal{T}\big( [\nu(i, j, \bar{j})]_{i,j,\bar{j} \in I} \big) = \operatorname*{arg\,min}_{\hat{\nu} \in \mathcal{F}} \sum_{i,j,\bar{j} \in I} \big\| \nu(i, j, \bar{j}) - \hat{\nu}(i, j, \bar{j}) \big\|^2.$$

This method may be preferable in circumstances where $\nu$ is known to have some special structure, in which case we may choose $\mathcal{F}$ accordingly and improve the approximation accuracy.

We have focused so far on estimating the components $G_{\ell q}$. It is straightforward to extend the preceding methodology to estimating components of $c$, and also components of $F$ and $d$ for underdetermined problems, cf. the special cases of Section 2.3.
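For the finite-sum family, the least-squares fit above reduces to an ordinary linear least-squares problem in $\theta$; a sketch follows, where the feature map `features` is a hypothetical stand-in for the chosen $\nu_1, \ldots, \nu_P$.

```python
import numpy as np

def fit_parametric(features, trial_pts, trial_vals):
    """Least-squares choice of T: fit nu_hat = sum_p theta_p nu_p to the
    trial values; features(w) returns (nu_1(w), ..., nu_P(w)) at a trial
    point w = (i, j, jbar)."""
    D = np.array([features(w) for w in trial_pts])   # |trials| x P design matrix
    theta, *_ = np.linalg.lstsq(D, trial_vals, rcond=None)
    return lambda w: features(w) @ theta             # the fitted function nu_hat
```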
4 Inverse Problem Applications
In this section we apply the proposed algorithmic methodology to a number of practical inverse problems, involving both underdetermined systems (see Sections 4.1 and 4.2) and square systems (see Sections 4.3-4.6). These problems take the form of Fredholm integral equations of the first kind, and are discretized into linear systems $Ax = b$, where $A$ is $m \times n$ or $n \times n$, and $n$, the dimension of the solution space, is taken to be $n = 10^9$. The matrix $A$ is typically ill-conditioned and dense. The components of $A$ and $b$ are accessible, and can be computed analytically. We aim for the solution $x^*$ of the discretized system $Ax = b$. For square systems, we consider its approximate solution within a subspace spanned by $s = 50$ or $s = 100$ multi-resolution basis functions, which are piecewise constant functions with disjoint local support [KS04]. For underdetermined systems, we use the approach introduced in Section 2.3.1 and estimate specific components of $x^*$ directly. Note that the computational complexity is completely determined by $s$ (or $m$ for underdetermined systems). Our experiments are run on a dual processor personal computer with 4GB RAM running Matlab. The estimates $\hat{G}$ and $\hat{c}$ (or $\hat{F}$ and $\hat{d}$ for underdetermined systems) are obtained component-by-component from separate sample sequences using Eq. (6). Each sequence is generated by using the importance sampling scheme given in Section 3.2, where we discretize the $n$-vectors involved (i.e., $a_j$ and $\phi_j$) into vectors of dimension 100, and use piecewise linear approximation to compute the sampling distributions. We have estimated each component of $G$ and $c$ (or $F$ and $d$) with $10^4$ samples, and each sample takes 50 ms on average. The computational results are presented by comparing the high-dimensional approximate solution $\hat{x} = \Phi \hat{r}$ with the exact solution $x^*$ and with $\Pi x^*$, the projection of $x^*$ on the subspace $S$. The performance of our importance sampling schemes is assessed with the total sample covariances of the estimated components of $\hat{G}$ and $\hat{c}$ (or $\hat{F}$ and $\hat{d}$ for underdetermined systems).
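For reference, one concrete reading of such a basis is sketched below; the function `phi_row` is our own illustration (not taken from [KS04]) of a piecewise-constant $\Phi$ whose columns have disjoint consecutive support. The estimators of Section 2 only ever need individual rows of $\Phi$, so the $n \times s$ matrix itself (with $n = 10^9$) is never formed.

```python
import numpy as np

def phi_row(j, n, s):
    """Row phi_j' of a piecewise-constant basis with disjoint local support:
    the n coordinates are split into s consecutive blocks, and phi_{j,l} = 1
    exactly when coordinate j falls in block l."""
    row = np.zeros(s)
    row[min(j * s // n, s - 1)] = 1.0
    return row
```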
4.1 The inverse contamination release history problem
This is an underdetermined problem, whereby we seek to recover the release history of an underground contamination source based on measurements of plume concentration. Let $u(w, \tau)$ be the contaminant concentration at time $\tau$ and distance $w$ away from the source, and let $x(\tau)$ be the source release at time $\tau$. The transport of contaminant in the ground is governed by the advection-diffusion model [WU96]

$$\frac{\partial u}{\partial \tau} = D \frac{\partial^2 u}{\partial w^2} - v \frac{\partial u}{\partial w}, \qquad w \ge 0, \; \tau \in [0, \tau_f],$$

subject to Cauchy initial and boundary conditions

$$u(0, \tau) = x(\tau), \qquad u(w, 0) = 0, \qquad \lim_{w \to \infty} u(w, \tau) = 0,$$

where $D$ and $v$ are the diffusion and velocity coefficients, respectively. At a time $T \ge \tau_f$ the plume concentration is distributed as

$$u(w, T) = \int_0^T \mathrm{d}\tau \, \mathcal{A}(w, T - \tau) \, x(\tau),$$

where $\mathcal{A}$ is the transport kernel

$$\mathcal{A}(w, T - \tau) = \frac{w}{\sqrt{4\pi D (T - \tau)^3}} \exp\left\{ -\frac{\big(w - v(T - \tau)\big)^2}{4 D (T - \tau)} \right\}.$$
In our experiment we take $D = 0.8$, $v = 1$, $T = 300$, and $\tau_f = 250$, and we assume the unknown release history to be

$$x(\tau) = \sum_{i=1}^{5} \kappa_i \exp\left\{ -\frac{(\tau - \mu_i)^2}{2 \sigma_i^2} \right\},$$

where $\kappa = \{0.5, 0.4, 0.3, 0.5, 0.5\}$, $\mu = \{60, 75, 150, 190, 225\}$, and $\sigma = \{35, 12, 10, 7, 3\}$; we discretize it into a vector of length $10^9$, which is used as the vector $x^*$. Then we compute $m$ borehole concentration measurements at locations $\{w_i\}_{i=1}^{m}$, as a discretization of $u(w, T)$, and form the vector $b$. In accordance with Section 2.3.1, we formulate the problem as in Eq. (11) and estimate $F$ and $d$ using simulation. Then we compute 1000 entries of $\hat{x}$ using the estimates $\hat{F}$ and $\hat{d}$, the regularization matrix $\Gamma^{-1} = 10^{-11} I$, and the initial guess $\bar{r} = 0$. In Fig. 2, we compare the resulting entries $\hat{x}_i$ against those of the exact solution $x^*$.

To analyze the effect of importance sampling, we evaluate the simulation error in terms of the total sample variances for components of $\hat{F}$ and $\hat{d}$. In Fig. 3 we compare the reduction of simulation error for alternative importance sampling schemes and alternative ways of function approximation. It can be seen that the proposed importance sampling scheme substantially reduces the simulation error and improves the simulation efficiency. Similar results have been observed in the subsequent problems.
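A short script reproducing the forward model under the stated parameters (a sketch; the quadrature grid size is our own choice, and $x(\tau)$ is taken to vanish beyond $\tau_f$):

```python
import numpy as np

D, v, T, tau_f = 0.8, 1.0, 300.0, 250.0
kappa = np.array([0.5, 0.4, 0.3, 0.5, 0.5])
mu = np.array([60.0, 75.0, 150.0, 190.0, 225.0])
sig = np.array([35.0, 12.0, 10.0, 7.0, 3.0])

def x_true(tau):
    """Assumed release history x(tau): a sum of five Gaussian pulses."""
    return np.sum(kappa * np.exp(-(tau - mu) ** 2 / (2 * sig ** 2)))

def kernel(w, lag):
    """Transport kernel A(w, T - tau) of the advection-diffusion model."""
    return w / np.sqrt(4 * np.pi * D * lag ** 3) * \
        np.exp(-(w - v * lag) ** 2 / (4 * D * lag))

def u(w, num=4000):
    """Plume concentration u(w, T) by simple trapezoidal quadrature."""
    taus = np.linspace(0.0, tau_f, num)
    vals = np.array([kernel(w, T - t) * x_true(t) for t in taus])
    return np.trapz(vals, taus)
```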
Figure 2: The simulation-based approximate solution $\hat{x}$ for the contamination release history reconstruction problem, compared with the exact solution $x^*$, with $m = 50$ (left) and $m = 100$ (right).
Figure 3: The reduction of simulation error for alternative importance sampling schemes. The simulation error is measured in terms of the sum of sample covariances for components of $\hat{F}$ and $\hat{d}$. The solid lines represent the case where no approximation is implemented and a uniform sampling distribution is used; the dotted lines represent the cases where importance sampling is used, with distributions obtained by piecewise constant/linear approximations. The left-side figure illustrates the reduction of simulation error as the number of samples $t$ varies, while the number of trial points (i.e., the cardinality of $I$, introduced for the purpose of function approximation; see Section 3.2.1) is fixed at $q = 500$; the right-side figure plots the results when $q$ varies, with the number of samples fixed at $t = 1000$.
Figure 4: The simulation-based approximate solution $\hat{x} = \Phi\hat{r}$ for the gravitational prospecting problem, compared with the exact solution $x^*$, with $m = 50$ (left) and $m = 100$ (right).
4.2 Gravitational prospecting
This is an inverse problem encountered in searching for oil and natural gas resources. We want to estimate the earth density distribution based on measurements of the gravitational force at some distance away from the surface. Here we consider a simplified version of this problem as posed in [Gro07], where the spatial variation of the density is confined to the interior of a ring-shaped domain, and the measurements $b$ are taken on a circular trajectory positioned in the same plane but outside the ring. When the unknown density function $x$ and the data are defined on concentric trajectories, we express the problem in polar coordinates as

$$f(\theta) = \int_0^{2\pi} \mathrm{d}\varphi \, \mathcal{A}(\theta, \varphi) \, x(\varphi), \qquad 0 \le \theta \le 2\pi,$$

where

$$\mathcal{A}(\theta, \varphi) = \frac{2 - \cos(\theta - \varphi)}{\big( 5 - 4\cos(\theta - \varphi) \big)^{3/2}}.$$

In the experiment, we take the unknown density function to be

$$x(\varphi) = |\sin \varphi| + |\sin 2\varphi|, \qquad 0 \le \varphi \le 2\pi,$$

so the measurement function $f$ can be computed accordingly. We discretize the problem into a system of $m = 50$ or $m = 100$ equations, corresponding to $m$ measurements, with $n = 10^9$ unknowns. For regularization we use $\Gamma^{-1} = 10^{-13} I$ and $\bar{r} = 0$. The approximate solution $\hat{x}$ is illustrated in Fig. 4, compared with the exact solution $x^*$.
4.3 The second derivative problem
This problem refers to differentiating noisy signals, such as those obtained from experimental measurements. The problem has been extensively studied, and its solution has been shown to exhibit instability with increasing levels of noise [Cul71]. We denote by $g$ the noisy function to be differentiated and by $x$ its second derivative. They are related by

$$g(w) = \int_0^1 \mathrm{d}s \, \mathcal{A}(w, s) \, x(s), \qquad 0 \le s, w \le 1,$$

where $\mathcal{A}(w, s)$ is the Green's function of the second derivative operator,

$$\mathcal{A}(w, s) = \begin{cases} w(s - 1), & w < s, \\ s(w - 1), & w \ge s. \end{cases}$$

In our experiment, we have used

$$x(s) = \cos(2\pi s) - \sin(6\pi s).$$

Following the approach of [Han94] we discretize the integral using the Galerkin method, and obtain a system of $n$ linear equations with $n$ unknowns, where $n = 10^9$. We consider the approximate solution of the system using the preceding methodology, with the initial guess $\bar{r} = 0$ and the regularization matrix $\Gamma^{-1} = 10^{-5} L_3' L_3$, where $L_3$ is the $(n-3) \times n$ third-order difference operator. The obtained approximate solution $\Phi\hat{r}$ is presented in Fig. 5, and is compared with the exact solution $x^*$ and the projected solution $\Pi x^*$.

Figure 5: The simulation-based approximate solution $\hat{x} = \Phi\hat{r}$ for the second derivative problem, compared with the exact solution $x^*$ and the projected solution $\Pi x^*$. The subspace $S$ has dimension $s = 50$ for the left-hand plot and dimension $s = 100$ for the right-hand plot.
4.4 The Fox & Goodwin problem
This problem, introduced by Fox and Goodwin in [FG53], considers the solution of the integral equation

$$g(w) = \int_0^1 \mathrm{d}s \, \sqrt{w^2 + s^2} \, x(s), \qquad 0 \le w \le 1.$$

As shown in [FG53], this is a severely ill-posed problem, and the condition number of its discretized integral operator increases exponentially with $n$. In our experiment, we assume the unknown solution to be

$$x(s) = s, \qquad 0 \le s \le 1,$$

compute $g$ accordingly, and discretize the system into a square linear system of dimension $n = 10^9$. We consider its approximate solution in the subspace spanned by $s = 50$ or $s = 100$ multi-resolution basis functions, and introduce the regularization matrix $\Gamma^{-1} = 10^{-3} I$ and the initial guess $\bar{r} = 0$. The obtained approximate solution $\Phi\hat{r}$ is presented in Fig. 6, plotted against the exact solution $x^*$ and the projected solution $\Pi x^*$.

Figure 6: The simulation-based approximate solution $\hat{x} = \Phi\hat{r}$ for the Fox-Goodwin problem, compared with the exact solution $x^*$ and the projected solution $\Pi x^*$. The subspace $S$ has dimension $s = 50$ for the left-hand plot and dimension $s = 100$ for the right-hand plot.
4.5 The inverse heat conduction problem
This problem seeks to reconstruct the time profile of a heat source by monitoring the temperature at a fixed distance away [Car82]. The one-dimensional heat transfer in a homogeneous quarter-plane medium, with known heat conductivity $\alpha$, is expressed by the parabolic partial differential (heat) equation

$$\frac{\partial u}{\partial \tau} = \alpha \frac{\partial^2 u}{\partial w^2}, \qquad w \ge 0, \; \tau \ge 0,$$

$$u(w, 0) = 0, \qquad u(0, \tau) = x(\tau),$$

where $u(w, \tau)$ denotes the temperature at location $w$ and time $\tau$. The temperature $g$ at a fixed location $\bar{w}$ away from the source satisfies

$$g(\tau) = \int_0^{\tau} \mathrm{d}\varsigma \, \mathcal{A}(\varsigma, \tau) \, x(\varsigma),$$
where $\mathcal{A}$ is a lower-triangular kernel given by

$$\mathcal{A}(\varsigma, \tau) = \begin{cases} \dfrac{\bar{w}/\alpha}{\sqrt{4\pi (\tau - \varsigma)^3}} \exp\left\{ -\dfrac{(\bar{w}/\alpha)^2}{4 (\tau - \varsigma)} \right\}, & 0 \le \varsigma < \tau \le T, \\ 0, & 0 \le \tau \le \varsigma \le T. \end{cases}$$

In the experiment we take $T = 1$ and take the unknown target temperature function to be

$$x(\tau) = \sum_{i=1}^{3} \kappa_i \exp\left\{ -\frac{(\tau - \mu_i)^2}{2 \sigma_i^2} \right\}, \qquad 0 \le \tau \le 1,$$

with $\kappa = \{4, 3, 6\} \times 10^{-4}$, $\mu = \{0.3, 0.6, 0.8\}$, and $\sigma = \{0.1, 0.1, 0.05\}$, so $g$ can be obtained accordingly. We discretize the integral equation into a square linear system of dimension $n = 10^9$ and consider its approximate solution within the subspace spanned by $s = 50$ or $s = 100$ multi-resolution basis functions. Also we assume an initial guess $\bar{r} = 0$ and the regularization matrix $\Gamma^{-1} = \beta L_1' L_1$, where $L_1$ is the $(n-1) \times n$ discrete first-order difference operator and $\beta = 10^{-5}$. The computational results are illustrated in Fig. 7.

Figure 7: The simulation-based approximate solution $\hat{x} = \Phi\hat{r}$ for the inverse heat conduction problem, compared with the exact solution $x^*$ and the projected solution $\Pi x^*$. The subspace $S$ has dimension $s = 50$ for the left-hand plot and dimension $s = 100$ for the right-hand plot.
4.6 A problem in optical imaging
Consider light passing through a thin slit, where the intensity of the diffracted light is a function of the outgoing angle and can be measured by some instrument. We wish to reconstruct the light intensity at the incoming side of the slit based on these measurements. Let $x$ be the incoming light intensity
as a function of the incoming angle, and let $g$ be the outgoing light intensity as a function of the outgoing angle, so that

$$g(\theta) = \int_{-\pi/2}^{\pi/2} \mathrm{d}\varphi \, \mathcal{A}(\theta, \varphi) \, x(\varphi), \qquad \theta, \varphi \in [-\pi/2, \pi/2],$$

where

$$\mathcal{A}(\theta, \varphi) = \big( \cos\theta + \cos\varphi \big)^2 \left( \frac{\sin\big(\pi(\sin\theta + \sin\varphi)\big)}{\pi(\sin\theta + \sin\varphi)} \right)^2$$

(we refer to [Jr.72] for further explanation of the physical aspects of this application). We discretize this integral equation into a square system of dimension $n = 10^9$, and consider its approximation within the subspace spanned by $s = 50$ or $s = 100$ multi-resolution functions. The regularization matrix is taken to be $\Gamma^{-1} = \beta L_3' L_3$ with $\bar{r} = 0$, where $L_3$ is the third-order difference operator and $\beta = 10^{-5}$. The corresponding computational results are plotted in Fig. 8.

Figure 8: The simulation-based approximate solution $\hat{x} = \Phi\hat{r}$ for the optics problem, compared with the exact solution $x^*$ and the projected solution $\Pi x^*$. The subspace $S$ has dimension $s = 50$ for the left-hand plot and dimension $s = 100$ for the right-hand plot.
5 Conclusions
In this paper, we have considered the approximate solution of linear inverse problems within a low-dimensional subspace spanned by an arbitrary given set of basis functions. We have proposed a simulation-based regularized regression approach, which can also be applied to large-scale problems with equality or inequality constraints. The algorithm uses importance sampling and low-dimensional computations, and relies on designing sampling distributions involving the model matrices and the basis functions spanning the subspace. We have elaborated on a few approaches for designing near-optimal distributions, which exploit the continuity of the underlying models. The performance of our method has been numerically evaluated on a number of classical problems. The computational experiments demonstrate an adequate reduction in simulation noise after a relatively small number of samples, and an attendant improvement in the quality of the resulting approximate solution.

A central characteristic of our methodology is the use of low-dimensional calculations in solving high-dimensional problems. Two important approximation issues arise within this context: first, the solution of the problem should admit a reasonably accurate representation in terms of a relatively small number of basis functions, and second, the problem should possess a reasonably continuous/smooth structure so that effective importance sampling distributions can be designed with relatively small effort. In our computational experiments, simple piecewise polynomial approximations have proved adequate, but other more efficient alternatives may be possible. We finally note that the use of regularized regression, based on a sample covariance obtained as a byproduct of the simulation, was another critical element for the success of our methodology with nearly singular problems.
References

[BB98] M. Bertero and P. Boccacci. Introduction to Inverse Problems in Imaging. Institute of Physics, Bristol, 1998.

[Ber07] D. P. Bertsekas. Solution of large systems of equations using approximate dynamic programming methods. Lab. for Information and Decision Systems Report 2754, MIT, 2007.

[BT96] D. P. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Nashua, USA, 1996.

[BY09] D. P. Bertsekas and H. Yu. Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics, 227:27-50, 2009.

[Car82] A. Carasso. Determining surface temperatures from interior observations. SIAM Journal of Applied Mathematics, 42(3):558-574, 1982.

[Cul71] J. Cullum. Numerical differentiation and regularization. SIAM Journal of Numerical Analysis, 8(2):254-265, 1971.

[DCZ99] D. Calvetti, L. Reichel, and Q. Zhang. Iterative solution methods for large linear discrete ill-posed problems. In Applied and Computational Control, Signals and Systems, volume 1, pages 313-367. Birkhauser, 1999.

[FG53] L. Fox and E. T. Goodwin. The numerical solution of non-singular linear integral equations. Phil. Trans. R. Soc. Lond. A, 245:501-534, 1953.

[Gro07] C. W. Groetsch. Integral equations of the first kind, inverse problems and regularization: a crash course. Journal of Physics: Conference Series (Inverse Problems in Applied Sciences, towards breakthrough), 73:1-32, 2007.

[Han94] P. C. Hansen. Regularization tools: A Matlab package for analysis and solution of discrete ill-posed problems. Numerical Algorithms, 6:1-35, 1994.

[HH93] M. Hanke and P. C. Hansen. Regularization methods for large scale problems. Surveys on Mathematics for Industry, 3:253-315, 1993.

[Jr.72] C. B. Shaw, Jr. Improvement of the resolution of an instrument by numerical solution of an integral equation. Journal of Mathematical Analysis and Applications, 37:83-112, 1972.

[KS04] J. Kaipio and E. Somersalo. Statistical and Computational Inverse Problems. Springer, New York, 2004.

[Lan58] C. Lanczos. Iterative solution of large-scale linear systems. J. Soc. Indust. Appl. Math., 6(1):91-109, 1958.

[OS81] D. P. O'Leary and J. A. Simmons. A bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems. SIAM J. on Scientific and Statistical Computing, 2(4):474-489, 1981.

[RS02] M. Rojas and D. C. Sorensen. A trust region approach to the regularization of large-scale discrete forms of ill-posed problems. SIAM J. on Scientific Computing, 23(6):1843-1861, 2002.

[SB98] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, 1998.

[WPB09] M. Wang, N. Polydorides, and D. P. Bertsekas. Approximate simulation-based solution of large-scale least squares problems. LIDS Report, MIT, 2009.

[WU96] A. D. Woodbury and T. J. Ulrych. Minimum relative entropy inversion: Theory and application to recovering the release history of a groundwater contaminant. Water Resources Research, 32(9):2671-2681, 1996.