A NONLINEAR PRIMAL-DUAL METHOD FOR TOTAL VARIATION-BASED IMAGE RESTORATION

TONY F. CHAN†, GENE H. GOLUB‡, AND PEP MULET§

Abstract. We present a new method for solving total variation (TV) minimization problems in image restoration. The main idea is to remove some of the singularity caused by the non-differentiability of the quantity $|\nabla u|$ in the definition of the TV-norm before we apply a linearization technique such as Newton's method. This is accomplished by introducing an additional variable for the flux quantity appearing in the gradient of the objective function, which can be interpreted as the normal vector to the level sets of the image $u$. Our method can be viewed as a primal-dual method as proposed by Conn and Overton [9] and Andersen [3] for the minimization of a sum of Euclidean norms. Experimental results show that the new method has much better global convergence behavior than the primal Newton method, with only a slight increase in cost per iteration.

1. Introduction. During some phases of the manipulation of an image, random noise and blurring are usually introduced, and their presence makes the later phases of the image processing difficult and inaccurate. Algorithms for noise removal and deblurring have mainly been based on least squares. The output of these $L^2$-based algorithms is a continuous function, which obviously cannot be a good approximation to the original image if that image contains edges. To overcome this difficulty, a technique based on the minimization of the Total Variation norm subject to a noise constraint is proposed in [18], together with a time marching scheme to solve the associated Euler-Lagrange equations. Since this method can be slow due to stability constraints on the time step size, a number of alternative methods have been proposed [21], [8], [14]. One of the difficulties in solving the Euler-Lagrange equations is the presence of a highly nonlinear and non-differentiable term, which causes convergence difficulties for Newton's method even when combined with a globalization technique such as a line search. The idea of our new algorithm is to remove some of the singularity caused by the non-differentiability of the objective function before we apply a linearization technique such as Newton's method. This is accomplished by introducing an additional variable for the flux quantity appearing in the gradient of the objective function, which can be interpreted as the unit normal to the level sets of the image function. Our method can be viewed as a primal-dual method as proposed by Conn and Overton [9] and Andersen [3] for the minimization of a sum of Euclidean norms. Experimental results show that the new method has much better global convergence behavior than the primal Newton method.

This is a revised version of the earlier UCLA Computational and Applied Mathematics technical report 95-43 with the same title.
† Department of Mathematics, University of California, Los Angeles, 405 Hilgard Avenue, Los Angeles, CA 90024-1555. E-mail: [email protected]. Supported by grants NSF ASC-9201266 and ONR N00014-96-1-0277.
‡ Computer Science Department, Stanford University. E-mail: [email protected]. The work of this author was supported in part by NSF Grant CCR-9505393.
§ Department of Mathematics, University of California, Los Angeles, 405 Hilgard Avenue, Los Angeles, CA 90024-1555 and Departament de Matemàtica Aplicada, Universitat de València, Dr. Moliner, 50, 46100 Burjassot, Spain. E-mail: [email protected]. Supported by DGICYT grants EX94 28990695 and PB94-0987.

The dramatic convergence improvement of this particular primal-dual implementation, compared to that observed in linear programming problems, seems to be due to the differential nature of the operators involved in the objective function, and not only to the fact that the algorithm uses the dual variables. It is hoped that the new approach can be applied to other geometry-based PDE methods in image restoration, such as anisotropic diffusion [16], affine invariant flows [19] and mean curvature flows [2], since the same singularity caused by $|\nabla u|$ occurs in these methods as well.

The organization of this paper is as follows: in section 2 we introduce the problem and the nonlinear equations associated to it, and discuss how to solve them. In section 3 we present our new linearization technique for the (unconstrained) Tikhonov regularization form of the problem. In section 4 we introduce the constrained formulation of the problem and consider two variants of Newton's method for solving the first order conditions. Finally, in section 5 we present some numerical results for the denoising case.

2. Total Variation Regularization. An image can be interpreted either as a real function defined on a bounded and open domain $\Omega$ of $\mathbb{R}^2$ (for simplicity we will assume $\Omega$ to be a rectangle henceforth) or as a suitable discretization of this continuous image. The notation $\|u\|$ ($u \in L^2(\Omega)$) stands for the 2-norm of the function $u$, $\|u\| = (\int_\Omega u^2\, dx\, dy)^{1/2}$; $|u|$ ($u = (u_1, \ldots, u_d)$ a vector function) denotes the function $(\sum_{i=1}^d u_i^2)^{1/2}$; and $\|y\|$ ($y \in \mathbb{R}^m$) denotes the 2-norm of the vector $y$.

Our interest is to restore an image which is contaminated with noise and/or blur. The restoration process should recover the edges of the image. Let us denote by $u_0$ the observed image and by $u$ the real image. The model of degradation we assume is $Ku + n = u_0$, where $n$ is a Gaussian white noise, whose level $\sigma$, measured in the 2-norm, is assumed to be known, and $K$ is a (known) linear blur operator (usually a convolution operator). In general, the problem $Ku = z$, with $K$ a compact operator, is ill-posed, so it is not worth solving this equation (or a discretization of it): the data is assumed to be inexact, and the solution would be highly oscillatory. But if we impose a certain regularity condition on the solution $u$, then the method becomes stable. We can consider two related regularization techniques: Tikhonov regularization and noise level constrained regularization. Tikhonov regularization consists in solving the unconstrained optimization problem

(1)  $\min_u R(u) + \frac{1}{2\lambda}\|Ku - u_0\|^2,$

for a certain functional $R$ which measures the irregularity of $u$ in a certain sense and a suitably chosen coefficient $\lambda$ which measures the tradeoff between a good fit to the data and a regular solution. Another approach consists in solving the following constrained optimization problem:

(2)  $\min_u R(u)$  subject to  $\|Ku - u_0\|^2 = \sigma^2.$

Here we seek the solution with minimum irregularity among all candidates which match the known noise level. Examples of regularization functionals found in the literature are $R(u) = \|u\|$, $\|\nabla u\|$, $\|\Delta u\|$, where $\nabla$ is the gradient and $\Delta$ is the Laplacian. The drawback of these functionals is that they do not allow discontinuities in the solution; since we are interested in recovering the features of the image, they are not suitable for our purposes. In [18] it is proposed to use as regularization functional the so-called Total Variation norm or TV-norm:

(3)  $TV(u) = \int_\Omega |\nabla u|\, dx\, dy = \int_\Omega \sqrt{u_x^2 + u_y^2}\, dx\, dy.$



The TV norm does not penalize discontinuities in $u$, and thus allows us to recover the edges of the original image. For simplicity we use in this section the Tikhonov formulation of the problem. Hence the restoration problem can be written as

(4)  $\min_u \int_\Omega \left( \sqrt{u_x^2 + u_y^2} + \frac{1}{2\lambda}(Ku - u_0)^2 \right) dx\, dy,$

that is

(5)  $\min_u \| |\nabla u| \|_1 + \frac{1}{2\lambda}\|Ku - u_0\|^2, \qquad |\nabla u| = \sqrt{u_x^2 + u_y^2}.$

The Euler-Lagrange equation for this problem, assuming homogeneous Neumann boundary conditions, is

(6)  $0 = -\nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) + \frac{1}{\lambda} K^*(Ku - u_0),$

where $K^*$ is the adjoint operator of $K$ with respect to the $L^2$ inner product. This equation is degenerate due to the presence of the term $1/|\nabla u|$. A commonly used technique to overcome this difficulty is to slightly perturb the Total Variation functional to become

(7)  $\sqrt{|\nabla u|^2 + \beta},$

where $\beta$ is a small positive parameter. So now the problem is

(8)  $\min_u \int_\Omega \sqrt{|\nabla u|^2 + \beta}\, dx\, dy + \frac{1}{2\lambda}\|Ku - u_0\|^2,$

and the corresponding Euler-Lagrange equation is

(9)  $0 = -\nabla \cdot \left( \frac{\nabla u}{\sqrt{|\nabla u|^2 + \beta}} \right) + \frac{1}{\lambda} K^*(Ku - u_0) = g(u).$
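To make (9) concrete, the following minimal sketch (our own illustration, not the authors' code) evaluates $g(u)$ for the denoising case $K = I$ on a pixel grid, using forward differences with homogeneous Neumann boundary conditions; the helper names and the arguments lam (for $\lambda$) and beta (for $\beta$) are ours.

import numpy as np

def grad(u):
    # forward differences; the last row/column difference is zero (Neumann)
    ux = np.zeros_like(u)
    uy = np.zeros_like(u)
    ux[:, :-1] = u[:, 1:] - u[:, :-1]
    uy[:-1, :] = u[1:, :] - u[:-1, :]
    return ux, uy

def div(px, py):
    # discrete divergence, defined as the negative adjoint of grad above
    d = np.zeros_like(px)
    d[:, 0] = px[:, 0]
    d[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    d[:, -1] = -px[:, -2]
    d[0, :] += py[0, :]
    d[1:-1, :] += py[1:-1, :] - py[:-2, :]
    d[-1, :] += -py[-2, :]
    return d

def g(u, u0, lam, beta):
    # g(u) = -div( grad u / sqrt(|grad u|^2 + beta) ) + (1/lam)(u - u0)
    ux, uy = grad(u)
    mag = np.sqrt(ux**2 + uy**2 + beta)
    return -div(ux / mag, uy / mag) + (u - u0) / lam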

The main difficulty that this equation poses is the linearization of the highly nonlinear term $-\nabla \cdot \left( \nabla u / \sqrt{|\nabla u|^2 + \beta} \right)$. A number of methods have been proposed to solve (9). L. Rudin, S. Osher and E. Fatemi [18] used a time marching scheme to reach the steady state of the parabolic equation $u_t = -g(u)$ with initial condition $u = u_0$:

$u_t = \nabla \cdot \left( \frac{\nabla u}{\sqrt{|\nabla u|^2 + \beta}} \right) - \frac{1}{\lambda} K^*(Ku - u_0), \qquad u(x, 0) = u_0(x).$

This method can be slowly convergent due to stability constraints. C. Vogel and M. Oman [21] proposed the following fixed point iteration to solve the Euler-Lagrange equation:

$-\nabla \cdot \left( \frac{\nabla u^{k+1}}{\sqrt{|\nabla u^k|^2 + \beta}} \right) + \frac{1}{\lambda} K^*(Ku^{k+1} - u_0) = 0.$

At each step, a linear differential-convolution equation has to be solved. This method is robust, but only linearly convergent.

Due to the presence of the highly nonlinear term $\nabla \cdot \left( \nabla u / \sqrt{|\nabla u|^2 + \beta} \right)$, Newton's method does not work satisfactorily, in the sense that its domain of convergence is very small. This is especially true if the regularizing parameter $\beta$ is small. On the other hand, if $\beta$ is relatively large then this term is well behaved, so it is natural to use a continuation procedure starting with a large value of $\beta$ and gradually reducing it to the desired value. T. Chan, R. Chan and H. Zhou proposed such an approach in [8]. Although this method is locally quadratically convergent, the continuation step can be difficult to control.

3. A new linearization based on a dual variable. We propose here a better technique to linearize the term $\nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right)$. This technique bears some similarity to techniques from primal-dual optimization methods and gives better global convergence behavior than the usual Newton continuation method. The method is based on the following simple observation. While the singularity and non-differentiability of the term $w = \nabla u / |\nabla u|$ is the source of the numerical problems, $w$ itself is smooth, because it is in fact the unit normal vector to the level sets of $u$. The numerical difficulties arise only because we linearize it in the wrong way. The idea of the new method is to introduce

(10)  $w = \frac{\nabla u}{|\nabla u|}$

as a new variable and to replace (9) by the following equivalent system of nonlinear partial differential equations:

(11)  $|\nabla u|\, w - \nabla u = 0,$
      $-\nabla \cdot w + \frac{1}{\lambda} K^*(Ku - u_0) = 0.$

We can then linearize this $(u, w)$ system, for example by Newton's method. This approach is similar to the technique of introducing a flux variable in the mixed finite element method [4]. For completeness we compare the linearization of the $u$ system,

(12)  $\left[ -\nabla \cdot \left( \frac{1}{|\nabla u|} \left( I - \frac{\nabla u\, \nabla u^T}{|\nabla u|^2} \right) \nabla \right) + \frac{1}{\lambda} K^* K \right] \delta u = -g(u),$

to the linearization of the $(w, u)$-system,

(13)  $\begin{bmatrix} |\nabla u| & -\left( I - \dfrac{w\, \nabla u^T}{|\nabla u|} \right) \nabla \\ -\nabla \cdot & \dfrac{1}{\lambda} K^* K \end{bmatrix} \begin{bmatrix} \delta w \\ \delta u \end{bmatrix} = -\begin{bmatrix} f(w, u) \\ g(w, u) \end{bmatrix}.$
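The first row of (13) is just the formal Newton linearization of the first equation of (11); since this step is used without comment, it may help to spell it out. Writing $f(w, u) = |\nabla u|\, w - \nabla u$ and using $\delta(|\nabla u|) = \frac{\nabla u^T \nabla \delta u}{|\nabla u|}$, the first order change of $f$ is

$|\nabla u|\, \delta w + w\, \frac{\nabla u^T \nabla \delta u}{|\nabla u|} - \nabla \delta u = |\nabla u|\, \delta w - \left( I - \frac{w\, \nabla u^T}{|\nabla u|} \right) \nabla \delta u,$

and setting $f + \delta f = 0$ gives the first row of (13); the second row is exact, since the second equation of (11) is linear in $(w, u)$.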

Equation (13) can be solved by first eliminating $\delta w$ and solving the resulting equation for $\delta u$:

(14)  $\left[ -\nabla \cdot \left( \frac{1}{|\nabla u|} \left( I - \frac{w\, \nabla u^T}{|\nabla u|} \right) \nabla \right) + \frac{1}{\lambda} K^* K \right] \delta u = -g(u).$

After $\delta u$ is obtained, we can compute $\delta w$ by

$\delta w = \frac{1}{|\nabla u|} \left( I - \frac{w\, \nabla u^T}{|\nabla u|} \right) \nabla \delta u - w + \frac{\nabla u}{|\nabla u|}.$

We note that the cost per iteration of our new linearization technique is only slightly higher than for the standard Newton method (12), because the main cost is the solution of the differential-convolution equations (12) and (14) for $\delta u$. The motivation is that the $(w, u)$ system is somehow better behaved than the $u$ system. Although at this point we do not have a complete theory to support this, we now give a scalar example that can explain the better convergence behavior of the new approach. We compare Newton's method applied to the equivalent equations $f(x) = a - \frac{x}{\sqrt{x^2 + \beta}} = 0$ (which resembles (10)) and $g(x) = a\sqrt{x^2 + \beta} - x = 0$ (which resembles $w\sqrt{|\nabla u|^2 + \beta} - \nabla u = 0$), where $a \approx 1$ and $\beta \approx 0$. In Fig. 1 we can see that $g$ looks more "linear" than $f$ over much of the $x$-axis. In particular, if we start Newton's method with any initial guess not very close to the actual solution, it will diverge for $f$ but converge for $g$. This is confirmed by the numerical results shown in Table 1. We believe that the reason why the primal-dual algorithm presented here shows such a dramatic convergence improvement over the standard Newton method is precisely this better linearization.

Fig. 1. Plot of $f(x) = 0.9999 - x/\sqrt{x^2 + 10^{-4}}$ and $g(x) = 0.9999\sqrt{x^2 + 10^{-4}} - x$. The $y$-axis has been scaled by a factor of $10^{-3}$.
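The comparison behind Fig. 1 and Table 1 is easy to rerun; here is a minimal script (our own; the exact iteration counts depend on the stopping rule, so they need not reproduce Table 1 digit for digit) for $a = 0.9999$, the five values of $\beta$, and the initial guesses $x_0 = 1, \ldots, 5$.

import numpy as np

def newton(F, dF, x0, tol=1e-10, maxit=50):
    x = x0
    for k in range(1, maxit + 1):
        x = x - F(x) / dF(x)
        if not np.isfinite(x):
            return x, None          # diverged: the '*' entries of Table 1
        if abs(F(x)) < tol:
            return x, k             # converged: root and iteration count
    return x, None

a = 0.9999
for beta in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    f  = lambda x, b=beta: a - x / np.sqrt(x**2 + b)
    df = lambda x, b=beta: -b / (x**2 + b) ** 1.5
    g  = lambda x, b=beta: a * np.sqrt(x**2 + b) - x
    dg = lambda x, b=beta: a * x / np.sqrt(x**2 + b) - 1.0
    for x0 in [1.0, 2.0, 3.0, 4.0, 5.0]:
        print(beta, x0, newton(g, dg, x0)[1], newton(f, df, x0)[1])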

              Newton's iteration for g(x) = 0     Newton's iteration for f(x) = 0
 x0 \ beta    1e-1  1e-2  1e-3  1e-4  1e-5        1e-1  1e-2  1e-3  1e-4  1e-5
 1             7     6     4     3     5           *     9     6     6     *
 2             6     5     2     4     6           *     8     4     *     *
 3             6     4     3     5     6           10    7     5     *     *
 4             5     4     3     5     7           9     6     *     *     *
 5             5     3     4     6     7           8     5     *     *     *

Table 1. Comparison of the number of iterations required by Newton's method to solve f(x) = 0 and g(x) = 0, for a = 0.9999, for different β (horizontally) and different initial guesses x0 (vertically). A * means that the corresponding iteration failed to converge.

3.1. Dual formulation. The definition (3) is not valid when $u$ is non-smooth. The general definition of the Total Variation of a general (not necessarily smooth) $u$

is as follows (see [11]):

(15)  $TV(u) = \max\left\{ \int_\Omega u\, \nabla \cdot w\; dx\, dy \;:\; w = (w_1, w_2),\ w_i \in C_0^1(\Omega),\ |w|_\infty \le 1 \right\}.$

Note that we can recover definition (3) when $u$ is smooth by simply using the Cauchy-Schwarz inequality and $w = \frac{\nabla u}{|\nabla u|}$. Therefore, another formulation of the Total Variation restoration problem is

(16)  $\min_u \max_{|w|_\infty \le 1} \int_\Omega u\, \nabla \cdot w\; dx\, dy + \frac{1}{2\lambda}\|Ku - u_0\|^2.$

It can be formally shown that the first order conditions for this problem lead to equation (13).

4. The Discrete and the Constrained Formulation. In this section we switch to the discrete formulation. We will use the notation of [9]. We first consider the discrete Tikhonov regularization and then the noise level constrained formulation. We could discretize (12) and (14) directly, but instead we will discretize the minimization problem (8) and derive the discrete primal and primal-dual algorithms rigorously.

4.1. Tikhonov regularization. The images $u$ (resp. $u_0$) are assumed to be discretized in $m = m_h \times m_v$ ($m_h$ horizontal size, $m_v$ vertical size) square pixels of dimension $h \times h$, which we order row-wise in a vector $y$ (resp. $y_0$). We use forward differences with Neumann boundary condition for the discretization of $\nabla u$, and a standard quadrature rule for the discretization of the blurring operator. The discretization of (8) we obtain in this way is

(17)  $\min_y \sum_{i=1}^m h^2 \sqrt{\left\| \frac{1}{h} A_i^T y \right\|^2 + \beta} + \frac{1}{2\lambda} h^2 \|By - y_0\|^2,$

i.e.

(18)  $\min_y\; h \left( \sum_{i=1}^m \sqrt{\|A_i^T y\|^2 + \beta_h} + \frac{1}{2\lambda_h} \|By - y_0\|^2 \right),$

where $A_i^T y = (y_{i+1} - y_i,\ y_{i+m_h} - y_i)^T$ for interior pixels (modified accordingly to Neumann boundary conditions for boundary pixels), $\beta_h = h^2 \beta$, $B$ is a discretization of the blurring operator $K$, and $\lambda_h = \lambda / h$. If no confusion can arise, we will drop the subscripts in $\beta_h$ and $\lambda_h$.

Denote $F(y, \beta) = \sum_{i=1}^m \sqrt{\|A_i^T y\|^2 + \beta} + \frac{1}{2\lambda}\|By - y_0\|^2$ and let us assume that $\lambda > 0$ and $B$ is invertible. This latter condition can be weakened to requiring only that $B$ does not annihilate constants. It can then be proved that, for fixed $\beta \ge 0$, $F$ is continuous, strictly convex, coercive (i.e. $F(y, \beta) \to \infty$ as $\|y\| \to \infty$), decreasing with respect to $\beta$ and bounded below. With these hypotheses it can be seen that there exists a unique solution $y_\beta$ of $\min_y F(y, \beta)$ for every $\beta \ge 0$, and that, as $\beta \to 0$, $y_\beta$ converges to the solution of the problem with $\beta = 0$ (see [1]).
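As an illustration only (our own code, not the authors'), for denoising ($B = I$) and the row-wise pixel ordering above, the stacked difference matrix $[A_1 \cdots A_m]^T$ and the objective $F(y, \beta)$ (subscripts dropped) could be assembled as follows.

import numpy as np
import scipy.sparse as sp

def gradient_matrix(mh, mv):
    # sparse (2m x m) matrix whose rows stack all A_i^T: first the m
    # horizontal differences y_{i+1} - y_i, then the m vertical ones
    # y_{i+mh} - y_i, with the rows of boundary pixels zeroed (Neumann).
    Dx = sp.diags([-1.0, 1.0], [0, 1], shape=(mh, mh), format="lil")
    Dx[-1, :] = 0.0
    Dy = sp.diags([-1.0, 1.0], [0, 1], shape=(mv, mv), format="lil")
    Dy[-1, :] = 0.0
    Ah = sp.kron(sp.identity(mv), Dx)    # differences along each image row
    Av = sp.kron(Dy, sp.identity(mh))    # differences across image rows
    return sp.vstack([Ah, Av]).tocsr()

def F(y, y0, lam, beta, AT):
    # discrete objective of (18), subscripts dropped, with B = I
    m = y.size
    d = AT @ y
    e = np.sqrt(d[:m]**2 + d[m:]**2 + beta)   # sqrt(||A_i^T y||^2 + beta)
    return e.sum() + np.dot(y - y0, y - y0) / (2.0 * lam)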

4.1.1. Primal Newton method. The first order (necessary and sufficient) condition for problem (18) is

(19)  $g(y) \equiv \sum_i \frac{A_i A_i^T y}{\psi_i} + \frac{1}{\lambda} B^T (By - y_0) = 0,$

where $\psi_i = \sqrt{\|A_i^T y\|^2 + \beta}$; this corresponds to a discretization of (9). If we use Newton's method to solve this system, then at each step we have to solve the linear system

(20)  $C\, \delta y = -\left( A E^{-1} A^T y + \frac{1}{\lambda} B^T (By - y_0) \right),$

where

(21)  $C = A E^{-1} F A^T + \frac{1}{\lambda} B^T B,$
      $A = [A_1 \cdots A_m],$
      $E = \mathrm{diag}\left( \psi_i I_2 \right)_{i=1,\ldots,m},$
      $F = \mathrm{diag}\left( I_2 - \frac{(A_i^T y)(A_i^T y)^T}{\psi_i^2} \right)_{i=1,\ldots,m},$
      $I_2 = 2 \times 2$ identity matrix.
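Continuing the previous sketch (again our own code, with $B = I$), one primal Newton step (20)-(21) might read as follows; a direct sparse solve stands in for the preconditioned conjugate gradient iteration discussed later.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def primal_newton_step(y, y0, lam, beta, AT):
    m = y.size
    d = AT @ y                      # stacked A_i^T y: d[:m] horizontal, d[m:] vertical
    dh, dv = d[:m], d[m:]
    psi = np.sqrt(dh**2 + dv**2 + beta)
    Einv = sp.diags(np.concatenate([1.0 / psi, 1.0 / psi]))
    # blocks of F = I_2 - (A_i^T y)(A_i^T y)^T / psi_i^2, in stacked form
    Fmat = sp.bmat([[sp.diags(1.0 - dh * dh / psi**2), sp.diags(-dh * dv / psi**2)],
                    [sp.diags(-dv * dh / psi**2), sp.diags(1.0 - dv * dv / psi**2)]])
    C = AT.T @ Einv @ Fmat @ AT + sp.identity(m) / lam
    g = AT.T @ (Einv @ d) + (y - y0) / lam        # equation (19)
    dy = spla.spsolve(C.tocsc(), -g)              # equation (20)
    return y + dy, np.linalg.norm(g)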

Note that (19) and (20) correspond to discretizations of (9) and (12). We call this approach the primal Newton method.

4.1.2. Primal-dual Newton method. This approach can be regarded as introducing an auxiliary flux variable as in (11) before we linearize the problem by Newton's method. Following [9], we derive a dual formulation for (18). We use the fact that $\|v\| = \max_{\|w\| \le 1} v^T w$ to write

(22)  $\sum_{i=1}^m \|A_i^T y\| + \frac{1}{2\lambda}\|By - y_0\|^2 =$

(23)  $\max_{\|x_i\| \le 1}\; y^T A x + \frac{1}{2\lambda}\|By - y_0\|^2,$

where $x_i \in \mathbb{R}^2$, $i = 1, \ldots, m$. Hence

(24)  $\min_y \sum_{i=1}^m \|A_i^T y\| + \frac{1}{2\lambda}\|By - y_0\|^2 = \min_y \max_{\|x_i\| \le 1}\; y^T A x + \frac{1}{2\lambda}\|By - y_0\|^2.$

Now, the function $y^T A x + \frac{1}{2\lambda}\|By - y_0\|^2$ is convex in $y$ and concave (actually linear) in $x = (x_1, \ldots, x_m)$, and $\{x : \|x_i\| \le 1\}$ is bounded. Therefore, by [17, Cor. 37.3.2] we can interchange the min and the max to obtain

(25)  $\min_y \sum_{i=1}^m \|A_i^T y\| + \frac{1}{2\lambda}\|By - y_0\|^2 = \max_{\|x_i\| \le 1} \min_y\; y^T A x + \frac{1}{2\lambda}\|By - y_0\|^2.$

The solution to the problem min yT Ax + 21 jjBy ? y0 jj2 y

(26)

can be obtained by solving the rst order conditions:

Ax + B T (By ? y0) = 0

(27) which gives

y = B ?1 (y0 ? B ?T Ax)

(28) and therefore (29)

2 T Ax + 1 jjBy ? y jj2 = y T B ?T Ax ? jjB ?T Axjj2: min y 0 0 y 2 2
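The substitution of (28) into (26) that produces (29) is stated without computation; spelled out, (28) gives $By - y_0 = -\lambda B^{-T} A x$, so

$y^T A x = \left( y_0 - \lambda B^{-T} A x \right)^T B^{-T} A x = y_0^T B^{-T} A x - \lambda \|B^{-T} A x\|^2, \qquad \frac{1}{2\lambda}\|By - y_0\|^2 = \frac{\lambda}{2}\|B^{-T} A x\|^2,$

and adding the two terms yields (29).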

The dual problem can be written as

(30)  $\max_{\substack{\|x_i\| \le 1 \\ Ax + \frac{1}{\lambda} B^T (By - y_0) = 0}} y^T A x + \frac{1}{2\lambda}\|By - y_0\|^2$

or, equivalently,

(31)  $\max_{\|x_i\| \le 1}\; y_0^T B^{-T} A x - \frac{\lambda}{2}\|B^{-T} A x\|^2,$

which is a quadratic problem. The variables $y_i$ will be called the primal variables and the $x_i$ the dual variables. Instead of aiming to solve the dual problem directly, we consider conditions for the duality gap, i.e. the difference between the primal (22) and the dual (30) objective functions, to be 0. The equality

$\sum_i x_i^T A_i^T y + \frac{1}{2\lambda}\|By - y_0\|^2 = \sum_i \|A_i^T y\| + \frac{1}{2\lambda}\|By - y_0\|^2,$

together with the condition $\|x_i\| \le 1$, is equivalent to requiring $x_i^T A_i^T y = \|A_i^T y\|$ for all $i$, by the Cauchy-Schwarz inequality. This is in turn equivalent to $x_i = \frac{A_i^T y}{\|A_i^T y\|}$ for all $i$ such that $A_i^T y \ne 0$, and can be expressed as $\|A_i^T y\|\, x_i - A_i^T y = 0$ for all $i$. Therefore the following equations yield the solution to the primal and dual problems:

(32)  $\|A_i^T y\|\, x_i - A_i^T y = 0, \qquad A x + \frac{1}{\lambda} B^T (By - y_0) = 0, \qquad \|x_i\| \le 1,$

or, using the notation introduced in (21),

(33)  $E x - A^T y = 0, \qquad A x + \frac{1}{\lambda} B^T (By - y_0) = 0, \qquad \|x_i\| \le 1.$

This system is undetermined if $A_i^T y = 0$ for some $i$. To remedy this, we replace the term $\|A_i^T y\|$ in (32) by $\psi_i = \sqrt{\|A_i^T y\|^2 + \beta}$ for some positive $\beta$. This has the effect of transforming (32) into the first order conditions for (18) with $\beta > 0$ (as can be readily seen by eliminating the $x_i$). Since we know that the solution of (18) tends to the solution of (18) for $\beta = 0$ as $\beta \to 0$, we can solve (33) with $E = \mathrm{diag}(\psi_i I_2)$ for $\beta$ small enough and get a reasonably good approximation to the solution. The linearization of (33) yields the linear system

(34)  $\begin{bmatrix} E & -\tilde{F} A^T \\ A & \frac{1}{\lambda} B^T B \end{bmatrix} \begin{bmatrix} \delta x \\ \delta y \end{bmatrix} = -\begin{bmatrix} E x - A^T y \\ A x + \frac{1}{\lambda} B^T (By - y_0) \end{bmatrix},$

where

(35)  $\tilde{F} = \mathrm{diag}\left( I_2 - \frac{x_i\, y^T A_i}{\psi_i} \right)_{i=1,\ldots,m},$

which can be solved as follows:

(36)  $\bar{C}\, \delta y = -\left( A E^{-1} A^T y + \frac{1}{\lambda} B^T (By - y_0) \right),$ with $\bar{C} = A E^{-1} \tilde{F} A^T + \frac{1}{\lambda} B^T B,$
(37)  $\delta x = -x + E^{-1} A^T y + E^{-1} \tilde{F} A^T \delta y.$
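For illustration (our own code, with $B = I$ and anticipating the symmetrization $\tilde{C}$ and the dual step length rule discussed just below), one step of (36)-(37) could read as follows; as before, a direct sparse solve stands in for ILU-preconditioned CG.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def primal_dual_step(y, x, y0, lam, beta, AT, rho=0.9):
    m = y.size
    d = AT @ y
    dh, dv = d[:m], d[m:]
    xh, xv = x[:m], x[m:]
    psi = np.sqrt(dh**2 + dv**2 + beta)
    Einv = sp.diags(np.concatenate([1.0 / psi, 1.0 / psi]))
    # blocks of F-tilde = I_2 - x_i (A_i^T y)^T / psi_i, in stacked form
    Ft = sp.bmat([[sp.diags(1.0 - xh * dh / psi), sp.diags(-xh * dv / psi)],
                  [sp.diags(-xv * dh / psi), sp.diags(1.0 - xv * dv / psi)]])
    Cbar = (AT.T @ Einv @ Ft @ AT + sp.identity(m) / lam).tocsc()
    Ctil = 0.5 * (Cbar + Cbar.T)                   # symmetrized quasi-Newton matrix
    rhs = -(AT.T @ (Einv @ d) + (y - y0) / lam)    # right hand side of (36)
    dy = spla.spsolve(Ctil, rhs)
    dx = -x + Einv @ d + Einv @ (Ft @ (AT @ dy))   # equation (37)
    # dual step length s = rho * sup{ t : ||x_i + t dx_i|| < 1 for all i },
    # assuming ||x_i|| < 1 on entry; per i, the sup solves a scalar quadratic
    s = np.inf
    for a, b, c in zip(dx[:m]**2 + dx[m:]**2,
                       xh * dx[:m] + xv * dx[m:],
                       xh**2 + xv**2 - 1.0):
        if a > 0.0:
            s = min(s, (-b + np.sqrt(b * b - a * c)) / a)
    if not np.isfinite(s):
        s = 1.0                                    # dx vanished identically
    return y + dy, x + rho * s * dx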

Note that equation (24) corresponds to a discretization of (16), and equation (34) to a discretization of (13). Note also that the right hand side of (36) is the same as that of (20). Due to the size of the matrix $\bar{C}$ we use iterative methods to solve (36). The conjugate gradient method cannot be applied to the matrix of (36), since it is not symmetric. However, from the fact that $\tilde{F} \to F$ as we approach the solution, we have $\bar{C} \to C$. Furthermore, it can be proved that $\bar{C}$ is nonsingular provided that $\|x_i\| \le 1$ for all $i$ ($\beta > 0$ and $B$ invertible by assumption), and in this case $\bar{C}$ is positive definite (i.e. its symmetrization $\tilde{C} = \frac{1}{2}(\bar{C} + \bar{C}^T)$ is positive definite). It is then advisable to consider a quasi-Newton approach, replacing $\bar{C}$ in (36) by its symmetrization $\tilde{C}$. Furthermore, since $\tilde{C}$ converges locally to $C$ at least linearly, it follows from [13, 5.4.1] that the convergence of the resulting quasi-Newton method is quadratic.

Hence, to ensure the feasibility of this method we have to impose the condition $\|x_i\| \le 1$ for all $i$ throughout the process. This can be achieved inexpensively by choosing a step length $s > 0$ such that $s = \rho\, \sup\{ t : \|x_i + t\, \delta x_i\| < 1\ \forall i \}$, where $\rho \in (0, 1)$ and it is assumed that $\|x_i\| < 1$ for all $i$. Actually, our experience reveals that this choice of step length for $x$ already produces an "almost" globally convergent algorithm, even for small $\beta$. However, to ensure global convergence, a step length procedure on the primal variables $y$ can also be used. Since $\tilde{C}$ is symmetric positive definite, $\delta y$ is a descent direction for the primal objective function, so a line search based on sufficient descent of the objective function could be used to ensure global convergence. However, we have found that using $\|g\|^2$ as merit function gives better results, where $g$ is the gradient of the objective function as given in equation (19).

For the denoising case, the matrices $\bar{C}$ and $\tilde{C}$ correspond to elliptic-like operators, so we can use standard preconditioners, such as ILU or multigrid, for the conjugate gradient method. However, for the case of deblurring, these matrices are in general dense. The preconditioners for these matrices that have been proposed in the literature include cosine transform preconditioners [7] and ADI-like preconditioners [21].

4.2. Noise level constrained regularization. The development of the previous subsection can be readily extended to the constrained formulation (2). The discretization of the noise level constrained problem gives

$\min_y \sum_{i=1}^m h^2 \sqrt{\left\| \frac{1}{h} A_i^T y \right\|^2 + \beta} \quad \text{subject to} \quad \frac{1}{2}\left( h^2 \|By - y_0\|^2 - \sigma^2 \right) = 0,$

which can be rewritten as

(38)  $\min_y \sum_{i=1}^m \sqrt{\|A_i^T y\|^2 + \beta_h} \quad \text{subject to} \quad \frac{1}{2}\left( \|By - y_0\|^2 - \sigma_h^2 \right) = 0,$

where $A_i^T y$, $B$ and $\beta_h$ are as defined in the previous subsection and $\sigma_h = \sigma / h$. Again, if no confusion can arise, we will drop the subscripts in $\beta_h$ and $\sigma_h$. In the same fashion as for the Tikhonov formulation, if we further assume that no constant vector $y$ verifies $\|By - y_0\| \le \sigma$, then it can be seen that (38) has a unique solution $y_\beta$ and Lagrange multiplier $\mu_\beta > 0$, which converge, as $\beta \to 0$, to the solution and multiplier of the problem with $\beta = 0$.

4.2.1. Primal Newton method. The first order conditions for the solution of (38) are:

(39)  $\sum_i \frac{A_i A_i^T y}{\psi_i} + \mu B^T (By - y_0) = 0,$
(40)  $\frac{1}{2}\left( \|By - y_0\|^2 - \sigma^2 \right) = 0,$

where $\psi_i = \sqrt{\|A_i^T y\|^2 + \beta}$. If we use Newton's method to solve this system, then at each step we have to solve the following linear system:

$\begin{bmatrix} C & B^T (By - y_0) \\ (By - y_0)^T B & 0 \end{bmatrix} \begin{bmatrix} \delta y \\ \delta \mu \end{bmatrix} = -\begin{bmatrix} A E^{-1} A^T y + \mu B^T (By - y_0) \\ \frac{1}{2}\left( \|By - y_0\|^2 - \sigma^2 \right) \end{bmatrix},$

where $C = A E^{-1} F A^T + \mu B^T B$, with notation as in (21).

4.2.2. Primal-dual Newton method. We follow here the approach of [3] to derive the dual problem. We consider the following equivalent formulation of (38):

(41)  $\min_y \sum_{i=1}^m \left( \|z_i\|^2 + \beta \right)^{1/2} \quad \text{subject to} \quad \begin{cases} A_i^T y - z_i = 0, \\ \frac{1}{2}\left( \|By - y_0\|^2 - \sigma^2 \right) = 0. \end{cases}$

The Lagrangian for this problem is

$L(y, z, x, \mu) = \sum_{i=1}^m \left( \|z_i\|^2 + \beta \right)^{1/2} + \sum_{i=1}^m x_i^T \left( A_i^T y - z_i \right) + \frac{\mu}{2}\left( \|By - y_0\|^2 - \sigma^2 \right),$

where $x$ and $\mu$ are Lagrange multipliers ($x = (x_i)$, $x_i \in \mathbb{R}^2$; $z = (z_i)$, $z_i \in \mathbb{R}^2$). Hence the Karush-Kuhn-Tucker conditions for (41) are

(42)  $\frac{\partial L}{\partial y} = \sum_i A_i x_i + \mu B^T (By - y_0) = 0,$
(43)  $\frac{\partial L}{\partial z_i} = \frac{z_i}{\left( \|z_i\|^2 + \beta \right)^{1/2}} - x_i = 0 \quad (i = 1, \ldots, m),$
(44)  $\frac{\partial L}{\partial x_i} = A_i^T y - z_i = 0 \quad (i = 1, \ldots, m),$
(45)  $\frac{\partial L}{\partial \mu} = \frac{1}{2}\left( \|By - y_0\|^2 - \sigma^2 \right) = 0.$

Using (44), we eliminate $z_i$ from (43) and multiply the second set of equations by $\sqrt{\|A_i^T y\|^2 + \beta}$ to obtain:

(46)  $\sum_i A_i x_i + \mu B^T (By - y_0) = 0,$
(47)  $A_i^T y - \sqrt{\|A_i^T y\|^2 + \beta}\; x_i = 0 \quad (i = 1, \ldots, m),$
(48)  $\frac{1}{2}\left( \|By - y_0\|^2 - \sigma^2 \right) = 0.$

We now apply Newton's method to this system. The system to solve at each Newton step is the following:

(49)  $\begin{bmatrix} A & \mu B^T B & B^T w \\ E & -\tilde{F} A^T & 0 \\ 0 & w^T B & 0 \end{bmatrix} \begin{bmatrix} \delta x \\ \delta y \\ \delta \mu \end{bmatrix} = -\begin{bmatrix} A x + \mu B^T w \\ E x - A^T y \\ \frac{1}{2}\left( \|w\|^2 - \sigma^2 \right) \end{bmatrix},$

where $w = By - y_0$ and $\tilde{F} = \mathrm{diag}\left( I_2 - \frac{x_i\, y^T A_i}{\psi_i} \right)$. By first eliminating $\delta x$, we can solve this system as follows:

(50)  $\begin{bmatrix} \bar{C} & B^T w \\ w^T B & 0 \end{bmatrix} \begin{bmatrix} \delta y \\ \delta \mu \end{bmatrix} = -\begin{bmatrix} A E^{-1} A^T y + \mu B^T w \\ \frac{1}{2}\left( \|w\|^2 - \sigma^2 \right) \end{bmatrix},$
      $\delta x = -x + E^{-1} A^T y + E^{-1} \tilde{F} A^T \delta y,$

where $\bar{C} = A E^{-1} \tilde{F} A^T + \mu B^T B$. The same issues as in subsubsection 4.1.2 regarding the use of a quasi-Newton approach and preconditioners also apply in this case; the preconditioning here has the added difficulty of dealing with KKT matrices.

5. Numerical results. In these experiments we use denoising and the unconstrained formulation. Our first experiment consists in a comparison of the primal Newton and the primal-dual Newton methods under the following circumstances:

1. Continuation on β and no line search.
2. Continuation on β and line search on y.
3. No continuation on β and line search on y.
4. No continuation and line search on y and x for the primal-dual method.

The pseudocodes for the continuation and Newton algorithms are given in Figures 2 and 3, respectively.

u = u_0, β_succ = β = ‖∇u_0‖_∞^2, ρ_red < 1
while β > β_target
  1. [u_new, success] = newton(u, u_0, β)
  2. if success
     (a) β_succ = β
     (b) Decrease β by the factor ρ_red
     (c) Decrease the reduction factor ρ_red
     (d) u = u_new
  3. else
     (a) Increase β
     (b) Increase the reduction factor ρ_red

Fig. 2. Pseudo-code for the continuation algorithm.
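A minimal Python rendering of Fig. 2 (our own; Fig. 2 initializes β to ‖∇u_0‖_∞^2 but leaves the concrete increase/decrease factors open, so the choices below are assumptions):

def continuation(newton, u0, beta0, beta_target, rho_red=0.1):
    # newton(u, u0, beta) is any inner solver returning (u_new, success)
    u, beta, beta_succ = u0.copy(), beta0, beta0
    while beta > beta_target:
        u_new, success = newton(u, u0, beta)
        if success:
            beta_succ = beta
            beta *= rho_red                    # decrease beta by the factor rho_red
            rho_red *= 0.5                     # decrease the reduction factor
            u = u_new
        else:
            beta = 0.5 * (beta + beta_succ)    # increase beta, toward the last success
            rho_red = min(0.9, 2.0 * rho_red)  # increase the reduction factor
    return u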

for i = 1 : #iterations
  1. Compute the update direction (δy, δx) for the primal and dual variables from (36) and (37) (we use ILU-preconditioned CG with relative residual tolerance cgtol).
  2. If δy is not a descent direction for the merit function (we use ‖g‖² as merit function), then
     (a) Decrease cgtol
     (b) Continue (go to for)
  3. else
     (a) Compute the primal step length s_p by a sufficient-decrease line search procedure (with merit function ‖g‖²).
  4. Compute the dual step length s_d = 0.9 sup{s : ‖x_i + s δx_i‖ < 1 ∀i}.
  5. y = y + s_p δy
  6. x = x + s_d δx

Fig. 3. Pseudo-code for the primal-dual Newton algorithm with line search. The primal Newton algorithm is basically identical, except that the dual variables x and dual updates δx do not appear and step 1 solves (20).
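Step 3 of Fig. 3 can be realized by a standard backtracking loop on the merit function ‖g‖²; the sketch below is ours, and the sufficient-decrease constant c and halving strategy are assumed choices.

def primal_step_length(merit, y, dy, c=1e-4, s_min=2.0**-20):
    # merit(y) must return ||g(y)||^2
    m0 = merit(y)
    s = 1.0
    while merit(y + s * dy) > (1.0 - c * s) * m0:   # sufficient decrease test
        s *= 0.5
        if s < s_min:
            break                                    # fail-safe cutoff
    return s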

The original image, which is 256 × 256 pixels with dynamic range [0, 255], appears in Fig. 4. A Gaussian white noise of variance $\sigma^2 \approx 1200$ ($\sigma_h^2 = \sigma^2 \cdot 256^2$) is added to it, resulting in the image displayed in Fig. 5, whose SNR is approximately 1. Fig. 6 depicts the solution obtained by the primal-dual Newton method. We have set the parameter $\lambda$ in the Tikhonov formulation to the inverse of the Lagrange multiplier yielded by a previous run of the constrained problem solver, in this case $\lambda = 1.18$. The parameter $\beta_h$ has been set to 0.01. For the primal-dual method we have used the quasi-Newton approach mentioned in subsubsection 4.1.2. Furthermore, we have used truncated versions of Newton's algorithm, based on the

conjugate gradient method with incomplete Cholesky as preconditioner. The stopping criterion for the (outer) Newton iteration is a relative decrease of the nonlinear residual by a factor of $10^{-4}$. The stopping criterion for the $n$-th inner linear iteration is a relative decrease of the linear residual by a factor of $\eta_n$, where we follow the suggestion of [13, Eq. 6.18] and set

(51)  $\eta_n = \begin{cases} 0.1 & \text{if } n = 0, \\ \min\left( 0.1,\; 0.9\, \|g_n\|^2 / \|g_{n-1}\|^2 \right) & \text{otherwise}, \end{cases}$

where $g_n$ denotes the gradient of the objective function, i.e. the left hand side of (19), at the $n$-th iteration.

In Table 2 we compare the primal-dual and the primal versions of Newton's method for the experiments described above. The conclusions that can be drawn from this experiment are:
• The most crucial factor for the primal-dual method is controlling the dual variables via the step length algorithm appearing in Fig. 3, line 4. In fact, our experience is that this algorithm with the dual step length is globally convergent for the parameters $\lambda$ and $\beta$ in a reasonable range. A line search for the primal variables almost always yields unit step lengths.
• The primal-dual method with the dual step length algorithm does not need continuation to converge, although using it might be slightly beneficial in terms of work.
• The primal-dual method with the dual step length has much better convergence behavior than the primal method.

In our second experiment, we compare the primal-dual Newton, fixed point and time marching methods. For the primal-dual Newton method, we use the step length algorithm for the dual variables, no continuation and the same parameters as in the previous experiment. The same parameters are used for the fixed point method, except that we have used a fixed linear relative residual decrease $\eta_n = 0.1$ (in this case optimal according to our experience). We have used a line search based on sufficient decrease for the time marching method. The stopping criterion for the time marching method is based on the iteration count, since we have not been able to achieve the prescribed accuracy in a reasonable amount of time. In Figs. 7, 8 and 9 we plot the convergence history of this experiment. The conclusions we draw from this experiment are:
• The primal-dual algorithm is quadratically convergent, whereas the others are at best linearly convergent.
• The primal-dual algorithm behaves similarly to the fixed point method in the early stages, but within a few iterations it can attain high accuracy.
• The cost per iteration of the primal-dual method is between 30 and 50 percent more than for the fixed point iteration; the memory requirements satisfy roughly the same relation.

Acknowledgements. We would like to thank Andy Conn and Michael Overton for making their preprint [9] available to us, and Jun Zou for many helpful conversations on mixed finite elements with the first author. The third author wants to heartily thank Paco Arandiga, Vicente Candela, Rosa Donat and Antonio Marquina for their encouragement and continuous support.

REFERENCES

Fig. 4. Original image, 256 × 256 pixels.

Fig. 5. Noisy image, SNR ≈ 1, ‖g‖ = 2.07.

[1] R. Acar and C. R. Vogel, Analysis of total variation penalty methods for ill-posed problems, Inverse Problems, 10 (1994), pp. 1217-1229.
[2] L. Alvarez, P. Lions, and J. Morel, Image selective smoothing and edge detection by nonlinear diffusion II, SIAM J. Numer. Anal., 29 (1992), pp. 845-866.
[3] K. D. Andersen, Minimizing a sum of norms (large scale solution of symmetric positive definite linear systems), PhD thesis, Odense University, 1995.
[4] F. Brezzi, A survey of mixed finite element methods, in Finite Elements, ICASE/NASA LaRC, Springer, 1988, pp. 34-49.
[5] P. H. Calamai and A. R. Conn, A stable algorithm for solving the multifacility location problem involving Euclidean distances, SIAM Journal on Scientific and Statistical Computing, 1 (1980), pp. 512-526.
[6] P. H. Calamai and A. R. Conn, A second-order method for solving the continuous multifacility location problem, in Numerical Analysis: Proceedings of the Ninth Biennial Conference, Dundee, Scotland, G. A. Watson, ed., vol. 912 of Lecture Notes in Mathematics, Springer-Verlag, 1982, pp. 1-25.
[7] R. H. Chan, T. F. Chan, and C.-K. Wong, Cosine transform based preconditioners for total variation minimization problems in image processing, Tech. Report 95-23, University of California, Los Angeles, 1995.
[8] R. H. Chan, T. F. Chan, and H. M. Zhou, Continuation method for total variation denoising problems, Tech. Report 95-18, University of California, Los Angeles, 1995.
[9] A. R. Conn and M. L. Overton, A primal-dual interior point method for minimizing a sum of Euclidean norms, preprint.
[10] D. Dobson and F. Santosa, Recovery of blocky images from noisy and blurred data, Tech. Report 94-7, Center for the Mathematics of Waves, University of Delaware, 1994.
[11] E. Giusti, Minimal Surfaces and Functions of Bounded Variation, Birkhäuser, 1984.
[12] G. Golub and C. Van Loan, Matrix Computations, 2nd ed., The Johns Hopkins University Press, 1989.

Fig. 6. Denoised image, 13 Newton iterations, 58 CG iterations, ‖g‖ = 5.57 × 10⁻⁵.

                                         primal-dual Newton    primal Newton
                                           NWT    CG            NWT    CG
 continuation, no line search               25    186            70    265
 continuation, line search on y             25    187            53    210
 continuation, line search on y & x         13     51            53    210
 no continuation, line search on y & x      12     58           Not converged

Table 2. Comparison of primal and primal-dual Newton methods.

[13] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, vol. 16 of Frontiers in Applied Mathematics, SIAM, 1995.
[14] Y. Li and F. Santosa, An affine scaling algorithm for minimizing total variation in image enhancement, Tech. Report 12/94, Center for Theory and Simulation in Science and Engineering, Cornell University, 1994.
[15] M. E. Oman, Fast multigrid techniques in total variation-based image reconstruction, to appear in the Preliminary Proceedings of the 1995 Copper Mountain Conference on Multigrid Methods.
[16] P. Perona and J. Malik, Scale space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Mach. Intelligence, 12 (1990), pp. 629-639.
[17] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[18] L. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D, 60 (1992), pp. 259-268.
[19] G. Sapiro and A. Tannenbaum, Area and length preserving geometric invariant scale-space, in Proc. 3rd European Conf. on Computer Vision, Stockholm, Sweden, May 1994, vol. 801 of Lecture Notes in Computer Science, 1994, pp. 449-458.
[20] C. R. Vogel, A multigrid method for total variation-based image denoising, in Computation and Control IV, conference proceedings to be published by Birkhäuser.
[21] C. R. Vogel and M. E. Oman, Iterative methods for total variation denoising, SIAM J. Sci. Statist. Comput., to appear.

Fig. 7. Plot of the $L^2$-norm of the gradient $g(u)$ of the objective function versus iterations for the different methods (primal-dual, fixed point, time marching).

Fig. 8. Plot of the $L^2$-norm of the difference between the current iterate and the solution of the problem, computed by Newton's method with high accuracy, versus iterations for the different methods.

Fig. 9. Plot of the number of pixels which differ (relatively) by more than 0.001 from the solution of the problem, computed by Newton's method with high accuracy, versus iterations for the different methods.
