
CS 267 Final Project: A parallel incompressible two-phase flow solver for complex geometries

Jeffrey J. Donatelli

Abstract

We present a fully 3D coupled level set projection method in arbitrary logically rectangular geometries for studying two-phase immiscible incompressible flow in enclosed regions. We discuss our numerical methodology, the parallelization of our algorithm, and current results.

1 Introduction

The goal of this work is to design robust numerical algorithms for studying complex 3D interface evolution, including instabilities, occurring in two-phase flow under different configurations, geometries, input conditions, and fluid parameters. Our model uses a body-fitted coordinate system and includes viscosity and density jumps and surface tension. We solve a version of the incompressible two-phase Navier-Stokes equations coupled with a level set method to track the evolution of the phase boundary. Our approach uses a second order finite element variable density projection method to enforce the incompressibility constraint. To help conserve mass, the level set function is periodically reinitialized as a signed distance function via a fast marching method combined with a tricubic interpolation scheme. The sharp coefficient jumps across the phase boundary are handled by smoothing the density and viscosity over a few grid cells near the phase boundary. This is all done on a multi-stage mesh to prevent issues with poorly shaped grid elements caused by domain stretching.

Our testbed application is the numerical simulation of ribbing instabilities in coating rollers. Coating roller technology is used to deposit thin films using horizontal rollers, and is important in such processes as resin deposition and the coating of steel sheets. In these phenomena, the uniformity of the coating layer depends on the capillary number, nip gap diameter, roller speeds, and the constitutive properties of the fluid/material. Surface ribbing is a well-known and often undesirable phenomenon in which the uniform layer is disturbed in these coating flows.


We have been developing a numerical simulation of this process in order to gain a better understanding of the ribbing behavior. This allows us to examine small scale behavior, on the order of micrometers, near the nip gap, which is not possible to observe experimentally. Our simulations have proven to be successful at recreating the ribbing behavior and show a good deal of agreement with experimental observations.

Figure 1: Surface ribbing in coating flow. Left: numerical simulation. Right: experimental [7].

The serial runtimes for these simulations are on the order of weeks when using a relatively coarse mesh on only a small portion of the desired physical domain. However, there is important behavior which we wish to capture at both smaller and larger scales. Therefore, an efficient parallel framework is necessary in order to obtain a complete description of the nature of the ribbing phenomenon.

2 Theoretical Background

2.1 Equations of Motion

We solve the incompressible two-phase Navier-Stokes equations along with an advection equation for the level set function and an Eikonal equation that maintains the level set function as an approximate signed distance function:

Momentum:

u_t + u \cdot \nabla u = -\frac{1}{\rho(\phi)}\nabla p + \frac{1}{\rho(\phi)\,Re}\,\nabla\cdot\left(2\mu(\phi)\left(\nabla u + (\nabla u)^T\right)\right) - \frac{1}{\rho(\phi)\,We}\,\kappa(\phi)\,\delta(\phi)\,\nabla\phi

Incompressibility:

\nabla\cdot u = 0

Level set advection:

\phi_t + u\cdot\nabla\phi = 0

Distance function constraint:

|\nabla\phi| = 1

The density and viscosity are smoothed over a few grid cells via an approximate Heaviside function:

\rho(\phi) = \begin{cases} 1 & \text{if } \phi \ge \epsilon, \\ \frac{1}{2}\left(\frac{\rho_2}{\rho_1} + 1 + \left(1 - \frac{\rho_2}{\rho_1}\right)\sin\left(\frac{\pi\phi}{2\epsilon}\right)\right) & \text{if } |\phi| \le \epsilon, \\ \frac{\rho_2}{\rho_1} & \text{if } \phi \le -\epsilon, \end{cases}

with a similar definition for \mu, where \rho_i, \mu_i are the density and viscosity of the two fluids and \epsilon is the distance smoothed over. The Reynolds number and Weber number are

Re = \frac{\rho_1 U L}{\mu_1}, \qquad We = \frac{\rho_1 U^2 L}{\sigma},

where U is the characteristic velocity, L is the characteristic length, and \sigma is the surface tension.

The curvature term is defined for every level set of \phi:

\kappa(\phi) = \nabla\cdot\left(\frac{\nabla\phi}{|\nabla\phi|}\right)

\delta(\phi) is the Dirac delta function and is computed in the surface tension term in the form \delta(\phi)\nabla\phi = \nabla H(\phi), where H is an approximate Heaviside function.
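As a concrete illustration of the smoothing just described, the following minimal NumPy sketch evaluates the smoothed, nondimensional density (the viscosity is handled analogously); the function name and the way the smoothing half-width is passed in are illustrative assumptions rather than part of the solver.

    import numpy as np

    def smoothed_density(phi, rho1, rho2, eps):
        """Smoothed nondimensional density rho(phi) (normalized by rho1).

        phi : array of level set values, positive in fluid 1.
        eps : smoothing half-width (a few grid cells in the simulations).
        """
        r = rho2 / rho1
        rho = np.empty_like(phi, dtype=float)
        rho[phi >= eps] = 1.0                      # pure fluid 1
        rho[phi <= -eps] = r                       # pure fluid 2
        band = np.abs(phi) < eps                   # smoothed transition region
        rho[band] = 0.5 * (r + 1.0
                           + (1.0 - r) * np.sin(np.pi * phi[band] / (2.0 * eps)))
        return rho

Note that the formula matches the pure-fluid values at the edges of the band: at phi = eps the sine term gives 1, and at phi = -eps it gives rho2/rho1.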

2.2 Mesh

Our method uses a boundary-fitted hexahedral mesh to conform to the physical boundary. Multiple stages, i.e. sections with different numbers of grid points in certain directions, are used to prevent grid elements from overstretching. The computational domain Ξ is a Cartesian mesh for each stage, which is then mapped to the physical domain X by Φ : Ξ → X. Here we take x, y, z to be coordinates for the physical domain and ξ, η, ζ to be coordinates for the computational domain.


Figure 2: 2D slice of a multi-stage hexahedral mesh

Derivatives are calculated on the physical domain by applying the chain rule through the transformation map. In particular, we make use of the Jacobian and the transformation matrix:

J = \det \nabla_\Xi \Phi, \qquad T = J\,[\nabla_\Xi \Phi]^{-1}.

We define the transformed convection velocity to be \bar{u} = T u. In this setting, gradients on the physical domain are computed as

\nabla f = \frac{1}{J}\, T^T \nabla_\Xi f.

Furthermore, we use a staggered grid. The velocity, density, viscosity, and level set function are at the center of each grid cell while the pressure is located at cell corners.
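To make the chain-rule mapping concrete, here is a minimal NumPy sketch that forms J and T from the mesh-map gradient and returns the physical-domain gradient; the function name and the convention that ∇_Ξ Φ is supplied as a 3×3 array are assumptions for illustration.

    import numpy as np

    def physical_gradient(grad_xi_f, grad_xi_Phi):
        """Map a computational-domain gradient to the physical domain.

        grad_xi_f   : (3,) gradient of f with respect to (xi, eta, zeta).
        grad_xi_Phi : (3, 3) Jacobian matrix of the mesh map Phi.
        Returns grad f in (x, y, z) via (1/J) T^T grad_xi f, with
        J = det(grad_xi_Phi) and T = J * inv(grad_xi_Phi).
        """
        J = np.linalg.det(grad_xi_Phi)
        T = J * np.linalg.inv(grad_xi_Phi)
        return (T.T @ grad_xi_f) / J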

3 Numerical Methodology

We use a semi-implicit discretization for the momentum equation. First, the convective term and level set advection are computed with a second order Godunov upwinding scheme. Then, the following is solved for the intermediate velocity u^*:

\frac{u^* - u^n}{dt} + \frac{1}{J}\,[\bar{u}\cdot\nabla_\Xi u]^{n+1/2} = -\frac{1}{J\,\rho(\phi^{n+1/2})}\, T^T\nabla_\Xi p^{n-1/2} + (\text{Viscosity})^{n+*} + (\text{Surface tension})^{n+1/2},

where

(\text{Viscosity})^{n+*} = \frac{1}{J\,\rho(\phi^{n+1/2})\,Re}\,\nabla_\Xi\cdot\left[\frac{\mu(\phi^{n+1/2})}{J}\left(T T^T \nabla_\Xi u^{n+*} + \left(T T^T \nabla_\Xi u^{n+*}\right)^T\right)\right],

(\text{Surface tension})^{n+1/2} = \frac{1}{J^2\,\rho(\phi^{n+1/2})\,We}\,\nabla_\Xi\cdot\left(T\,\frac{T^T\nabla_\Xi\phi}{|T^T\nabla_\Xi\phi|}\right)\, T^T\nabla_\Xi H(\phi^{n+1/2}),

and

u^{n+*} = \frac{u^n + u^*}{2}.


The intermediate velocity u^* is then projected onto the set of divergence-free vector fields and the pressure is updated through a variable density projection method, which yields the divergence-free velocity field u^{n+1}. Furthermore, we periodically reinitialize the level set function as a signed distance function via a fast marching method, e.g. after every 50 time steps.

3.1 Godunov upwinding

In order to discretize the convection and advection terms we use a second order Godunov upwinding scheme based on the work in [3, 4, 6]:

[\bar{u}\cdot\nabla_\Xi u]^{n+1/2}_{i,j,k} = \frac{\bar{u}^{n+1/2}_{i+1/2,j,k} + \bar{u}^{n+1/2}_{i-1/2,j,k}}{2}\left(u^{n+1/2}_{i+1/2,j,k} - u^{n+1/2}_{i-1/2,j,k}\right) + \frac{\bar{v}^{n+1/2}_{i,j+1/2,k} + \bar{v}^{n+1/2}_{i,j-1/2,k}}{2}\left(u^{n+1/2}_{i,j+1/2,k} - u^{n+1/2}_{i,j-1/2,k}\right) + \frac{\bar{w}^{n+1/2}_{i,j,k+1/2} + \bar{w}^{n+1/2}_{i,j,k-1/2}}{2}\left(u^{n+1/2}_{i,j,k+1/2} - u^{n+1/2}_{i,j,k-1/2}\right)

We form the cell surface velocities by extrapolating from each cell center and then applying an upwinding procedure. For example,

u^{n+1/2,L}_{i+1/2,j,k} = u^n_{i,j,k} + \frac{1}{2}\, u^n_{\xi,i,j,k} + \frac{\Delta t}{2}\, u^n_{t,i,j,k} = u^n_{i,j,k} + \left(\frac{1}{2} - \frac{\Delta t}{2 J_{i,j,k}}\,\bar{u}_{i,j,k}\right) u^n_{\xi,i,j,k} - \frac{\Delta t}{2 J_{i,j,k}}\,(\bar{v} u_\eta)^n_{i,j,k} + \frac{\Delta t}{2}\, F^n_{i,j,k}.

We use a monotonicity-limited central difference scheme for the normal slopes. For example, if

(u_{i+1,j,k} - u_{i,j,k})(u_{i,j,k} - u_{i-1,j,k}) \le 0

then u_{\xi,i,j,k} = 0, and otherwise

u_{\xi,i,j,k} = \min\left(|u_{i+1,j,k} - u_{i-1,j,k}|,\; 2|u_{i+1,j,k} - u_{i,j,k}|,\; 2|u_{i,j,k} - u_{i-1,j,k}|\right)\,\mathrm{sgn}(u_{i+1,j,k} - u_{i-1,j,k}).

The tangential derivatives are evaluated by an upwinding discretization. The advective velocities are then determined by

\bar{u}^{n+1/2}_{i+1/2,j,k} = \begin{cases} \bar{u}^{n+1/2,L}_{i+1/2,j,k} & \text{if } \bar{u}^{n+1/2,L}_{i+1/2,j,k} > 0 \text{ and } \bar{u}^{n+1/2,L}_{i+1/2,j,k} + \bar{u}^{n+1/2,R}_{i+1/2,j,k} > 0, \\ \bar{u}^{n+1/2,R}_{i+1/2,j,k} & \text{if } \bar{u}^{n+1/2,R}_{i+1/2,j,k} < 0 \text{ and } \bar{u}^{n+1/2,L}_{i+1/2,j,k} + \bar{u}^{n+1/2,R}_{i+1/2,j,k} < 0, \\ 0 & \text{otherwise}. \end{cases}

The surface velocities and level set values are then determined from the advective velocities:

u^{n+1/2}_{i+1/2,j,k} = \begin{cases} u^{n+1/2,L}_{i+1/2,j,k} & \text{if } \bar{u}^{n+1/2}_{i+1/2,j,k} > 0, \\ u^{n+1/2,R}_{i+1/2,j,k} & \text{if } \bar{u}^{n+1/2}_{i+1/2,j,k} < 0, \\ \frac{1}{2}\left(u^{n+1/2,L}_{i+1/2,j,k} + u^{n+1/2,R}_{i+1/2,j,k}\right) & \text{if } \bar{u}^{n+1/2}_{i+1/2,j,k} = 0, \end{cases}

with a similar equation with u replaced by \phi.
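A small Python sketch of the slope limiter and the upwind face selection described above; the function names are illustrative, and the routines mirror the formulas directly (including the averaging of the two extrapolated states when the advective velocity vanishes).

    import numpy as np

    def limited_slope(u_m, u_c, u_p):
        """Monotonicity-limited central difference slope from the three cell
        values (u_{i-1}, u_i, u_{i+1}) along one computational direction."""
        if (u_p - u_c) * (u_c - u_m) <= 0.0:       # local extremum: flatten slope
            return 0.0
        mag = min(abs(u_p - u_m), 2.0 * abs(u_p - u_c), 2.0 * abs(u_c - u_m))
        return np.sign(u_p - u_m) * mag

    def face_value(uL, uR, ubar):
        """Upwind choice of the face state from the left/right extrapolations,
        given the advective face velocity ubar."""
        if ubar > 0.0:
            return uL
        if ubar < 0.0:
            return uR
        return 0.5 * (uL + uR)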

3.2 Diffusion

In solving for the diffusion we must invert

I - \frac{dt}{2\,Re\,\rho}\,\nabla\cdot\left(\mu\left(\nabla u + (\nabla u)^T\right)\right),

which we discretize as

I - \frac{dt}{2\,J\,Re\,\rho}\,\left(L_R - L_L + L_U - L_D + L_B - L_F\right),

where

L_R = \frac{\mu_R}{J_R}\left[(T_R T_R^T \nabla_{\Xi,R}\, u)(1,:) + (T_R T_R^T \nabla_{\Xi,R}\, u)(:,1)\right],

with similar definitions for the remaining terms. In the above definition the subscripts, e.g. \mu_R, denote interpolation to the six cell surfaces.

We solve the system through Gauss-Seidel iterations. Since the time step is very small (see Section 3.6), the condition number is close to 1, and thus only a few iterations, typically 1-3, are required for convergence.
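For illustration, a generic dense Gauss-Seidel sweep in NumPy; the actual operator is the transformed-coordinate stencil above rather than an assembled matrix, so this is only a sketch of the iteration being described.

    import numpy as np

    def gauss_seidel(A, b, x0, iters=3):
        """Plain Gauss-Seidel sweeps for A x = b.

        For the diffusion solve the matrix is close to the identity because dt
        is small, so a handful of sweeps (the 1-3 iterations mentioned above)
        is enough in practice.
        """
        x = x0.copy()
        n = len(b)
        for _ in range(iters):
            for i in range(n):
                s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
                x[i] = (b[i] - s) / A[i, i]
        return x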

3.3 Projection

We use a second order finite element approximate projection, introduced by Almgren, Bell, and Szymczak [2], to enforce the incompressibility constraint and update the pressure. In particular, we solve

\int \frac{dt}{\rho}\,\nabla p\cdot\nabla\psi\;dx = \int u^*\cdot\nabla\psi\;dx,

which is equivalent to

\iiint \frac{dt}{\rho J}\, T^T\nabla_\Xi p\cdot T^T\nabla_\Xi\psi\;d\xi\,d\eta\,d\zeta = \iiint u^*\cdot T^T\nabla_\Xi\psi\;d\xi\,d\eta\,d\zeta.

Here u^* is the velocity obtained from inversion of the viscosity term and \psi is a test function. We take p and \psi to be trilinear on each cell. The associated basis functions N_{i,j,k} are continuous on each cell surface and satisfy N_{i,j,k}(x_{l,m,n}) = \delta_{il}\delta_{jm}\delta_{kn}, where \delta is the Kronecker delta. At the stage boundaries our shape functions, used to interpolate p, and weighting functions, used to interpolate \psi, have different supports (see Figure 3). Furthermore, we use a trilinear interpolation of the coordinates in order to compute the Jacobian and transformation matrix for each cell. This results in a 27 point stencil in the interior of each stage and a 33 point stencil at stage boundaries. Each integral is evaluated using an 8 point Gaussian quadrature scheme on a Cartesian reference element. Due to the hanging nodes at the stage boundary, our elements become non-isoparametric and we thus lose symmetry in the resulting linear system.
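As a sketch of the element machinery, the following NumPy fragment evaluates trilinear shape functions and an 8 point (2x2x2) Gauss quadrature on a Cartesian reference element; the [-1,1]^3 reference-cell convention and the function names are assumptions, not taken from the report.

    import numpy as np

    def trilinear_basis(xi, eta, zeta):
        """The eight trilinear shape functions on the reference cell [-1,1]^3,
        ordered by corner signs; each equals 1 at its own corner, 0 elsewhere."""
        corners = np.array([[sx, sy, sz] for sx in (-1, 1)
                                          for sy in (-1, 1)
                                          for sz in (-1, 1)], dtype=float)
        return 0.125 * ((1 + corners[:, 0] * xi)
                        * (1 + corners[:, 1] * eta)
                        * (1 + corners[:, 2] * zeta))

    def gauss8(integrand):
        """8 point (2x2x2) Gauss-Legendre quadrature of integrand(xi, eta, zeta)
        over [-1,1]^3; nodes at +/- 1/sqrt(3), all weights equal to 1."""
        g = 1.0 / np.sqrt(3.0)
        return sum(integrand(sx * g, sy * g, sz * g)
                   for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1))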


Figure 3: Support for the shape functions (red) and the weighting functions (blue) at the stage boundary. Open circles are hanging nodes.

The system is solved using full multigrid. We use 8-color Gauss-Seidel for the relaxation, as the associated graph has a colorability of 8. In order to preserve the effects of coefficient jumps at coarser levels we utilize special smoothing and interpolation operators similar to the methods used in [1]. In particular, we smooth the density differently along each direction in the computational domain. Each operation uses an arithmetic average along the off directions and a harmonic average along the collinear direction. For the fine level we have

\sigma^{1,\xi} = \sigma^{1,\eta} = \sigma^{1,\zeta} = \frac{1}{\rho}.

For all multigrid levels \ell \ge 2 we have

\sigma^{\ell,\xi}_{i/2,j/2,k/2} = \left(2\Big(\sum_{m,n=0}^{1}\sigma^{\ell-1,\xi}_{i,j+m,k+n}\Big)^{-1} + 2\Big(\sum_{m,n=0}^{1}\sigma^{\ell-1,\xi}_{i+1,j+m,k+n}\Big)^{-1}\right)^{-1},

with analogous formulas for \sigma^{\ell,\eta} and \sigma^{\ell,\zeta}. The interpolation operator, carrying a correction e^{\ell+1} from the coarser level \ell+1 to the finer level \ell, is given by:

i, j, k odd:

e^{\ell}_{i,j,k} = e^{\ell+1}_{\frac{i+1}{2},\frac{j+1}{2},\frac{k+1}{2}}

i even, j, k odd:

e^{\ell}_{i,j,k} = \frac{\Big(\sum_{m,n=-1}^{0}\sigma^{\ell+1,\xi}_{i-1,j+m,k+n}\Big)e^{\ell+1}_{i-1,j,k} + \Big(\sum_{m,n=-1}^{0}\sigma^{\ell+1,\xi}_{i,j+m,k+n}\Big)e^{\ell+1}_{i+1,j,k}}{\sum_{m,n,p=-1}^{0}\sigma^{\ell+1,\xi}_{i+m,j+n,k+p}}

i, j even, k odd:

e^{\ell}_{i,j,k} = \frac{\Big(\sum_{m,n=-1}^{0}\sigma^{\ell+1,\xi}_{i-1,j+m,k+n}\Big)e^{\ell+1}_{i-1,j,k} + \Big(\sum_{m,n=-1}^{0}\sigma^{\ell+1,\xi}_{i,j+m,k+n}\Big)e^{\ell+1}_{i+1,j,k} + \Big(\sum_{m,n=-1}^{0}\sigma^{\ell+1,\eta}_{i+m,j-1,k+n}\Big)e^{\ell+1}_{i,j-1,k} + \Big(\sum_{m,n=-1}^{0}\sigma^{\ell+1,\eta}_{i+m,j+1,k+n}\Big)e^{\ell+1}_{i,j+1,k}}{\sum_{m,n,p=-1}^{0}\sigma^{\ell+1,\xi}_{i+m,j+n,k+p} + \sum_{m,n,p=-1}^{0}\sigma^{\ell+1,\eta}_{i+m,j+n,k+p}}

with similar definitions for the remaining cases. We then use the corresponding \sigma, in place of \frac{1}{\rho}, for the derivatives of the shape functions, i.e. the integrand for the left hand side becomes

\frac{dt}{J}\, T^T(\sigma^\xi p_\xi,\; \sigma^\eta p_\eta,\; \sigma^\zeta p_\zeta)^T \cdot T^T\nabla_\Xi\psi.

We stop coarsening at the point where one of the stages would disappear. Since single-scale iterative methods converge poorly in the presence of strongly discontinuous coefficients, which are maintained on the coarser levels, we use a direct solver for the bottom level. Since the bottom matrix is the same throughout the projection step, we can perform one LU decomposition at the beginning and then use triangular solves whenever we reach the bottom level. Once we obtain the pressure, we subtract its gradient from the intermediate velocity u^* to obtain the divergence-free velocity field u^{n+1}.
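A minimal NumPy sketch of the directional coefficient coarsening just described, taking the arithmetic average over the 2x2 fine cells transverse to ξ and the harmonic mean of the two resulting values along ξ; the array layout, with each coarse cell covering a 2x2x2 block of fine cells, is an assumed convention.

    import numpy as np

    def coarsen_sigma_xi(sigma):
        """Coarsen the xi-direction smoothing coefficient.

        sigma : fine-grid array with even dimensions (2I, 2J, 2K).
        For each coarse cell: arithmetic average of the 2x2 fine values
        transverse to xi on each of the two fine slabs, then the harmonic
        mean of those two averages along xi.
        """
        # arithmetic average over the 2x2 block transverse to xi (eta and zeta)
        avg = 0.25 * (sigma[:, 0::2, 0::2] + sigma[:, 1::2, 0::2]
                      + sigma[:, 0::2, 1::2] + sigma[:, 1::2, 1::2])
        a, b = avg[0::2, :, :], avg[1::2, :, :]    # the two slabs collinear in xi
        return 2.0 * a * b / (a + b)               # harmonic mean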

3.4 Tricubic Interpolation Method

In order to start the fast marching method we require signed distance values near the interface. This is accomplished by forming a tricubic interpolant of the level set data and then using an iterative solver to find the 0-level set of the interpolant. This method was originated by Chopp [5] and extended to quadrilateral meshes by Yu et al. [11]. We generalize their work to hexahedral meshes as follows. For each hexahedron intersecting the 0-level set, with vertices (ii, jj, kk), ii = i, i+1, jj = j, j+1, kk = k, k+1, we construct

f(x, y, z) = \sum_{m=0}^{3}\sum_{n=0}^{3}\sum_{p=0}^{3} a_{m,n,p}\, x^m y^n z^p,

which satisfies

f_{ii,jj,kk} = \phi_{ii,jj,kk},
(\nabla f)_{ii,jj,kk} = (\nabla\phi)_{ii,jj,kk},
\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)_{ii,jj,kk} = \left(\frac{\partial^2\phi}{\partial x\,\partial y}\right)_{ii,jj,kk},
\left(\frac{\partial^2 f}{\partial x\,\partial z}\right)_{ii,jj,kk} = \left(\frac{\partial^2\phi}{\partial x\,\partial z}\right)_{ii,jj,kk},
\left(\frac{\partial^2 f}{\partial y\,\partial z}\right)_{ii,jj,kk} = \left(\frac{\partial^2\phi}{\partial y\,\partial z}\right)_{ii,jj,kk},
\left(\frac{\partial^3 f}{\partial x\,\partial y\,\partial z}\right)_{ii,jj,kk} = \left(\frac{\partial^3\phi}{\partial x\,\partial y\,\partial z}\right)_{ii,jj,kk}.

The derivatives of \phi are computed through finite differences. This results in a 64-dimensional linear system which must be inverted to obtain the coefficients of the tricubic interpolant in each associated hexahedron.

For each vertex x_0 of the hexahedra described above, we compute a signed distance value by solving a level set equation along with an orthogonality constraint:

f(x) = 0, \qquad \nabla f(x) \times (x_0 - x) = 0.

Chopp originally proposed solving these equations by alternating Newton iterations on gradient search directions for each equation [5]. Instead, we solve both equations simultaneously via Levenberg-Marquardt iterations to improve the convergence behavior on stretched grid elements:

(J^T J + \lambda\,\mathrm{diag}(J^T J))\,p_k = -J^T F,

where

F(x) = \begin{bmatrix} f(x) \\ \nabla f(x)\times(x_0 - x) \end{bmatrix}, \qquad J(x) = \begin{bmatrix} \nabla f(x) \\ H(f)(x)\times(x_0 - x) - \nabla f(x)\times I \end{bmatrix},

which is solved using QR on the associated least squares equations. Here H is the Hessian matrix and \lambda is a damping parameter. The distance is taken to be |x - x_0| and the sign is taken to be the same as the previous level set value at the corresponding grid point.
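The damped iteration can be sketched generically as follows; each step solves (J^T J + λ diag(J^T J)) p = -J^T F through QR on the equivalent augmented least-squares system. The callables F and Jac, the fixed λ, and the stopping tolerance are illustrative assumptions (in practice λ would be adapted between steps).

    import numpy as np

    def levenberg_marquardt(F, Jac, x0, lam=1e-3, tol=1e-10, max_iter=50):
        """Generic Levenberg-Marquardt iteration for F(x) = 0.

        F and Jac return the residual vector and its Jacobian; in the solver
        the residual stacks f(x) and the orthogonality constraint
        grad f(x) x (x0 - x).
        """
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(max_iter):
            r, J = F(x), Jac(x)
            if np.linalg.norm(r) < tol:
                break
            D = np.sqrt(lam * np.diag(J.T @ J))      # damping scales
            A = np.vstack([J, np.diag(D)])           # augmented least-squares matrix
            b = np.concatenate([-r, np.zeros_like(x)])
            Q, R = np.linalg.qr(A)
            p = np.linalg.solve(R, Q.T @ b)          # step from the QR factorization
            x += p
        return x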


3.5 Fast Marching Method

Once the signed distance values near the interface are computed, we calculate the remaining values by solving for the viscosity solution of the Eikonal equation via the fast marching method, originated by Sethian [9]. In order to maintain causality in the fast marching method we cannot have any obtuse angles in the mesh [10]. However, it is not possible to generate a non-obtuse mesh out of a multi-stage mesh. Instead, we use a generalized tetrahedralization of the mesh, i.e. a directed hypergraph with hyperedges of size at most 4. In other words, for every grid point we are given sets of at most 3 other points which may be used in the associated stencil for that grid point. The only restrictions on the graph are that the nodes of each hyperedge must be affinely independent and that every node must be reachable from any angle by traveling through the tetrahedra associated with the hyperedges. This maintains causality for any reasonable mesh and reduces the number of tetrahedra in the absence of obtuse angles. The algorithm can now essentially be viewed as a version of Dijkstra's algorithm for directed hypergraphs with a quadratic update formula. We run the algorithm for the positive and negative values separately. The algorithm is as follows:

1) Use the tricubic interpolation scheme to find the distance values close to the 0-level set, set those points to be known, and give all other points a value of infinity.
2) Calculate values for all neighboring points using the update formula.
3) Find the tentative point with the smallest distance value, set it to be known, and evaluate all of its unknown neighbors.
4) Repeat 2-3 until the distance function is computed at all desired points.

To update \phi(x), given \phi(x_r), 1 \le r \le 3, solve

P_r = \frac{x - x_r}{\|x - x_r\|_2}, \qquad \nu_r = \frac{\phi(x) - \phi(x_r)}{\|x - x_r\|_2} - P_r\cdot\nabla\phi(x_r), \qquad \nu^T (P P^T)^{-1}\nu = 1,

and only use this value if it is smaller than the previous value. To allow quick retrieval of the smallest tentative point we use a min-heap data structure with back pointers to the grid.
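A self-contained Python sketch of the heap-driven marching loop follows. For brevity it uses the standard axis-aligned Cartesian update in place of the generalized tetrahedral update above, and lazy deletion of stale heap entries in place of back pointers, so it illustrates the Dijkstra-like structure rather than the exact stencil.

    import heapq
    import numpy as np

    def fast_march(phi0, known, h):
        """Dijkstra-like fast marching on a uniform Cartesian grid.

        phi0  : initial values (finite where 'known' is True).
        known : boolean mask of accepted points near the interface.
        h     : grid spacing. Returns the marched distance field.
        """
        phi = np.where(known, phi0, np.inf)
        accepted = known.copy()
        heap = [(phi[idx], idx) for idx in zip(*np.nonzero(known))]
        heapq.heapify(heap)

        def solve_update(a_vals):
            # solve sum_i (t - a_i)^2 = h^2, dropping the largest a_i until upwind
            a_vals = sorted(a_vals)
            for m in range(len(a_vals), 0, -1):
                a = np.array(a_vals[:m])
                s, s2 = a.sum(), (a * a).sum()
                disc = s * s - m * (s2 - h * h)
                if disc < 0.0:
                    continue
                t = (s + np.sqrt(disc)) / m
                if t >= a[-1]:
                    return t
            return np.inf

        def neighbor_mins(idx):
            # smallest accepted neighbor value along each axis, if any
            vals = []
            for ax in range(phi.ndim):
                best = np.inf
                for step in (-1, 1):
                    nb = list(idx); nb[ax] += step
                    if 0 <= nb[ax] < phi.shape[ax] and accepted[tuple(nb)]:
                        best = min(best, phi[tuple(nb)])
                if np.isfinite(best):
                    vals.append(best)
            return vals

        while heap:
            d, idx = heapq.heappop(heap)
            if d > phi[idx]:
                continue                       # stale heap entry
            accepted[idx] = True
            for ax in range(phi.ndim):
                for step in (-1, 1):
                    nb = list(idx); nb[ax] += step
                    if not (0 <= nb[ax] < phi.shape[ax]) or accepted[tuple(nb)]:
                        continue
                    t = solve_update(neighbor_mins(tuple(nb)))
                    if t < phi[tuple(nb)]:
                        phi[tuple(nb)] = t
                        heapq.heappush(heap, (t, tuple(nb)))
        return phi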


3.6 Time Step Constraint

The time step constraint needed for stability is determined by the CFL conditions for convection, surface tension, viscosity, and acceleration:

\Delta t < \min\left(\frac{dx}{u},\; \frac{dy}{v},\; \frac{dz}{w},\; \sqrt{\frac{(\rho_1+\rho_2)\,We}{8\pi}}\, h^{3/2},\; \frac{2\,Re\,\rho}{7\,\mu}\, h^2,\; \sqrt{\frac{2h}{|F|}}\right),

where h = \min(\Delta x, \Delta y, \Delta z) and F is the sum of the pressure, viscosity, and surface tension terms in the momentum equation. Note that even though we use a semi-implicit discretization for the diffusion, we are still left with a quadratic term in the constraint, since the viscosity values are updated explicitly via the level set evolution.
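For illustration, a small helper that evaluates this bound over discrete fields; how the pointwise quantities are reduced to a single Δt (max of |u|, min of ρ, max of µ and |F|) is an assumption made for the sketch.

    import numpy as np

    def stable_dt(u, v, w, dx, dy, dz, rho1, rho2, rho, mu, Re, We, F):
        """Evaluate the time step bound above (nondimensional quantities as in
        Section 2.1); the field reductions are illustrative choices."""
        h = min(dx, dy, dz)
        conv = min(dx / np.max(np.abs(u)),
                   dy / np.max(np.abs(v)),
                   dz / np.max(np.abs(w)))
        surf = np.sqrt((rho1 + rho2) * We / (8.0 * np.pi)) * h ** 1.5
        visc = 2.0 * Re * np.min(rho) * h ** 2 / (7.0 * np.max(mu))
        acc = np.sqrt(2.0 * h / np.max(np.abs(F)))
        return min(conv, surf, visc, acc)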

4 Parallelization

We parallelize our code for distributed memory using MPI. The computational mesh, along with its associated data, is distributed over several processors, currently with a Cartesian processor layout. The majority of the algorithm is straightforward to parallelize and simply requires each processor to send boundary data to neighboring processors once every time step. The nontrivial components to parallelize are the inversion of the diffusion term, the fast marching method, and the projection. However, since only a few Gauss-Seidel iterations are required for the inversion of the diffusion term to converge, this step does not have a significant impact on runtime.
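As an illustration of the once-per-step boundary exchange, here is a minimal mpi4py sketch of a nonblocking one-cell halo exchange on a 3D Cartesian communicator. The report does not specify the implementation language, so mpi4py and the helper names here are assumptions.

    import numpy as np
    from mpi4py import MPI   # assumption: mpi4py stands in for the solver's MPI calls

    def exchange_ghosts(cart, field):
        """Exchange one layer of ghost cells with the six face neighbors on a
        3D Cartesian communicator; 'field' carries a one-cell ghost rim."""
        def face(axis, index):                       # full-plane slice along one axis
            sl = [slice(None)] * 3
            sl[axis] = index
            return tuple(sl)

        reqs, fills = [], []
        for axis in range(3):
            lo, hi = cart.Shift(axis, 1)             # ranks of the -/+ neighbors
            send_lo = np.ascontiguousarray(field[face(axis, 1)])
            send_hi = np.ascontiguousarray(field[face(axis, -2)])
            recv_lo, recv_hi = np.empty_like(send_lo), np.empty_like(send_hi)
            reqs += [cart.Isend(send_lo, dest=lo, tag=2 * axis),
                     cart.Irecv(recv_hi, source=hi, tag=2 * axis),
                     cart.Isend(send_hi, dest=hi, tag=2 * axis + 1),
                     cart.Irecv(recv_lo, source=lo, tag=2 * axis + 1)]
            fills += [(face(axis, 0), recv_lo, lo), (face(axis, -1), recv_hi, hi)]
        MPI.Request.Waitall(reqs)
        for sl, buf, rank in fills:                  # copy received faces into the rim
            if rank != MPI.PROC_NULL:
                field[sl] = buf

A communicator of the required shape could be built with, e.g., cart = MPI.COMM_WORLD.Create_cart(dims, periods=[False, False, True]), matching the periodic z-direction used in the runs.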

4.1 Fast Marching Method

In order to parallelize the fast marching method we first restrict ourselves to solving the Eikonal equation only within a small narrowband around the 0-level set, e.g. with thickness on the order of the size of the grid partitions. The algorithm is then as follows:

1) Solve using local information within a narrowband of the 0-level set.
2) Communicate boundary values to neighboring processors.
3) Recompute local values with the communicated data taken into account.
4) Only keep new values if they are smaller than the previous values.
5) Repeat until no new smaller values are generated.

This method minimizes the number of synchronizations required, at the cost of temporarily disrupting causality and producing incorrect distance values. However, these values are corrected once the necessary information is available, and the resulting solution is exactly the same as if a serial fast marching method had been performed. The theoretical parallel speedup is O(#processors / #iterations). The number of iterations required depends on how long it takes distance information to travel everywhere it is needed, which is the largest number of times any given characteristic of the nonlinear PDE crosses a grid partition boundary within the narrowband. Since the characteristics of the Eikonal equation are all straight lines and our narrowband is one grid partition thick, the number of iterations is at most 4 in 3D and 3 in 2D. Figure 4 below shows a 2D example where the 0-level set is a single point; a structural sketch of this loop follows the figure.

Figure 4: Iterations of the parallel fast marching method
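Structurally, the iterate-and-correct loop can be sketched as follows; local_march and exchange_ghosts stand in for the serial marcher of Section 3.5 and the halo exchange of Section 4, and are assumptions of this sketch.

    import numpy as np
    from mpi4py import MPI

    def parallel_fast_march(cart, phi0, known, h, local_march, exchange_ghosts):
        """Iterate-and-correct loop for the narrowband parallel fast marching
        method described above; only the structure of the iteration matters."""
        phi = local_march(phi0, known, h)              # 1) local solve
        while True:
            exchange_ghosts(cart, phi)                 # 2) share boundary values
            phi_new = local_march(phi, known, h)       # 3) recompute with new data
            improved = phi_new < phi                   # 4) keep only smaller values
            phi = np.where(improved, phi_new, phi)
            # 5) stop once no processor generated a smaller value anywhere
            if cart.allreduce(int(improved.any()), op=MPI.MAX) == 0:
                return phi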

4.2 Projection

The multigrid solver for the projection requires synchronization after every color update in the 8-color Gauss-Seidel relaxation, and after every interpolation and smoothing operation. Currently, we stop coarsening once one of the grid partitions has no more than 2 or 3 grid points remaining in some direction. We use the distributed version of SuperLU for the bottom solver [8]. As in the serial case, we perform the LU decomposition of the coarsened matrix once at the start of the multigrid call and use triangular solves each time the bottom level is reached.

In order to minimize the communication overhead we use nonblocking communication routines. We tried using an extended ghost region to reduce the number of synchronization points in the Gauss-Seidel iterations. However, due to the large stencil sizes involved, the extra computation time required for the ghost region was not offset by the reduced communication time. Thus, we only use a ghost region of thickness one.

We tried several different cycle layouts. We were able to achieve faster solves, a 10-50% reduction in runtime, by using full multigrid instead of V-cycles. Even though full multigrid requires much more communication, it spends a much smaller percentage of its time at the finest level. Again we see that it is more advantageous to reduce the amount of computation on the finest grids than to reduce the communication overhead. Additionally, we performed a search for the optimal multigrid parameters, i.e. the number of cycles and Gauss-Seidel iterations per level (see Figure 5). In particular, the optimal parameter choice uses 1 cycle per level and several Gauss-Seidel iterations.
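To make the cycle-layout discussion concrete, here is a structural Python sketch of a V-cycle and a full multigrid driver built from assumed per-level operators (relax, restrict, prolong, residual, bottom_solve); it illustrates the "1 cycle per level" full multigrid layout discussed above rather than the solver's actual implementation.

    def v_cycle(level, x, b, relax, restrict, prolong, residual, bottom_solve,
                nu1=2, nu2=2, coarsest=0):
        """One V-cycle; nu1/nu2 are the pre/post Gauss-Seidel sweep counts,
        i.e. the tunable parameters searched over in Figure 5."""
        if level == coarsest:
            return bottom_solve(b)                     # direct bottom solve (SuperLU)
        x = relax(level, x, b, nu1)                    # pre-smoothing
        r_c = restrict(level, residual(level, x, b))   # restrict the residual
        e_c = v_cycle(level - 1, 0 * r_c, r_c, relax, restrict, prolong,
                      residual, bottom_solve, nu1, nu2, coarsest)
        x = x + prolong(level, e_c)                    # coarse-grid correction
        return relax(level, x, b, nu2)                 # post-smoothing

    def full_multigrid(finest, b_per_level, relax, restrict, prolong, residual,
                       bottom_solve, nu1=2, nu2=2):
        """Full multigrid: start on the coarsest grid and work upward, with one
        V-cycle per level."""
        x = bottom_solve(b_per_level[0])
        for level in range(1, finest + 1):
            x = prolong(level, x)                      # carry the guess up a level
            x = v_cycle(level, x, b_per_level[level], relax, restrict, prolong,
                        residual, bottom_solve, nu1, nu2)
        return x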


Figure 5: Runtimes for different multigrid parameters

5 Results

All of our results were obtained on the Franklin Cray XT4 supercomputer. We used three different problem sizes: 32×(8,16,32)×32, 64×(16,32,64)×64, and 128×(32,64,128)×128, where the numbers in parentheses are the vertical grid sizes for each stage. Each grid is taken to be periodic in the z-direction. The runtimes reported here are for the projection step. Since the projection is the bottleneck of the algorithm and there is no major serial component, we can use these results to infer the speedup for the entire algorithm.

5.1 Strong Scaling

On the smallest problem we were only able to fit 22 processors while still obtaining a balanced partition on a Cartesian processor layout; in this case we obtained a speedup of about 14. We were unable to run the serial code for the largest problem size due to memory limitations. For the middle problem size we were able to use several different processor layouts and obtained the strong scaling shown in Figure 6, with a maximum speedup of about 120 on 176 processors.


Figure 6: Speedup for a fixed problem size of 720,896 grid points

There are several possible reasons for the dropoff in speedup as the number of processors increases. For one, the communication overhead grows relative to the amount of computation per processor. Additionally, we are not currently taking the network topology into account, so two processors which neighbor each other in the algorithm might actually be physically far apart in the network; this discrepancy can worsen as we increase the number of processors. Furthermore, we are also limited by the sublinear scaling of SuperLU for the bottom level.

5.2 Weak Scaling

We have the following runtime breakdown for the weak scaling:

Figure 7: Multigrid runtime breakdown

Note that the times for each level, other than the bottom, remain roughly constant as we increase the problem size along with the processor count. However, the bottom solver becomes the bottleneck for the larger problem sizes. This is simply due to the fact that the triangular solver has quadratic complexity and that we are using a fixed number of multigrid levels. Using more multigrid levels would require us to allow processors to go idle while they hand off their work to neighboring processors, a feature that we are currently working on.

We may use the above results to extrapolate the expected runtimes once the idle-processor feature is included. For example, in the case of the largest problem size, the runtime for level 3 would reduce to the level 2 runtime for the middle-sized problem, and the level 4 and bottom runtimes would reduce to the level 2 and bottom runtimes for the smallest problem size. Using this, we can project the weak scaling in the resolution for the entire algorithm, shown in Figure 8.

Figure 8: Projected weak scaling in the resolution for the entire algorithm (with more multigrid levels)

Note that these are the times to completion for the entire algorithm. In particular, the weak scaling is limited by the time step restriction, which is of the form dt ≤ C dx². As can be seen, the projected times are close to this lower bound, within an order of magnitude for the resolutions we would wish to use.

6 Future Work

There is still much work to be done on this project. For one, as described above, we need to allow for idle processors in the multigrid solver so that we can fit in more levels. We can also try using different processor layouts. Currently, the partitions are stretched out depending on which stage they are located in, so that we may maintain the Cartesian layout. We could instead use partitions which are logically cubes if we increase the number of vertical partitions for the stages with more grid points in the vertical direction. While this layout will require more communication, it will decrease the surface-area-to-volume ratio of each partition, and thus also the communication-to-computation ratio. In order to ensure that processors corresponding to neighboring partitions are close to each other in the network, we can try taking advantage of MPI's topology routines. In particular, since we use a periodic boundary in the z-direction, it would be beneficial to take advantage of Franklin's 3D torus topology.

As was noted several times above, the large stencil sizes had a significant effect on runtime and forced us to use several colors in the Gauss-Seidel iterations, resulting in increased communication. Therefore, we would like to reduce these to 7 point stencils and instead use a red-black checkerboard ordering. One idea for doing this is to use a mesh which is locally orthogonal, i.e. a conformal mapping of a Cartesian mesh, in which case many of the corner points in the stencils disappear, just as if we had a Cartesian mesh. In general, the grid elements for such meshes may become malformed, but this does not seem to be the case for the domain used in the roller coating simulations shown previously.

References

[1] R. E. Alcouffe, A. Brandt, J. E. Dendy, Jr., and J. W. Painter, The multi-grid method for the diffusion equation with strongly discontinuous coefficients, SIAM J. Sci. Stat. Comput. 2 (1981), no. 4, 430–454.

[2] A. S. Almgren, J. B. Bell, and W. G. Szymczak, A numerical method for the incompressible Navier-Stokes equations based on an approximate projection, SIAM J. Sci. Comput. 17 (1996), no. 2, 358–369.

[3] J. Bell, J. Solomon, and W. Szymczak, A projection method for viscous incompressible flow on quadrilateral grids, AIAA J. 32 (1994), no. 10, 1961–1969.

[4] J. B. Bell, P. Colella, and H. M. Glaz, A second-order projection method for the incompressible Navier-Stokes equations, J. Comput. Phys. 85 (1989), no. 2, 257–283.

[5] D. Chopp, Some improvements of the fast marching method, SIAM J. Sci. Comput. 23 (2001), no. 1, 230–244.

[6] P. Colella, A direct Eulerian MUSCL scheme for gas dynamics, SIAM J. Sci. Stat. Comput. 6 (1985), 104–117.

[7] D. J. Coyle, C. W. Macosko, and L. E. Scriven, Stability of symmetric film-splitting between counter-rotating cylinders, J. Fluid Mech. 216 (1990), 437–458.

[8] X. S. Li and J. W. Demmel, SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software 29 (2003), no. 2, 110–140.

[9] J. Sethian, A fast marching level set method for monotonically advancing fronts, Proc. Natl. Acad. Sci. 93 (1996), no. 4, 1591–1595.

[10] J. Sethian, Fast methods for the Eikonal and related Hamilton-Jacobi equations on unstructured meshes, Proc. Natl. Acad. Sci. 97 (2000), 5699–5703.

[11] J. Yu, S. Sakai, and J. Sethian, A coupled level set projection method applied to ink jet simulation, Interfaces Free Bound. 5 (2003), 459–482.
