# Modelling

## Introduction

In a review paper on full-waveform inversion (FWI), (Virieux and Operto, 2009) remark that “efficient numerical modelling of the full seismic wavefield is a central issue in FWI, especially for 3D problems.” Seismic waves are modelled by a partial differential equation (PDE), the wave equation, whose inputs are the medium parameters and a source signature, and whose solution is a wavefield.

Two wave equation solves are needed to implement the Jacobian of the forward modelling operator, as defined in full-waveform inversion and described by (Leeuwen, 2012). Depending on the particular algorithm used to minimize the FWI misfit function, further PDE solves may be required to compute its second derivative (the Hessian). Thus, each iteration of FWI must solve a wave equation at least twice; this accounts for most of the computational burden of the method. (See, however, the novel method proposed by (Leeuwen and Herrmann, 2013), which avoids one of the wave equation solves.)

In what follows, we first state the mathematical formulation of the seismic wave simulation problem in the time and frequency (time harmonic) domains. We then discuss different discretization methods as well as the algorithms we employ to solve the wave equation. Finally we give brief summaries of the current research being undertaken in this area at SLIM.

## The wave equation

We initially restrict our attention to the case of acoustic (pressure) waves travelling through a constant-density, isotropic medium with variable velocity $$v(x)$$. This can be formulated as an initial value problem with time as the independent variable: $$$\label{wave_equation_time_domain.eq} \nabla^2 \wavefield - \frac{1}{\velocity^2} \frac{\partial^2 \wavefield}{\partial t^2} = \source,$$$ where $$\wavefield$$ is the pressure wavefield, $$\source$$ is the seismic source, and $$\nabla^2$$ is the Laplacian operator, which can also be defined to implement boundary conditions, such as a perfectly matched layer (PML) to reduce artefacts resulting from numerical reflections from the boundaries of the domain.

Taking the Fourier transform of the wavefield $$\wavefield$$ with respect to time, we can re-write the wave equation in the frequency domain, using the frequency $$\freq$$ as the independent variable: $$$\label{wave_equation_freq_domain.eqn} \left ( \nabla^2 + \frac{\freq^2}{\velocity^2} \right ) \wavefield = \source,$$$ where $$\source$$ is now the monochromatic source at frequency $$\freq$$. Equation $$\ref{wave_equation_freq_domain.eqn}$$ is known as the Helmholtz equation.

## Discretization

When $$\velocity$$, $$\wavefield$$ and $$\source$$ represent quantities that have been discretized on a three-dimensional Cartesian grid with $$\numgridpoints$$ grid-points, Equation $$\ref{wave_equation_freq_domain.eqn}$$ can be represented as a large system of linear equations and succinctly written in matrix notation, $$$\label{Helmholtz_system.eqn} \helm (\velocity, \freq) \wavefield = \source,$$$ where $$\helm$$ is the $$\numgridpoints \times \numgridpoints$$ Helmholtz matrix. The elements of $$\helm$$ can be calculated by the method of finite differences, following the strategy of (S. Operto et al., 2007). That strategy consists of two main parts, which we briefly describe here for concreteness.
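To make the matrix form concrete, here is a minimal sketch of how such a Helmholtz matrix can be assembled with SciPy. It uses a standard second-order 5-point stencil in 2D with simple Dirichlet boundaries, not the optimized 27-point 3D scheme of (S. Operto et al., 2007); all names and parameter values are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def helmholtz_2d(v, freq, h):
    """Assemble a 2D Helmholtz matrix H = L + diag((freq/v)^2) on an
    n-by-n grid with spacing h, using a 5-point Laplacian stencil
    (a much simpler scheme than the 27-point stencil of Operto et al.)."""
    n = v.shape[0]
    # 1D second-derivative operator with Dirichlet boundaries
    d2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2
    I = sp.identity(n)
    laplacian = sp.kron(I, d2) + sp.kron(d2, I)  # 2D Laplacian via Kronecker sums
    mass = sp.diags((freq / v.ravel())**2)       # diagonal mass term, no mass-averaging
    return (laplacian + mass).tocsr()

# Small homogeneous example: v = 2000 m/s everywhere, angular frequency 2*pi*5
n, h = 50, 10.0
v = np.full((n, n), 2000.0)
H = helmholtz_2d(v, 2 * np.pi * 5.0, h)
print(H.shape, H.nnz)  # each row has at most 5 non-zeros
```

The sparsity pattern mirrors the discussion in the text: the matrix is huge but has only a handful of non-zeros per row, arranged in a fixed set of diagonals determined by the stencil.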

First, the Laplacian operator is discretized using a $$3 \times 3 \times 3$$ cube stencil. This 27-point stencil is a weighted average of eight different second-order in space 7-point star stencils, each on their own coordinate grid. The coordinate grids are rotated and scaled versions of the original Cartesian grid. The grids are designed in such a way that even though their axes are not parallel to each other, the grid-point locations of all eight grids coincide. This allows the cube stencil to use all 27 points in the three-dimensional neighbourhood of a given central grid point. The weighting coefficients are tuned to minimize numerical anisotropy; for details see (S. Operto et al., 2007).

Second, the value of the velocity model $$\velocity$$ at each grid-point is re-distributed to the 27 neighbouring points that make up the cube stencil for that point, a process known as mass-averaging. Mass-averaging is done using a second set of weighting coefficients, and results in a matrix with the same pattern of non-zero entries as the discretized Laplacian. By choosing optimal values for the mass-averaging coefficients, (S. Operto et al., 2007) showed that numerical dispersion of the stencil is minimized, which enhances (by a constant factor) the stencil’s accuracy. This allows the use of as few as 4 grid-points per wavelength, although 6 grid-points per wavelength are used in this work. The gain in accuracy from finer grid spacings (more grid-points per wavelength) must be balanced against the computational expense of simulating on a larger grid. A further advantage of the stencil introduced by (S. Operto et al., 2007) is that it is compact in spatial extent.
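The grid-points-per-wavelength criterion translates directly into a grid spacing: the shortest wavelength in the model is $$\lambda_{\min} = v_{\min}/f_{\max}$$, and the spacing must be a fraction of it. A small illustrative calculation (the velocity and frequency values here are made up for the example, not taken from this work):

```python
def grid_spacing(v_min, f_max, points_per_wavelength=6):
    """Grid spacing h that resolves the shortest wavelength
    lambda_min = v_min / f_max with the given number of points."""
    return v_min / (f_max * points_per_wavelength)

# Hypothetical survey: 1500 m/s minimum velocity, 10 Hz maximum frequency
h6 = grid_spacing(1500.0, 10.0)     # 6 points per wavelength -> 25.0 m
h4 = grid_spacing(1500.0, 10.0, 4)  # 4-point minimum -> 37.5 m
print(h6, h4)
```

Halving the spacing in 3D multiplies the number of grid-points by eight, which is why the 4-to-6-point range matters so much for cost.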

The mass matrix and the discretized Laplacian are added together to make the Helmholtz matrix $$\helm$$ which is very sparse: while $$\numgridpoints$$ is at least $$10^7$$ for a realistic model, the number of non-zeros per row, determined by the finite difference stencil, is at most only 27. $$\helm$$ is also very structured: its non-zeros are arranged in 27 diagonals.

We note that other stencil geometries are possible, which would result in a different structure of the Helmholtz matrix $$\helm$$, and discretization strategies such as the method of finite elements are alternatives to the method of finite differences.

## Solving the wave equation

Since the matrix $$\helm$$ is very large (target size $$\approx 10^{10}$$ non-zeros), direct solvers are not well suited to solving the Helmholtz system: they destroy the sparse structure of the matrix by creating fill-in, and use too much memory. Instead, an iterative Krylov method (such as the method of conjugate gradients, CG) is more appealing, as such solvers do not even need to store $$\helm$$, but need only to know how to apply the matrix to a vector. (See (Saad, 2003) for a comprehensive reference on Krylov solvers.) However, since CG needs $$\helm$$ to be symmetric positive definite (SPD) to converge, it cannot be used directly but needs a preconditioner.

One such preconditioner is the Kaczmarz algorithm (Kaczmarz, 1937), which consists of successively projecting an iterate onto the hyperplanes associated with the rows of the matrix. The Kaczmarz algorithm is conventionally applied in a double sweep, with rows being selected in order from first to last and back again. The action of a double Kaczmarz sweep ($$\DKSWP$$) can be represented in matrix notation: $$$\label{Kaczmarz_in_matrix_notation.eqn} \begin{aligned} \DKSWP(\helm,\wavefield,\source) & = Q_1 \cdots Q_{\numgridpoints} Q_{\numgridpoints} \cdots Q_1 \wavefield + R \source \\ & = Q \wavefield + R \source, \end{aligned}$$$ where each of the $$Q_i$$ matrices is the projection onto the $$i^{\text{th}}$$ row of the Helmholtz matrix $$\helm$$. The advantage of the Kaczmarz preconditioner is that it is independent of the structure of the matrix and does not require extra storage, as noted by (Leeuwen et al., 2012). If $$\wavefield$$ is assumed to be the solution of the Helmholtz system (Equation $$\ref{Helmholtz_system.eqn}$$), and hence a fixed point of the Kaczmarz recursion relation $$\ref{Kaczmarz_in_matrix_notation.eqn}$$, that equation can be rearranged as $$$\label{CGMN_system.eqn} (I-Q) \wavefield = R \source.$$$ This equivalent system has the same solution as Equation $$\ref{Helmholtz_system.eqn}$$, and can be solved with CG. Using CG to solve this fixed-point system resulting from an application of the Kaczmarz double sweep is known as the CGMN algorithm, due to (Björck and Elfving, 1979). Note that the matrices $$Q$$ and $$R$$ never have to be formed explicitly, as their action on a vector is calculated using a double Kaczmarz sweep.
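A minimal dense, real-valued sketch of the double Kaczmarz sweep and the CGMN iteration built on it is given below. The actual Helmholtz system is complex-valued and sparse, so this is only an illustration of the algorithmic structure; the test system and all names are made up for the example.

```python
import numpy as np

def dkswp(A, x, b, lam=1.0):
    """Double Kaczmarz sweep: project x onto each row's hyperplane,
    first to last and then back again (relaxation parameter lam)."""
    x = x.copy()
    n = A.shape[0]
    norms = (A * A).sum(axis=1)  # squared row norms
    for i in list(range(n)) + list(range(n - 1, -1, -1)):
        x += lam * (b[i] - A[i] @ x) / norms[i] * A[i]
    return x

def cgmn(A, b, tol=1e-10, maxiter=500):
    """CGMN (Bjorck & Elfving, 1979): CG applied to (I - Q) x = R b,
    where Q and R act implicitly through double Kaczmarz sweeps."""
    n = A.shape[0]
    apply_IminusQ = lambda x: x - dkswp(A, x, np.zeros(n))  # x - Q x
    rhs = dkswp(A, np.zeros(n), b)                          # R b
    x = np.zeros(n)
    r = rhs - apply_IminusQ(x)
    p = r.copy()
    rr = r @ r
    for _ in range(maxiter):
        Ap = apply_IminusQ(p)
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# Small well-conditioned, non-symmetric test system standing in for H
rng = np.random.default_rng(0)
n = 30
A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
x_true = rng.standard_normal(n)
b = A @ x_true
x = cgmn(A, b)
print(np.linalg.norm(x - x_true))  # small
```

The key point the sketch illustrates: CG only ever sees the operator $$I-Q$$ through `dkswp`, so neither $$Q$$ nor $$R$$ is ever formed, and for a full-rank matrix the double sweep makes $$I-Q$$ symmetric positive definite even though $$A$$ itself is not.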

The Kaczmarz sweep can be parallelized by performing the sweep on blocks of rows at the same time, and averaging those elements of the iterate that are updated in more than one block. This algorithm is known as CARP and was introduced by (D. Gordon and Gordon, 2005). (Gordon and Gordon, 2010) proposed the CARP-CG algorithm, which parallelizes CGMN by replacing the double Kaczmarz sweep with double CARP sweeps. SLIM commonly uses CARP-CG to solve the Helmholtz system.
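The block-and-average idea can be sketched as follows, again for a small dense real system with illustrative names. For a dense matrix every block updates every component, so uniform component-wise averaging coincides with CARP's averaging rule; the real algorithm averages only shared components, runs the blocks in parallel, and uses a double sweep inside each block.

```python
import numpy as np

def kswp_block(A, x, b, rows, lam=1.0):
    """Forward Kaczmarz sweep restricted to one block of rows."""
    x = x.copy()
    for i in rows:
        a = A[i]
        x += lam * (b[i] - a @ x) / (a @ a) * a
    return x

def carp_sweep(A, x, b, blocks):
    """One CARP sweep (D. Gordon & Gordon, 2005), serial sketch:
    sweep each row block independently from the same starting point,
    then average the resulting iterates component-wise."""
    results = [kswp_block(A, x, b, rows) for rows in blocks]
    return np.mean(results, axis=0)

# Iterating the sweep alone (no CG acceleration) on a small test system
rng = np.random.default_rng(1)
n = 20
A = np.eye(n) + 0.2 * rng.standard_normal((n, n)) / np.sqrt(n)
x_true = rng.standard_normal(n)
b = A @ x_true
blocks = [range(0, n // 2), range(n // 2, n)]
x = np.zeros(n)
for _ in range(300):
    x = carp_sweep(A, x, b, blocks)
print(np.linalg.norm(x - x_true))  # converges to the solution
```

CARP-CG then wraps CG around this sweep exactly as CGMN wraps CG around the serial double Kaczmarz sweep, which is what makes the method attractive on distributed memory machines.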

Figures 2 and 3 show an example velocity model and resulting wavefield, obtained by solving Equation $$\ref{Helmholtz_system.eqn}$$ with CGMN.

## SLIM software implementations

Readers may be interested in an abstraction of the wave equation solution procedure described above into a framework well-suited for use as part of an optimization procedure. Another description of that modelling framework can be found here. An application of the framework to FWI can be found on this page.

Finally, an overview of all of SLIM’s software is provided, as well as a graphical representation of the concepts relevant to the work being done at SLIM.

## Current research

There are several lines of research being pursued at SLIM in the field of numerical modelling of seismic waves in the context of full-waveform inversion.

### Accelerating the CGMN Helmholtz solver using reconfigurable hardware

The use of many-core and GPU (graphics processing unit) based accelerators is growing in high-performance computing, as evidenced by the “Top 500” list of the world’s most capable machines, compiled by (Meuer et al., 2013). As of 2013, four of the top ten machines on that list have a CPU host + accelerator architecture. Adoption of reconfigurable hardware accelerators such as FPGAs (field-programmable gate arrays) has been slower, due in part to the different programming paradigms necessary to take advantage of such units. However, in (Petrenko et al., 2014) we show that implementing the computationally intensive kernel of a wave simulation code (namely, the Kaczmarz algorithm) on an FPGA-based accelerator gives a speed-up of more than a factor of two compared to an implementation that uses one Intel Xeon E5-2670 core exclusively.

FPGAs are computer chips that consist of electronic building blocks that the user can configure and reconfigure to represent their algorithm in hardware. Whereas a traditional processor can be viewed as a pipeline for processing instructions, an FPGA is a pipeline for processing data. The chief advantage of the FPGA is that all the instructions in the algorithm are already hardwired onto the chip. This means that execution time depends only on the amount of data to be processed, and not on the complexity of the algorithm.

## References

Aminzadeh, F., Jean, B., and Kunz, T., 1997, 3-D salt and overthrust models: Society of Exploration Geophysicists.

Björck, Å., and Elfving, T., 1979, Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations: BIT Numerical Mathematics, 19, 145–163. doi:10.1007/BF01930845

Gordon, D., and Gordon, R., 2005, Component-averaged row projections: A robust, block-parallel scheme for sparse linear systems: SIAM Journal on Scientific Computing, 27, 1092–1117. doi:10.1137/040609458

Gordon, D., and Gordon, R., 2010, CARP-CG: A robust and efficient parallel solver for linear systems, applied to strongly convection dominated PDEs: Parallel Computing, 36, 495–515. doi:10.1016/j.parco.2010.05.004

Kaczmarz, S., 1937, Angenäherte Auflösung von Systemen linearer Gleichungen: Bulletin International de l’Académie Polonaise des Sciences et des Lettres, 35, 355–357.

Leeuwen, T. van, 2012, A parallel matrix-free framework for frequency-domain seismic modelling, imaging and inversion in MATLAB: Retrieved from https://www.slim.eos.ubc.ca/Publications/Public/TechReport/2012/vanleeuwen2012smii/vanleeuwen2012smii.pdf

Leeuwen, T. van, and Herrmann, F. J., 2013, Mitigating local minima in full-waveform inversion by expanding the search space: Geophysical Journal International, 195, 661–667. doi:10.1093/gji/ggt258

Leeuwen, T. van, Gordon, D., Gordon, R., and Herrmann, F. J., 2012, Preconditioning the Helmholtz equation via row-projections: EAGE. Retrieved from https://www.slim.eos.ubc.ca/Publications/Public/Conferences/EAGE/2012/vanleeuwen2012EAGEcarpcg/vanleeuwen2012EAGEcarpcg.pdf

Meuer, H., Strohmaier, E., Dongarra, J., and Simon, H., 2013, Top 500 supercomputer sites: Retrieved from https://www.top500.org

Operto, S., Virieux, J., Amestoy, P., L’Excellent, J.-Y., Giraud, L., and Ali, H. B. H., 2007, 3D finite-difference frequency-domain modeling of visco-acoustic wave propagation using a massively parallel direct solver: A feasibility study: Geophysics, 72, SM195–SM211. doi:10.1190/1.2759835

Petrenko, A., Leeuwen, T. van, Oriato, D., Tilbury, S., and Herrmann, F. J., 2014, Accelerating an iterative Helmholtz solver with FPGAs: EAGE. doi:10.3997/2214-4609.20141141

Saad, Y., 2003, Iterative methods for sparse linear systems: Society for Industrial and Applied Mathematics.

Virieux, J., and Operto, S., 2009, An overview of full-waveform inversion in exploration geophysics: Geophysics, 74, WCC1–WCC26. doi:10.1190/1.3238367