2.4.6. Preconditioner modules

The efficiency of Krylov iterative methods for the solution of linear systems can be greatly enhanced through preconditioning. For problems in which the user cannot define a more effective, problem-specific preconditioner, ARKODE provides two internal preconditioner modules: a banded preconditioner for serial and threaded problems (ARKBANDPRE) and a band-block-diagonal preconditioner for parallel problems (ARKBBDPRE).

2.4.6.1. A serial banded preconditioner module

This preconditioner provides a band matrix preconditioner for use with iterative SUNLINSOL modules in a serial or threaded setting. It requires that the problem be set up using either the NVECTOR_SERIAL, NVECTOR_OPENMP or NVECTOR_PTHREADS module, due to data access patterns. It also currently requires that the problem involve an identity mass matrix, i.e., \(M = I\).

This module uses difference quotients of the ODE right-hand side function \(f^I\) to generate a band matrix of bandwidth ml + mu + 1, where the number of super-diagonals (mu, the upper half-bandwidth) and sub-diagonals (ml, the lower half-bandwidth) are specified by the user. This band matrix is used to to form a preconditioner the Krylov linear solver. Although this matrix is intended to approximate the Jacobian \(J = \dfrac{\partial f^I}{\partial y}\), it may be a very crude approximation, since the true Jacobian may not be banded, or its true bandwidth may be larger than ml + mu + 1. However, as long as the banded approximation generated for the preconditioner is sufficiently accurate, it may speed convergence of the Krylov iteration.

2.4.6.1.1. ARKBANDPRE usage

In order to use the ARKBANDPRE module, the user need not define any additional functions. In addition to the header files required for the integration of the ODE problem (see §2.4.1), to use the ARKBANDPRE module, the user’s program must include the header file arkode_bandpre.h which declares the needed function prototypes. The following is a summary of the usage of this module. Steps that are unchanged from the skeleton program presented in §2.4.2 are italicized.

  1. Initialize multi-threaded environment (if appropriate)

  2. Create the SUNDIALS simulation context object.

  3. Set problem dimensions

  4. Set vector of initial values

  5. Create ARKODE object

  6. Specify integration tolerances

  7. Create iterative linear solver object

    When creating the iterative linear solver object, specify the type of preconditioning (SUN_PREC_LEFT or SUN_PREC_RIGHT) to use.

  8. Set linear solver optional inputs

  9. Attach linear solver module

  10. Initialize the ARKBANDPRE preconditioner module

    Specify the upper and lower half-bandwidths (mu and ml, respectively) and call

    ier = ARKBandPrecInit(arkode_mem, N, mu, ml);

    to allocate memory and initialize the internal preconditioner data.

  11. Create nonlinear solver object

  12. Attach nonlinear solver module

  13. Set nonlinear solver optional inputs

  14. Set optional inputs

    Note that the user should not call ARKodeSetPreconditioner() as it will overwrite the preconditioner setup and solve functions.

  15. Specify rootfinding problem

  16. Advance solution in time

  17. Get optional outputs

    Additional optional outputs associated with ARKBANDPRE are available by way of the two routines described below, ARKBandPrecGetWorkSpace() and ARKBandPrecGetNumRhsEvals().

  18. Deallocate memory for solution vector

  19. Free solver memory

  20. Free linear solver memory

  21. Free nonlinear solver memory

2.4.6.1.2. ARKBANDPRE user-callable functions

The ARKBANDPRE preconditioner module is initialized and attached by calling the following function:

int ARKBandPrecInit(void *arkode_mem, sunindextype N, sunindextype mu, sunindextype ml)

Initializes the ARKBANDPRE preconditioner and allocates required (internal) memory for it.

Parameters:
  • arkode_mem – pointer to the ARKODE memory block.

  • N – problem dimension (size of ODE system).

  • mu – upper half-bandwidth of the Jacobian approximation.

  • ml – lower half-bandwidth of the Jacobian approximation.

Return values:
  • ARKLS_SUCCESS – the function exited successfully.

  • ARKLS_MEM_NULLarkode_mem was NULL.

  • ARKLS_LMEM_NULL – the linear solver memory was NULL.

  • ARKLS_ILL_INPUT – an input had an illegal value.

  • ARKLS_MEM_FAIL – a memory allocation request failed.

Note

The banded approximate Jacobian will have nonzero elements only in locations \((i,j)\) with ml \(\le j-i \le\) mu.

The following two optional output functions are available for use with the ARKBANDPRE module:

int ARKBandPrecGetWorkSpace(void *arkode_mem, long int *lenrwLS, long int *leniwLS)

Returns the sizes of the ARKBANDPRE real and integer workspaces.

Parameters:
  • arkode_mem – pointer to the ARKODE memory block.

  • lenrwLS – the number of sunrealtype values in the ARKBANDPRE workspace.

  • leniwLS – the number of integer values in the ARKBANDPRE workspace.

Return values:
  • ARKLS_SUCCESS – the function exited successfully.

  • ARKLS_MEM_NULLarkode_mem was NULL.

  • ARKLS_LMEM_NULL – the linear solver memory was NULL.

  • ARKLS_PMEM_NULL – the preconditioner memory was NULL.

Note

The workspace requirements reported by this routine correspond only to memory allocated within the ARKBANDPRE module (the banded matrix approximation, banded SUNLinearSolver object, and temporary vectors).

The workspaces referred to here exist in addition to those given by the corresponding function ARKodeGetLinWorkSpace().

int ARKBandPrecGetNumRhsEvals(void *arkode_mem, long int *nfevalsBP)

Returns the number of calls made to the user-supplied right-hand side function \(f^I\) for constructing the finite-difference banded Jacobian approximation used within the preconditioner setup function.

Parameters:
  • arkode_mem – pointer to the ARKODE memory block.

  • nfevalsBP – number of calls to \(f^I\).

Return values:
  • ARKLS_SUCCESS – the function exited successfully.

  • ARKLS_MEM_NULLarkode_mem was NULL.

  • ARKLS_LMEM_NULL – the linear solver memory was NULL.

  • ARKLS_PMEM_NULL – the preconditioner memory was NULL.

Note

The counter nfevalsBP is distinct from the counter nfevalsLS returned by the corresponding function ARKodeGetNumLinRhsEvals() and also from the number of evaluations returned by the time-stepping module (e.g., nfi_evals returned by ARKStepGetNumRhsEvals()). The total number of right-hand side function evaluations is the sum of all three of these counters.

2.4.6.2. A parallel band-block-diagonal preconditioner module

A principal reason for using a parallel ODE solver (such as ARKODE) lies in the solution of partial differential equations (PDEs). Moreover, Krylov iterative methods are used on many such problems due to the nature of the underlying linear system of equations that needs to solved at each time step. For many PDEs, the linear algebraic system is large, sparse and structured. However, if a Krylov iterative method is to be effective in this setting, then a nontrivial preconditioner is required. Otherwise, the rate of convergence of the Krylov iterative method is usually slow, and degrades as the PDE mesh is refined. Typically, an effective preconditioner must be problem-specific.

However, we have developed one type of preconditioner that treats a rather broad class of PDE-based problems. It has been successfully used with CVODE for several realistic, large-scale problems [71], and is included in a software module within the ARKODE package. This preconditioning module works with the parallel vector module NVECTOR_PARALLEL and is usable with any of the Krylov iterative linear solvers through the ARKLS interface. It generates a preconditioner that is a block-diagonal matrix with each block being a band matrix. The blocks need not have the same number of super- and sub-diagonals and these numbers may vary from block to block. This Band-Block-Diagonal Preconditioner module is called ARKBBDPRE.

One way to envision these preconditioners is to think of the computational PDE domain as being subdivided into \(Q\) non-overlapping subdomains, where each subdomain is assigned to one of the \(Q\) MPI tasks used to solve the ODE system. The basic idea is to isolate the preconditioning so that it is local to each process, and also to use a (possibly cheaper) approximate right-hand side function for construction of this preconditioning matrix. This requires the definition of a new function \(g(t,y) \approx f^I(t,y)\) that will be used to construct the BBD preconditioner matrix. At present, we assume that the ODE be written in explicit form as

\[\dot{y} = f^E(t,y) + f^I(t,y),\]

where \(f^I\) corresponds to the ODE components to be treated implicitly, i.e. this preconditioning module does not support problems with non-identity mass matrices. The user may set \(g = f^I\), if no less expensive approximation is desired.

Corresponding to the domain decomposition, there is a decomposition of the solution vector \(y\) into \(Q\) disjoint blocks \(y_q\), and a decomposition of \(g\) into blocks \(g_q\). The block \(g_q\) depends both on \(y_p\) and on components of blocks \(y_{q'}\) associated with neighboring subdomains (so-called ghost-cell data). If we let \(\bar{y}_q\) denote \(y_q\) augmented with those other components on which \(g_q\) depends, then we have

\[g(t,y) = \left[ g_1(t,\bar{y}_1), g_2(t,\bar{y}_2), \ldots , g_Q(t,\bar{y}_Q) \right]^T,\]

and each of the blocks \(g_q(t,\bar{y}_q)\) is decoupled from one another.

The preconditioner associated with this decomposition has the form

\[\begin{split}P = \begin{bmatrix} P_1 & & & \\ & P_2 & & \\ & & \ddots &\\ & & & P_Q \end{bmatrix}\end{split}\]

where

\[P_q \approx I - \gamma J_q\]

and where \(J_q\) is a difference quotient approximation to \(\dfrac{\partial g_q}{\partial \bar{y}_q}\). This matrix is taken to be banded, with upper and lower half-bandwidths mudq and mldq defined as the number of non-zero diagonals above and below the main diagonal, respectively. The difference quotient approximation is computed using mudq + mldq + 2 evaluations of \(g_m\), but only a matrix of bandwidth mukeep + mlkeep + 1 is retained. Neither pair of parameters need be the true half-bandwidths of the Jacobian of the local block of \(g\), if smaller values provide a more efficient preconditioner. The solution of the complete linear system

\[Px = b\]

reduces to solving each of the distinct equations

\[P_q x_q = b_q, \quad q=1,\ldots,Q,\]

and this is done by banded LU factorization of \(P_q\) followed by a banded backsolve.

Similar block-diagonal preconditioners could be considered with different treatments of the blocks \(P_q\). For example, incomplete LU factorization or an iterative method could be used instead of banded LU factorization.

2.4.6.2.1. ARKBBDPRE user-supplied functions

The ARKBBDPRE module calls two user-provided functions to construct \(P\): a required function gloc (of type ARKLocalFn()) which approximates the right-hand side function \(g(t,y) \approx f^I(t,y)\) and which is computed locally, and an optional function cfn (of type ARKCommFn()) which performs all inter-process communication necessary to evaluate the approximate right-hand side \(g\). These are in addition to the user-supplied right-hand side function \(f^I\). Both functions take as input the same pointer user_data that is passed by the user to ARKodeSetUserData() and that was passed to the user’s function \(f^I\). The user is responsible for providing space (presumably within user_data) for components of \(y\) that are communicated between processes by cfn, and that are then used by gloc, which should not do any communication.

typedef int (*ARKLocalFn)(sunindextype Nlocal, sunrealtype t, N_Vector y, N_Vector glocal, void *user_data)

This gloc function computes \(g(t,y)\). It fills the vector glocal as a function of t and y.

Param Nlocal:

the local vector length.

Param t:

the value of the independent variable.

Param y:

the value of the dependent variable vector on this process.

Param glocal:

the output vector of \(g(t,y)\) on this process.

Param user_data:

a pointer to user data, the same as the user_data parameter passed to ARKodeSetUserData().

Return:

An ARKLocalFn should return 0 if successful, a positive value if a recoverable error occurred (in which case ARKODE will attempt to correct), or a negative value if it failed unrecoverably (in which case the integration is halted and ARKodeEvolve() will return ARK_LSETUP_FAIL).

Note

This function should assume that all inter-process communication of data needed to calculate glocal has already been done, and that this data is accessible within user data.

The case where \(g\) is mathematically identical to \(f^I\) is allowed.

typedef int (*ARKCommFn)(sunindextype Nlocal, sunrealtype t, N_Vector y, void *user_data)

This cfn function performs all inter-process communication necessary for the execution of the gloc function above, using the input vector y.

Param Nlocal:

the local vector length.

Param t:

the value of the independent variable.

Param y:

the value of the dependent variable vector on this process.

Param user_data:

a pointer to user data, the same as the user_data parameter passed to ARKodeSetUserData().

Return:

An ARKCommFn should return 0 if successful, a positive value if a recoverable error occurred (in which case ARKODE will attempt to correct), or a negative value if it failed unrecoverably (in which case the integration is halted and ARKodeEvolve() will return ARK_LSETUP_FAIL).

Note

The cfn function is expected to save communicated data in space defined within the data structure user_data.

Each call to the cfn function is preceded by a call to the right-hand side function \(f^I\) with the same \((t,y)\) arguments. Thus, cfn can omit any communication done by \(f^I\) if relevant to the evaluation of glocal. If all necessary communication was done in \(f^I\), then cfn = NULL can be passed in the call to ARKBBDPrecInit() (see below).

2.4.6.2.2. ARKBBDPRE usage

In addition to the header files required for the integration of the ODE problem (see §2.4.1), to use the ARKBBDPRE module, the user’s program must include the header file arkode_bbdpre.h which declares the needed function prototypes.

The following is a summary of the proper usage of this module. Steps that are unchanged from the skeleton program presented in §2.4.2 are italicized.

  1. Initialize MPI

  2. Create the SUNDIALS simulation context object

  3. Set problem dimensions

  4. Set vector of initial values

  5. Create ARKODE object

  6. Specify integration tolerances

  7. Create iterative linear solver object

    When creating the iterative linear solver object, specify the type of preconditioning (SUN_PREC_LEFT or SUN_PREC_RIGHT) to use.

  8. Set linear solver optional inputs

  9. Attach linear solver module

  10. Initialize the ARKBBDPRE preconditioner module

    Specify the upper and lower half-bandwidths for computation mudq and mldq, the upper and lower half-bandwidths for storage mukeep and mlkeep, and call

    ier = ARKBBDPrecInit(arkode_mem, Nlocal, mudq, mldq, mukeep, mlkeep, dqrely, gloc, cfn);

    to allocate memory and initialize the internal preconditioner data. The last two arguments of ARKBBDPrecInit() are the two user-supplied functions of type ARKLocalFn() and ARKCommFn() described above, respectively.

  11. Create nonlinear solver object

  12. Attach nonlinear solver module

  13. Set nonlinear solver optional inputs

  14. Set optional inputs

    Note that the user should not call ARKodeSetPreconditioner() as it will overwrite the preconditioner setup and solve functions.

  15. Specify rootfinding problem

  16. Advance solution in time

  17. Get optional outputs

    Additional optional outputs associated with ARKBBDPRE are available through the routines ARKBBDPrecGetWorkSpace() and ARKBBDPrecGetNumGfnEvals().

  18. Deallocate memory for solution vector

  19. Free solver memory

  20. Free linear solver memory

  21. Free nonlinear solver memory

  22. Finalize MPI

2.4.6.2.3. ARKBBDPRE user-callable functions

The ARKBBDPRE preconditioner module is initialized (or re-initialized) and attached to the integrator by calling the following functions:

int ARKBBDPrecInit(void *arkode_mem, sunindextype Nlocal, sunindextype mudq, sunindextype mldq, sunindextype mukeep, sunindextype mlkeep, sunrealtype dqrely, ARKLocalFn gloc, ARKCommFn cfn)

Initializes and allocates (internal) memory for the ARKBBDPRE preconditioner.

Parameters:
  • arkode_mem – pointer to the ARKODE memory block.

  • Nlocal – local vector length.

  • mudq – upper half-bandwidth to be used in the difference quotient Jacobian approximation.

  • mldq – lower half-bandwidth to be used in the difference quotient Jacobian approximation.

  • mukeep – upper half-bandwidth of the retained banded approximate Jacobian block.

  • mlkeep – lower half-bandwidth of the retained banded approximate Jacobian block.

  • dqrely – the relative increment in components of y used in the difference quotient approximations. The default is dqrely = \(\sqrt{\text{unit roundoff}}\), which can be specified by passing dqrely = 0.0.

  • gloc – the name of the C function (of type ARKLocalFn()) which computes the approximation \(g(t,y) \approx f^I(t,y)\).

  • cfn – the name of the C function (of type ARKCommFn()) which performs all inter-process communication required for the computation of \(g(t,y)\).

Return values:
  • ARKLS_SUCCESS – the function exited successfully.

  • ARKLS_MEM_NULLarkode_mem was NULL.

  • ARKLS_LMEM_NULL – the linear solver memory was NULL.

  • ARKLS_ILL_INPUT – an input had an illegal value.

  • ARKLS_MEM_FAIL – a memory allocation request failed.

Note

If one of the half-bandwidths mudq or mldq to be used in the difference quotient calculation of the approximate Jacobian is negative or exceeds the value Nlocal-1, it is replaced by 0 or Nlocal-1 accordingly.

The half-bandwidths mudq and mldq need not be the true half-bandwidths of the Jacobian of the local block of \(g\) when smaller values may provide a greater efficiency.

Also, the half-bandwidths mukeep and mlkeep of the retained banded approximate Jacobian block may be even smaller than mudq and mldq, to reduce storage and computational costs further.

For all four half-bandwidths, the values need not be the same on every processor.

The ARKBBDPRE module also provides a re-initialization function to allow solving a sequence of problems of the same size, with the same linear solver choice, provided there is no change in Nlocal, mukeep, or mlkeep. After solving one problem, and after calling *StepReInit to re-initialize ARKODE for a subsequent problem, a call to ARKBBDPrecReInit() can be made to change any of the following: the half-bandwidths mudq and mldq used in the difference-quotient Jacobian approximations, the relative increment dqrely, or one of the user-supplied functions gloc and cfn. If there is a change in any of the linear solver inputs, an additional call to the “Set” routines provided by the SUNLINSOL module, and/or one or more of the corresponding ARKodeSet*** functions, must also be made (in the proper order).

int ARKBBDPrecReInit(void *arkode_mem, sunindextype mudq, sunindextype mldq, sunrealtype dqrely)

Re-initializes the ARKBBDPRE preconditioner module.

Parameters:
  • arkode_mem – pointer to the ARKODE memory block.

  • mudq – upper half-bandwidth to be used in the difference quotient Jacobian approximation.

  • mldq – lower half-bandwidth to be used in the difference quotient Jacobian approximation.

  • dqrely – the relative increment in components of y used in the difference quotient approximations. The default is dqrely = \(\sqrt{\text{unit roundoff}}\), which can be specified by passing dqrely = 0.0.

Return values:
  • ARKLS_SUCCESS – the function exited successfully.

  • ARKLS_MEM_NULLarkode_mem was NULL.

  • ARKLS_LMEM_NULL – the linear solver memory was NULL.

  • ARKLS_PMEM_NULL – the preconditioner memory was NULL.

Note

If one of the half-bandwidths mudq or mldq is negative or exceeds the value Nlocal-1, it is replaced by 0 or Nlocal-1 accordingly.

The following two optional output functions are available for use with the ARKBBDPRE module:

int ARKBBDPrecGetWorkSpace(void *arkode_mem, long int *lenrwBBDP, long int *leniwBBDP)

Returns the processor-local ARKBBDPRE real and integer workspace sizes.

Parameters:
  • arkode_mem – pointer to the ARKODE memory block.

  • lenrwBBDP – the number of sunrealtype values in the ARKBBDPRE workspace.

  • leniwBBDP – the number of integer values in the ARKBBDPRE workspace.

Return values:
  • ARKLS_SUCCESS – the function exited successfully.

  • ARKLS_MEM_NULLarkode_mem was NULL.

  • ARKLS_LMEM_NULL – the linear solver memory was NULL.

  • ARKLS_PMEM_NULL – the preconditioner memory was NULL.

Note

The workspace requirements reported by this routine correspond only to memory allocated within the ARKBBDPRE module (the banded matrix approximation, banded SUNLinearSolver object, temporary vectors). These values are local to each process.

The workspaces referred to here exist in addition to those given by the corresponding function ARKodeGetLinWorkSpace().

int ARKBBDPrecGetNumGfnEvals(void *arkode_mem, long int *ngevalsBBDP)

Returns the number of calls made to the user-supplied gloc function (of type ARKLocalFn()) due to the finite difference approximation of the Jacobian blocks used within the preconditioner setup function.

Parameters:
  • arkode_mem – pointer to the ARKODE memory block.

  • ngevalsBBDP – the number of calls made to the user-supplied gloc function.

Return values:
  • ARKLS_SUCCESS – the function exited successfully.

  • ARKLS_MEM_NULLarkode_mem was NULL.

  • ARKLS_LMEM_NULL – the linear solver memory was NULL.

  • ARKLS_PMEM_NULL – the preconditioner memory was NULL.

In addition to the ngevalsBBDP gloc evaluations, the costs associated with ARKBBDPRE also include nlinsetups LU factorizations, nlinsetups calls to cfn, npsolves banded backsolve calls, and nfevalsLS right-hand side function evaluations, where nlinsetups is an optional ARKODE output and npsolves and nfevalsLS are linear solver optional outputs (see the table §2.4.3.10.4).