# 5.2. Mathematical Considerations

IDA solves the initial-value problem (IVP) for a DAE system of the general form

where \(y\), \(\dot{y}\), and \(F\) are vectors in \(\mathbb{R}^N\), \(t\) is the independent variable, \(\dot{y} = \mathrm dy/\mathrm dt\), and initial values \(y_0\), \(\dot{y}_0\) are given. Often \(t\) is time, but it certainly need not be.

## 5.2.1. Initial Condition

Prior to integrating a DAE initial-value problem, an important requirement is that the pair of vectors \(y_0\) and \(\dot{y}_0\) are both initialized to satisfy the DAE residual \(F(t_0,y_0, \dot{y}_0) = 0\). For a class of problems that includes so-called semi-explicit index-one systems, IDA provides a routine that computes consistent initial conditions from a user’s initial guess [24]. For this, the user must identify sub-vectors of \(y\) (not necessarily contiguous), denoted \(y_d\) and \(y_a\), which are its differential and algebraic parts, respectively, such that \(F\) depends on \(\dot{y}_d\) but not on any components of \(\dot{y}_a\). The assumption that the system is “index one” means that for a given \(t\) and \(y_d\), the system \(F(t,y,\dot{y}) = 0\) defines \(y_a\) uniquely. In this case, a solver within IDA computes \(y_a\) and \(\dot{y}_d\) at \(t = t_0\), given \(y_d\) and an initial guess for \(y_a\). A second available option with this solver also computes all of \(y(t_0)\) given \(\dot{y}(t_0)\); this is intended mainly for quasi-steady-state problems, where \(\dot{y}(t_0) = 0\) is given. In both cases, IDA solves the system \(F(t_0,y_0, \dot{y}_0) = 0\) for the unknown components of \(y_0\) and \(\dot{y}_0\), using a Newton iteration augmented with a line search globalization strategy. In doing this, it makes use of the existing machinery that is to be used for solving the linear systems during the integration, in combination with certain tricks involving the step size (which is set artificially for this calculation). For problems that do not fall into either of these categories, the user is responsible for passing consistent values, or risks failure in the numerical integration.

## 5.2.2. IVP solution

The integration method used in IDA is the variable-order, variable-coefficient BDF (Backward Differentiation Formula), in fixed-leading-coefficient form [19]. The method order ranges from 1 to 5, with the BDF of order \(q\) given by the multistep formula

where \(y_n\) and \(\dot{y}_n\) are the computed approximations to \(y(t_n)\) and \(\dot{y}(t_n)\), respectively, and the step size is \(h_n = t_n - t_{n-1}\). The coefficients \(\alpha_{n,i}\) are uniquely determined by the order \(q\), and the history of the step sizes. The application of the BDF (5.2) to the DAE system (5.1) results in a nonlinear algebraic system to be solved at each step:

In the process of controlling errors at various levels, IDA uses a weighted root-mean-square norm, denoted \(\|\cdot\|_{\text{WRMS}}\), for all error-like quantities. The multiplicative weights used are based on the current solution and on the relative and absolute tolerances input by the user, namely

Because \(1/W_i\) represents a tolerance in the component \(y_i\), a vector whose norm is 1 is regarded as “small.” For brevity, we will usually drop the subscript WRMS on norms in what follows.

### 5.2.2.1. Nonlinear Solve

By default IDA solves (5.3) with a Newton iteration but IDA also allows for user-defined nonlinear solvers (see Chapter §11). Each Newton iteration requires the solution of a linear system of the form

where \(y_{n(m)}\) is the \(m\)-th approximation to \(y_n\). Here \(J\) is some approximation to the system Jacobian

where \(\alpha = \alpha_{n,0}/h_n\). The scalar \(\alpha\) changes whenever the step size or method order changes.

For the solution of the linear systems within the Newton iteration, IDA provides
several choices, including the option of a user-supplied linear solver (see
Chapter §10). The linear solvers distributed with SUNDIALS are
organized in two families, a *direct* family comprising direct linear solvers
for dense, banded, or sparse matrices and a *spils* family comprising scaled
preconditioned iterative (Krylov) linear solvers. The methods offered through
these modules are as follows:

dense direct solvers, including an internal implementation, an interface to BLAS/LAPACK, an interface to MAGMA [126] and an interface to the oneMKL library [3],

band direct solvers, including an internal implementation or an interface to BLAS/LAPACK,

sparse direct solver interfaces to various libraries, including KLU [4, 40], SuperLU_MT [9, 42, 89], SuperLU_Dist [8, 58, 90, 91], and cuSPARSE [7],

SPGMR, a scaled preconditioned GMRES (Generalized Minimal Residual method) solver with or without restarts,

SPFGMR, a scaled preconditioned FGMRES (Flexible Generalized Minimal Residual method) solver with or without restarts,

SPBCG, a scaled preconditioned Bi-CGStab (Bi-Conjugate Gradient Stable method) solver,

SPTFQMR, a scaled preconditioned TFQMR (Transpose-Free Quasi-Minimal Residual method) solver, or

PCG, a scaled preconditioned CG (Conjugate Gradient method) solver.

For large stiff systems, where direct methods are not feasible, the combination
of a BDF integrator and a preconditioned Krylov method yields a powerful tool
because it combines established methods for stiff integration, nonlinear
iteration, and Krylov (linear) iteration with a problem-specific treatment of
the dominant source of stiffness, in the form of the user-supplied
preconditioner matrix [22]. For the *spils* linear solvers with
IDA, preconditioning is allowed only on the left (see
§5.2.3). Note that the dense, band, and sparse
direct linear solvers can only be used with serial and threaded vector
representations.

In the case of a matrix-based linear solver, the default Newton iteration is a Modified Newton iteration, in that the Jacobian \(J\) is fixed (and usually out of date) throughout the nonlinear iterations, with a coefficient \(\bar\alpha\) in place of \(\alpha\) in \(J\). However, in the case that a matrix-free iterative linear solver is used, the default Newton iteration is an Inexact Newton iteration, in which \(J\) is applied in a matrix-free manner, with matrix-vector products \(Jv\) obtained by either difference quotients or a user-supplied routine. In this case, the linear residual \(J\Delta y + G\) is nonzero but controlled. With the default Newton iteration, the matrix \(J\) and preconditioner matrix \(P\) are updated as infrequently as possible to balance the high costs of matrix operations against other costs. Specifically, this matrix update occurs when:

starting the problem,

the value \(\bar\alpha\) at the last update is such that \(\alpha / {\bar\alpha} < 3/5\) or \(\alpha / {\bar\alpha} > 5/3\), or

a non-fatal convergence failure occurred with an out-of-date \(J\) or \(P\).

The above strategy balances the high cost of frequent matrix evaluations and preprocessing with the slow convergence due to infrequent updates. To reduce storage costs on an update, Jacobian information is always reevaluated from scratch.

The default stopping test for nonlinear solver iterations in IDA ensures that the iteration error \(y_n - y_{n(m)}\) is small relative to \(y\) itself. For this, we estimate the linear convergence rate at all iterations \(m>1\) as

where the \(\delta_m = y_{n(m)} - y_{n(m-1)}\) is the correction at iteration \(m=1,2,\ldots\). The nonlinear solver iteration is halted if \(R>0.9\). The convergence test at the \(m\)-th iteration is then

where \(S = R/(R-1)\) whenever \(m>1\) and \(R\le 0.9\). The user has the option of changing the constant in the convergence test from its default value of \(0.33\). The quantity \(S\) is set to \(S=20\) initially and whenever \(J\) or \(P\) is updated, and it is reset to \(S=100\) on a step with \(\alpha \neq \bar\alpha\). Note that at \(m=1\), the convergence test (5.7) uses an old value for \(S\). Therefore, at the first nonlinear solver iteration, we make an additional test and stop the iteration if \(\|\delta_1\| < 0.33 \cdot 10^{-4}\) (since such a \(\delta_1\) is probably just noise and therefore not appropriate for use in evaluating \(R\)). We allow only a small number (default value 4) of nonlinear iterations. If convergence fails with \(J\) or \(P\) current, we are forced to reduce the step size \(h_n\), and we replace \(h_n\) by \(h_n \eta_{\mathrm{cf}}\) (by default \(\eta_{\mathrm{cf}} = 0.25\)). The integration is halted after a preset number (default value 10) of convergence failures. Both the maximum number of allowable nonlinear iterations and the maximum number of nonlinear convergence failures can be changed by the user from their default values.

When an iterative method is used to solve the linear system, to minimize the effect of linear iteration errors on the nonlinear and local integration error controls, we require the preconditioned linear residual to be small relative to the allowed error in the nonlinear iteration, i.e., \(\| P^{-1}(Jx+G) \| < 0.05 \cdot 0.33\). The safety factor \(0.05\) can be changed by the user.

When the Jacobian is stored using either the SUNMATRIX_DENSE or SUNMATRIX_BAND matrix objects, the Jacobian \(J\) defined in (5.6) can be either supplied by the user or IDA can compute \(J\) internally by difference quotients. In the latter case, we use the approximation

where \(U\) is the unit roundoff, \(h\) is the current step size, and
\(W_j\) is the error weight for the component \(y_j\) defined by
(5.4). We note that with sparse and user-supplied matrix objects,
the Jacobian *must* be supplied by a user routine.

In the case of an iterative linear solver, if a routine for \(Jv\) is not supplied, such products are approximated by

where the increment \(\sigma = 1/\|v\|\). As an option, the user can specify a constant factor that is inserted into this expression for \(\sigma\).

### 5.2.2.2. Local Error Test

During the course of integrating the system, IDA computes an estimate of the local truncation error, LTE, at the \(n\)-th time step, and requires this to satisfy the inequality

Asymptotically, LTE varies as \(h^{q+1}\) at step size \(h\) and order \(q\), as does the predictor-corrector difference \(\Delta_n \equiv y_n-y_{n(0)}\). Thus there is a constant \(C\) such that

and so the norm of LTE is estimated as \(|C| \cdot \|\Delta_n\|\). In addition, IDA requires that the error in the associated polynomial interpolant over the current step be bounded by 1 in norm. The leading term of the norm of this error is bounded by \(\bar{C} \|\Delta_n\|\) for another constant \(\bar{C}\). Thus the local error test in IDA is

A user option is available by which the algebraic components of the error vector are omitted from the test (5.8), if these have been so identified.

### 5.2.2.3. Step Size and Order Selection

In IDA, the local error test is tightly coupled with the logic for selecting the step size and order. First, there is an initial phase that is treated specially; for the first few steps, the step size is doubled and the order raised (from its initial value of 1) on every step, until (a) the local error test (5.8) fails, (b) the order is reduced (by the rules given below), or (c) the order reaches 5 (the maximum). For step and order selection on the general step, IDA uses a different set of local error estimates, based on the asymptotic behavior of the local error in the case of fixed step sizes. At each of the orders \(q'\) equal to \(q\), \(q-1\) (if \(q > 1\)), \(q-2\) (if \(q > 2\)), or \(q+1\) (if \(q < 5\)), there are constants \(C(q')\) such that the norm of the local truncation error at order \(q'\) satisfies

where \(\phi(k)\) is a modified divided difference of order \(k\) that is retained by IDA (and behaves asymptotically as \(h^k\)). Thus the local truncation errors are estimated as ELTE\((q') = C(q')\|\phi(q'+1)\|\) to select step sizes. But the choice of order in IDA is based on the requirement that the scaled derivative norms, \(\|h^k y^{(k)}\|\), are monotonically decreasing with \(k\), for \(k\) near \(q\). These norms are again estimated using the \(\phi(k)\), and in fact

The step/order selection begins with a test for monotonicity that is made even
*before* the local error test is performed. Namely, the order is reset to
\(q' = q-1\) if (a) \(q=2\) and \(T(1)\leq T(2)/2\), or (b) \(q
> 2\) and \(\max\{T(q-1),T(q-2)\} \leq T(q)\); otherwise \(q' = q\). Next
the local error test (5.8) is performed, and if it fails, the step is
redone at order \(q\leftarrow q'\) and a new step size \(h'\). The
latter is based on the \(h^{q+1}\) asymptotic behavior of
\(\text{ELTE}(q)\), and, with safety factors, is given by

The value of \(\eta\) is adjusted so that \(\eta_{\mathrm{min\_ef}} \leq \eta \leq \eta_{\mathrm{low}}\) (by default \(\eta_{\mathrm{min\_ef}} = 0.25\) and \(\eta_{\mathrm{low}} = 0.9\)) before setting \(h \leftarrow h' = \eta h\). If the local error test fails a second time, IDA uses \(\eta = \eta_{\mathrm{min\_ef}}\), and on the third and subsequent failures it uses \(q = 1\) and \(\eta = \eta_{\mathrm{min\_ef}}\). After 10 failures, IDA returns with a give-up message.

As soon as the local error test has passed, the step and order for the next step may be adjusted. No order change is made if \(q' = q-1\) from the prior test, if \(q = 5\), or if \(q\) was increased on the previous step. Otherwise, if the last \(q+1\) steps were taken at a constant order \(q < 5\) and a constant step size, IDA considers raising the order to \(q+1\). The logic is as follows:

If \(q = 1\), then set \(q = 2\) if \(T(2) < T(1)/2\).

If \(q > 1\) then

set \(q \leftarrow q-1\) if \(T(q-1) \leq \min\{T(q),T(q+1)\}\), else

set \(q \leftarrow q+1\) if \(T(q+1) < T(q)\), otherwise

leave \(q\) unchanged, in this case \(T(q-1) > T(q) \leq T(q+1)\)

In any case, the new step size \(h'\) is set much as before:

The value of \(\eta\) is adjusted such that

If \(\eta_{\mathrm{min\_fx}} < \eta < \eta_{\mathrm{max\_fx}}\), set \(\eta = 1\). The defaults are \(\eta_{\mathrm{min\_fx}} = 1\) and \(\eta_{\mathrm{max\_fx}} = 2\).

If \(\eta \geq \eta_{\mathrm{max\_fx}}\), the step size growth is restricted to \(\eta_{\mathrm{max\_fx}} \leq \eta \leq \eta_{\mathrm{max}}\) with \(\eta_{\mathrm{max}} = 2\) by default.

If \(\eta \leq \eta_{\mathrm{min\_fx}}\), the step size reduction is restricted to \(\eta_{\mathrm{min}} \leq \eta \leq \eta_{\mathrm{low}}\) with \(\eta_{\mathrm{min}} = 0.5\) and \(\eta_{\mathrm{low}} = 0.9\) by default.

Thus we do not increase the step size unless it can be doubled. If a step size reduction is called for, the step size will be cut by at least 10% and up to 50% for the next step. See [19] for details.

Finally \(h\) is set to \(h' = \eta h\) and \(|h|\) is restricted to \(h_{\text{min}} \leq |h| \leq h_{\text{max}}\) with the defaults \(h_{\text{min}} = 0.0\) and \(h_{\text{max}} = \infty\).

Normally, IDA takes steps until a user-defined output value \(t = t_{\text{out}}\) is overtaken, and then computes \(y(t_{\text{out}})\) by interpolation. However, a “one step” mode option is available, where control returns to the calling program after each step. There are also options to force IDA not to integrate past a given stopping point \(t = t_{\text{stop}}\).

### 5.2.2.4. Inequality Constraints

IDA permits the user to impose optional inequality constraints on individual components of the solution vector \(y\). Any of the following four constraints can be imposed: \(y_i > 0\), \(y_i < 0\), \(y_i \geq 0\), or \(y_i \leq 0\). The constraint satisfaction is tested after a successful nonlinear system solution. If any constraint fails, we declare a convergence failure of the nonlinear iteration and reduce the step size. Rather than cutting the step size by some arbitrary factor, IDA estimates a new step size \(h'\) using a linear approximation of the components in \(y\) that failed the constraint test (including a safety factor of \(0.9\) to cover the strict inequality case). These additional constraints are also imposed during the calculation of consistent initial conditions. If a step fails to satisfy the constraints repeatedly within a step attempt then the integration is halted and an error is returned. In this case the user may need to employ other strategies as discussed in §5.4.3.2 to satisfy the inequality constraints.

## 5.2.3. Preconditioning

When using a nonlinear solver that requires the solution of a linear system of
the form \(J \Delta y = - G\) (e.g., the default Newton iteration), IDA
makes repeated use of a linear solver. If this linear system solve is done with
one of the scaled preconditioned iterative linear solvers supplied with
SUNDIALS, these solvers are rarely successful if used without preconditioning;
it is generally necessary to precondition the system in order to obtain
acceptable efficiency. A system \(A x = b\) can be preconditioned on the
left, on the right, or on both sides. The Krylov method is then applied to a
system with the matrix \(P^{-1}A\), or \(AP^{-1}\), or \(P_L^{-1} A
P_R^{-1}\), instead of \(A\). However, within IDA, preconditioning is
allowed *only* on the left, so that the iterative method is applied to systems
\((P^{-1}J)\Delta y = -P^{-1}G\). Left preconditioning is required to make
the norm of the linear residual in the nonlinear iteration meaningful; in
general, \(\| J \Delta y + G \|\) is meaningless, since the weights used in
the WRMS-norm correspond to \(y\).

In order to improve the convergence of the Krylov iteration, the preconditioner matrix \(P\) should in some sense approximate the system matrix \(A\). Yet at the same time, in order to be cost-effective, the matrix \(P\) should be reasonably efficient to evaluate and solve. Finding a good point in this tradeoff between rapid convergence and low cost can be very difficult. Good choices are often problem-dependent (for example, see [22] for an extensive study of preconditioners for reaction-transport systems).

Typical preconditioners used with IDA are based on approximations to the iteration matrix of the systems involved; in other words, \(P \approx \dfrac{\partial F}{\partial y} + \alpha\dfrac{\partial F}{\partial \dot{y}}\), where \(\alpha\) is a scalar inversely proportional to the integration step size \(h\). Because the Krylov iteration occurs within a nonlinear solver iteration and further also within a time integration, and since each of these iterations has its own test for convergence, the preconditioner may use a very crude approximation, as long as it captures the dominant numerical feature(s) of the system. We have found that the combination of a preconditioner with the Newton-Krylov iteration, using even a fairly poor approximation to the Jacobian, can be surprisingly superior to using the same matrix without Krylov acceleration (i.e., a modified Newton iteration), as well as to using the Newton-Krylov method with no preconditioning.

## 5.2.4. Rootfinding

The IDA solver has been augmented to include a rootfinding feature. This means that, while integratnuming the Initial Value Problem (5.1), IDA can also find the roots of a set of user-defined functions \(g_i(t,y,\dot{y})\) that depend on \(t\), the solution vector \(y = y(t)\), and its \(t-\)derivative \(\dot{y}(t)\). The number of these root functions is arbitrary, and if more than one \(g_i\) is found to have a root in any given interval, the various root locations are found and reported in the order that they occur on the \(t\) axis, in the direction of integration.

Generally, this rootfinding feature finds only roots of odd multiplicity, corresponding to changes in sign of \(g_i(t,y(t),\dot{y}(t))\), denoted \(g_i(t)\) for short. If a user root function has a root of even multiplicity (no sign change), it will probably be missed by IDA. If such a root is desired, the user should reformulate the root function so that it changes sign at the desired root.

The basic scheme used is to check for sign changes of any \(g_i(t)\) over each time step taken, and then (when a sign change is found) to home in on the root (or roots) with a modified secant method [66]. In addition, each time \(g\) is computed, IDA checks to see if \(g_i(t) = 0\) exactly, and if so it reports this as a root. However, if an exact zero of any \(g_i\) is found at a point \(t\), IDA computes \(g\) at \(t + \delta\) for a small increment \(\delta\), slightly further in the direction of integration, and if any \(g_i(t + \delta)=0\) also, IDA stops and reports an error. This way, each time IDA takes a time step, it is guaranteed that the values of all \(g_i\) are nonzero at some past value of \(t\), beyond which a search for roots is to be done.

At any given time in the course of the time-stepping, after suitable checking and adjusting has been done, IDA has an interval \((t_{lo},t_{hi}]\) in which roots of the \(g_i(t)\) are to be sought, such that \(t_{hi}\) is further ahead in the direction of integration, and all \(g_i(t_{lo}) \neq 0\). The endpoint \(t_{hi}\) is either \(t_n\), the end of the time step last taken, or the next requested output time \(t_{\text{out}}\) if this comes sooner. The endpoint \(t_{lo}\) is either \(t_{n-1}\), or the last output time \(t_{\text{out}}\) (if this occurred within the last step), or the last root location (if a root was just located within this step), possibly adjusted slightly toward \(t_n\) if an exact zero was found. The algorithm checks \(g\) at \(t_{hi}\) for zeros and for sign changes in \((t_{lo},t_{hi})\). If no sign changes are found, then either a root is reported (if some \(g_i(t_{hi}) = 0\)) or we proceed to the next time interval (starting at \(t_{hi}\)). If one or more sign changes were found, then a loop is entered to locate the root to within a rather tight tolerance, given by

Whenever sign changes are seen in two or more root functions, the one deemed most likely to have its root occur first is the one with the largest value of \(|g_i(t_{hi})|/|g_i(t_{hi}) - g_i(t_{lo})|\), corresponding to the closest to \(t_{lo}\) of the secant method values. At each pass through the loop, a new value \(t_{mid}\) is set, strictly within the search interval, and the values of \(g_i(t_{mid})\) are checked. Then either \(t_{lo}\) or \(t_{hi}\) is reset to \(t_{mid}\) according to which subinterval is found to have the sign change. If there is none in \((t_{lo},t_{mid})\) but some \(g_i(t_{mid}) = 0\), then that root is reported. The loop continues until \(|t_{hi}-t_{lo}| < \tau\), and then the reported root location is \(t_{hi}\).

In the loop to locate the root of \(g_i(t)\), the formula for \(t_{mid}\) is

where \(\alpha\) a weight parameter. On the first two passes through the loop, \(\alpha\) is set to \(1\), making \(t_{mid}\) the secant method value. Thereafter, \(\alpha\) is reset according to the side of the subinterval (low vs high, i.e. toward \(t_{lo}\) vs toward \(t_{hi}\)) in which the sign change was found in the previous two passes. If the two sides were opposite, \(\alpha\) is set to 1. If the two sides were the same, \(\alpha\) is halved (if on the low side) or doubled (if on the high side). The value of \(t_{mid}\) is closer to \(t_{lo}\) when \(\alpha < 1\) and closer to \(t_{hi}\) when \(\alpha > 1\). If the above value of \(t_{mid}\) is within \(\tau/2\) of \(t_{lo}\) or \(t_{hi}\), it is adjusted inward, such that its fractional distance from the endpoint (relative to the interval size) is between 0.1 and 0.5 (0.5 being the midpoint), and the actual distance from the endpoint is at least \(\tau/2\).