# 4.2. Mathematical Considerations

CVODE solves ODE initial value problems (IVPs) in real $$N$$-space, which we write in the abstract form

(4.1)$\dot{y} = f(t,y) \, ,\quad y(t_0) = y_0 \,$

where $$y \in \mathbb{R}^N$$ and $$f: \mathbb{R} \times \mathbb{R}^N \rightarrow \mathbb{R}^N$$. Here we use $$\dot{y}$$ to denote $$\mathrm dy/\mathrm dt$$. We use $$t$$ to denote the independent variable; usually this is time, but it certainly need not be. CVODE solves both stiff and nonstiff systems. Roughly speaking, stiffness is characterized by the presence of at least one rapidly damped mode, whose time constant is small compared to the time scale of the solution itself.

For problems (4.1) where the analytical solution $$y(t)$$ satisfies an implicit constraint $$g(t,y)=0$$ (including the initial condition, $$g(t_0,y_0)=0$$) for $$g(t,y): \mathbb{R} \times \mathbb{R}^N \rightarrow \mathbb{R}^{M}$$ with $$M<N$$, CVODE may be configured to explicitly enforce these constraints by solving the modified problem

(4.2)\begin{split}\begin{aligned} \dot{y} &= f(t,y) \, ,\quad y(t_0) = y_0 \, ,\\ 0 &= g(t,y). \end{aligned}\end{split}

## 4.2.1. IVP solution

The methods used in CVODE are variable-order, variable-step multistep methods, based on formulas of the form

(4.3)$\sum_{i = 0}^{K_1} \alpha_{n,i} y^{n-i} + h_n \sum_{i = 0}^{K_2} \beta_{n,i} {\dot{y}}^{n-i} = 0 \, .$

Here the $$y^n$$ are computed approximations to $$y(t_n)$$, and $$h_n = t_n - t_{n-1}$$ is the step size. The user of CVODE must choose one of these two multistep method families, as appropriate to the problem. For nonstiff problems, CVODE includes the Adams-Moulton formulas, characterized by $$K_1 = 1$$ and $$K_2 = q-1$$ above, where the order $$q$$ varies between $$1$$ and $$12$$. For stiff problems, CVODE includes the Backward Differentiation Formulas (BDF) in so-called fixed-leading coefficient (FLC) form, given by $$K_1 = q$$ and $$K_2 = 0$$, with order $$q$$ varying between $$1$$ and $$5$$. The coefficients are uniquely determined by the method type, its order, the recent history of the step sizes, and the normalization $$\alpha_{n,0} = -1$$. See [28] and [70].
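To make (4.3) concrete, consider a constant step size $$h_n = h$$ and the normalization $$\alpha_{n,0} = -1$$. The second-order members of the two families (standard formulas, written here to match the form of (4.3)) are the trapezoidal Adams-Moulton rule and BDF2:

$-y^n + y^{n-1} + h \left( \tfrac{1}{2} \dot{y}^n + \tfrac{1}{2} \dot{y}^{n-1} \right) = 0 \quad \text{(Adams-Moulton, } q = 2\text{)}$

$-y^n + \tfrac{4}{3} y^{n-1} - \tfrac{1}{3} y^{n-2} + \tfrac{2}{3} h \, \dot{y}^n = 0 \quad \text{(BDF, } q = 2\text{)}$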

For either choice of formula, a nonlinear system must be solved (approximately) at each integration step. This nonlinear system can be formulated as either a rootfinding problem

(4.4)$F(y^n) \equiv y^n - h_n \beta_{n,0} f(t_n,y^n) - a_n = 0 \, ,$

or as a fixed-point problem

(4.5)$G(y^n) \equiv h_n \beta_{n,0} f(t_n,y^n) + a_n = y^n \, ,$

where $$a_n\equiv\sum_{i>0}(\alpha_{n,i}y^{n-i}+h_n\beta_{n,i} {\dot{y}}^{n-i})$$.

In the process of controlling errors at various levels, CVODE uses a weighted root-mean-square norm, denoted $$\|\cdot\|_{\text{WRMS}}$$, for all error-like quantities. The multiplicative weights used are based on the current solution and on the relative and absolute tolerances input by the user, namely

(4.6)$W_i = 1 / [\text{rtol} \cdot |y_i| + \text{atol}_i ] \, .$

Because $$1/W_i$$ represents a tolerance in the component $$y_i$$, a vector whose norm is 1 is regarded as “small.” For brevity, we will usually drop the subscript WRMS on norms in what follows.
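As an illustration of (4.6), the weighted RMS norm can be sketched in a few lines of Python; the function name and plain-list vector representation are ours (CVODE's actual implementation operates on its own vector objects):

```python
import math

def wrms_norm(v, y, rtol, atol):
    """WRMS norm of v with weights W_i = 1/(rtol*|y_i| + atol_i), per (4.6)."""
    s = 0.0
    for v_i, y_i, atol_i in zip(v, y, atol):
        w_i = 1.0 / (rtol * abs(y_i) + atol_i)
        s += (v_i * w_i) ** 2
    return math.sqrt(s / len(v))
```

A vector whose components equal their per-component tolerances, $$v_i = \text{rtol} \cdot |y_i| + \text{atol}_i$$, has norm exactly 1, matching the convention that a norm-1 vector is regarded as “small.”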

### 4.2.1.1. Nonlinear Solve

CVODE provides several nonlinear solver choices as well as the option of using a user-defined nonlinear solver (see §12). By default CVODE solves (4.4) with a Newton iteration which requires the solution of linear systems

(4.7)$M [y^{n(m+1)} - y^{n(m)}] = -F(y^{n(m)}) \, ,$

in which

(4.8)$M \approx I - \gamma J \, , \quad J = \partial f / \partial y \, , \quad \mbox{and} \quad \gamma = h_n \beta_{n,0} \, .$

The exact variation of the Newton iteration depends on the choice of linear solver and is discussed below and in §12.7. For nonstiff systems, a fixed-point iteration (previously referred to as a functional iteration in this guide) solving (4.5) is also available. This involves evaluations of $$f$$ only and can (optionally) use Anderson’s method [11, 49, 83, 108] to accelerate convergence (see §12.8 for more details). For any nonlinear solver, the initial guess for the iteration is a predicted value $$y^{n(0)}$$ computed explicitly from the available history data.
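To see how (4.4), (4.7), and (4.8) fit together, the following scalar Python sketch carries out the Newton iteration for a single step. The function and its argument list are our simplification (in particular, $$M$$ reduces to a scalar, and $$\beta_{n,0}$$ and $$a_n$$ are assumed given):

```python
def newton_step(f, J, t_n, h, a_n, beta0, y_pred, tol=1e-10, maxiter=10):
    """Solve F(y) = y - h*beta0*f(t_n, y) - a_n = 0, Eq. (4.4), by Newton's
    method with M = 1 - gamma*J and gamma = h*beta0, Eq. (4.8)."""
    gamma = h * beta0
    y = y_pred                      # predicted value y^{n(0)}
    for _ in range(maxiter):
        F = y - h * beta0 * f(t_n, y) - a_n
        M = 1.0 - gamma * J(t_n, y)
        delta = -F / M              # correction from Eq. (4.7)
        y += delta
        if abs(delta) < tol:
            break
    return y
```

For BDF1 (backward Euler: $$\beta_{n,0} = 1$$, $$a_n = y^{n-1}$$) applied to $$\dot{y} = \lambda y$$, the nonlinear system is linear and a single Newton correction lands on $$y^n = y^{n-1}/(1 - h\lambda)$$.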

For nonlinear solvers that require the solution of the linear system (4.7) (e.g., the default Newton iteration), CVODE provides several linear solver choices, including the option of a user-supplied linear solver module (see §11). The linear solver modules distributed with SUNDIALS are organized in two families, a direct family comprising direct linear solvers for dense, banded, or sparse matrices, and a spils family comprising scaled preconditioned iterative (Krylov) linear solvers. The methods offered through these modules are as follows:

• dense direct solvers, including an internal implementation, an interface to BLAS/LAPACK, an interface to MAGMA [103] and an interface to the oneMKL library [3],

• band direct solvers, including an internal implementation or an interface to BLAS/LAPACK,

• sparse direct solver interfaces to various libraries, including KLU [4, 39], SuperLU_MT [9, 41, 80], SuperLU_Dist [8, 53, 81, 82], and cuSPARSE [7],

• SPGMR, a scaled preconditioned GMRES (Generalized Minimal Residual method) solver,

• SPFGMR, a scaled preconditioned FGMRES (Flexible Generalized Minimal Residual method) solver,

• SPBCG, a scaled preconditioned Bi-CGStab (Bi-Conjugate Gradient Stable method) solver,

• SPTFQMR, a scaled preconditioned TFQMR (Transpose-Free Quasi-Minimal Residual method) solver, or

• PCG, a scaled preconditioned CG (Conjugate Gradient method) solver.

For large stiff systems, where direct methods are often not feasible, the combination of a BDF integrator and a preconditioned Krylov method yields a powerful tool because it combines established methods for stiff integration, nonlinear iteration, and Krylov (linear) iteration with a problem-specific treatment of the dominant source of stiffness, in the form of the user-supplied preconditioner matrix [22].

In addition, CVODE also provides a linear solver module which only uses a diagonal approximation of the Jacobian matrix.

In the case of a matrix-based linear solver, the default Newton iteration is a Modified Newton iteration, in that the iteration matrix $$M$$ is fixed throughout the nonlinear iterations. However, in the case that a matrix-free iterative linear solver is used, the default Newton iteration is an Inexact Newton iteration, in which $$M$$ is applied in a matrix-free manner, with matrix-vector products $$Jv$$ obtained by either difference quotients or a user-supplied routine. With the default Newton iteration, the matrix $$M$$ and preconditioner matrix $$P$$ are updated as infrequently as possible to balance the high costs of matrix operations against other costs. Specifically, this matrix update occurs when:

• starting the problem,

• more than 20 steps have been taken since the last update,

• the value $$\bar{\gamma}$$ of $$\gamma$$ at the last update satisfies $$|\gamma/\bar{\gamma} - 1| > 0.3$$,

• a non-fatal convergence failure just occurred, or

• an error test failure just occurred.

When forced by a convergence failure, an update of $$M$$ or $$P$$ may or may not involve a reevaluation of $$J$$ (in $$M$$) or of Jacobian data (in $$P$$), depending on whether Jacobian error was the likely cause of the failure. More generally, the decision is made to reevaluate $$J$$ (or instruct the user to reevaluate Jacobian data in $$P$$) when:

• starting the problem,

• more than 50 steps have been taken since the last evaluation,

• a convergence failure occurred with an outdated matrix, and the value $$\bar{\gamma}$$ of $$\gamma$$ at the last update satisfies $$|\gamma/\bar{\gamma} - 1| < 0.2$$, or

• a convergence failure occurred that forced a step size reduction.

The default stopping test for nonlinear solver iterations is related to the subsequent local error test, with the goal of keeping the nonlinear iteration errors from interfering with local error control. As described below, the final computed value $$y^{n(m)}$$ will have to satisfy a local error test $$\|y^{n(m)} - y^{n(0)}\| \leq \epsilon$$. Letting $$y^n$$ denote the exact solution of (4.4), we want to ensure that the iteration error $$y^n - y^{n(m)}$$ is small relative to $$\epsilon$$, specifically that it is less than $$0.1 \epsilon$$. (The safety factor $$0.1$$ can be changed by the user.) For this, we also estimate the linear convergence rate constant $$R$$ as follows. We initialize $$R$$ to 1, and reset $$R = 1$$ when $$M$$ or $$P$$ is updated. After computing a correction $$\delta_m = y^{n(m)}-y^{n(m-1)}$$, we update $$R$$ if $$m > 1$$ as

$R \leftarrow \max\{0.3R , \|\delta_m\| / \|\delta_{m-1}\| \} \, .$

Now we use the estimate

$\| y^n - y^{n(m)} \| \approx \| y^{n(m+1)} - y^{n(m)} \| \approx R \| y^{n(m)} - y^{n(m-1)} \| = R \|\delta_m \| \, .$

Therefore the convergence (stopping) test is

$R \|\delta_m\| < 0.1 \epsilon \, .$

We allow at most 3 iterations (but this limit can be changed by the user). We also declare the iteration diverged if any $$\|\delta_m\| / \|\delta_{m-1}\| > 2$$ with $$m > 1$$. If convergence fails with $$J$$ or $$P$$ current, we are forced to reduce the step size, replacing $$h_n$$ by $$\eta_{\mathrm{cf}} \, h_n$$, where the default is $$\eta_{\mathrm{cf}} = 0.25$$. The integration is halted after a preset number of convergence failures; the default value of this limit is 10, but this can be changed by the user.
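The rate estimate and stopping/divergence logic above can be sketched as follows; this is a hypothetical helper, with scalar values standing in for the WRMS norms and the default safety factor, iteration limit, and divergence threshold from the text:

```python
def nonlinear_iteration_status(delta_norms, eps, safety=0.1, maxcor=3):
    """Classify a sequence of correction norms ||delta_m|| using the rate
    estimate R <- max(0.3*R, ||delta_m||/||delta_{m-1}||), R initialized to 1."""
    R = 1.0
    for m, d in enumerate(delta_norms, start=1):
        if m > 1:
            ratio = d / delta_norms[m - 2]
            if ratio > 2.0:             # divergence test
                return "diverged"
            R = max(0.3 * R, ratio)
        if R * d < safety * eps:        # stopping test R*||delta_m|| < 0.1*eps
            return "converged"
        if m >= maxcor:
            return "failed"
    return "failed"
```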

When an iterative method is used to solve the linear system, its errors must also be controlled, and this also involves the local error test constant. The linear iteration error in the solution vector $$\delta_m$$ is approximated by the preconditioned residual vector. Thus to ensure (or attempt to ensure) that the linear iteration errors do not interfere with the nonlinear error and local integration error controls, we require that the norm of the preconditioned residual be less than $$0.05 \cdot (0.1 \epsilon)$$.

When the Jacobian is stored using either the SUNMATRIX_DENSE or SUNMATRIX_BAND matrix objects, the Jacobian may be supplied by a user routine, or approximated by difference quotients, at the user’s option. In the latter case, we use the usual approximation

$J_{ij} = [f_i(t,y+\sigma_j e_j) - f_i(t,y)]/\sigma_j \, .$

The increments $$\sigma_j$$ are given by

$\sigma_j = \max\left\{\sqrt{U} \; |y_j| , \sigma_0 / W_j \right\} \, ,$

where $$U$$ is the unit roundoff, $$\sigma_0$$ is a dimensionless value, and $$W_j$$ is the error weight defined in (4.6). In the dense case, this scheme requires $$N$$ evaluations of $$f$$, one for each column of $$J$$. In the band case, the columns of $$J$$ are computed in groups, by the Curtis-Powell-Reid algorithm, with the number of $$f$$ evaluations equal to the bandwidth.
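A dense difference-quotient Jacobian with the increments above might be sketched as below. This is our simplification, taking IEEE double roundoff for $$U$$ and $$\sigma_0 = 1$$; CVODE's internal heuristics differ in detail:

```python
import math

def dq_dense_jacobian(f, t, y, rtol, atol, sigma0=1.0):
    """Approximate J_ij = [f_i(t, y + sigma_j*e_j) - f_i(t, y)]/sigma_j with
    sigma_j = max(sqrt(U)*|y_j|, sigma0/W_j) and W_j = 1/(rtol*|y_j| + atol_j)."""
    U = 2.0 ** -52                       # unit roundoff for IEEE doubles
    n = len(y)
    f0 = f(t, y)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):                   # one f evaluation per column
        sigma = max(math.sqrt(U) * abs(y[j]),
                    sigma0 * (rtol * abs(y[j]) + atol[j]))  # sigma0/W_j
        y_pert = list(y)
        y_pert[j] += sigma
        f_pert = f(t, y_pert)
        for i in range(n):
            J[i][j] = (f_pert[i] - f0[i]) / sigma
    return J
```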

We note that with sparse and user-supplied SUNMatrix objects, the Jacobian must be supplied by a user routine.

In the case of a Krylov method, preconditioning may be used on the left, on the right, or both, with user-supplied routines for the preconditioning setup and solve operations, and optionally also for the required matrix-vector products $$Jv$$. If a routine for $$Jv$$ is not supplied, these products are computed as

(4.9)$Jv = [f(t,y+\sigma v) - f(t,y)]/\sigma \, .$

The increment $$\sigma$$ is $$1/\|v\|$$, so that $$\sigma v$$ has norm 1.
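The matrix-free product (4.9) is equally short to sketch. Here `norm` is whatever norm scales $$v$$ (the WRMS norm in CVODE; a plain RMS works for illustration), and the helper name is ours:

```python
def jtimes_dq(f, t, y, v, norm):
    """Jv ≈ [f(t, y + sigma*v) - f(t, y)]/sigma with sigma = 1/||v||,
    so that sigma*v has norm 1, per (4.9)."""
    sigma = 1.0 / norm(v)
    y_pert = [y_i + sigma * v_i for y_i, v_i in zip(y, v)]
    f0 = f(t, y)
    f1 = f(t, y_pert)
    return [(a - b) / sigma for a, b in zip(f1, f0)]
```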

### 4.2.1.2. Local Error Test

A critical part of CVODE, the part that makes it an ODE “solver” rather than just an ODE method, is its control of local error. At every step, the local error is estimated and required to satisfy tolerance conditions, and the step is redone with reduced step size whenever that error test fails. As with any linear multistep method, the local truncation error LTE, at order $$q$$ and step size $$h$$, satisfies an asymptotic relation

$\mbox{LTE} = C h^{q+1} y^{(q+1)} + O(h^{q+2})$

for some constant $$C$$, under mild assumptions on the step sizes. A similar relation holds for the error in the predictor $$y^{n(0)}$$. These are combined to get a relation

$\mbox{LTE} = C' [y^n - y^{n(0)}] + O(h^{q+2}) \, .$

The local error test is simply $$\|\mbox{LTE}\| \leq 1$$. Using the above, it is performed on the predictor-corrector difference $$\Delta_n \equiv y^{n(m)} - y^{n(0)}$$ (with $$y^{n(m)}$$ the final iterate computed), and takes the form

$\|\Delta_n\| \leq \epsilon \equiv 1/|C'| \, .$

### 4.2.1.3. Step Size and Order Selection

If the local error test passes, the step is considered successful. If it fails, the step is rejected and a new step size $$h'$$ is computed based on the asymptotic behavior of the local error, namely by the equation

$(h'/h)^{q+1} \|\Delta_n\| = \epsilon/6 \, .$

Here 1/6 is a safety factor. A new attempt at the step is made, and the error test repeated. If it fails three times, the order $$q$$ is reset to 1 (if $$q > 1$$), or the step is restarted from scratch (if $$q = 1$$). The ratio $$\eta = h'/h$$ is limited above to $$\eta_{\mathrm{max\_ef}}$$ (default 0.2) after two error test failures, and limited below to $$\eta_{\mathrm{min\_ef}}$$ (default 0.1) after three. After seven failures, CVODE returns to the user with a give-up message.
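The step size update after an error test failure can be sketched as follows; the bounds follow one plausible reading of the limits quoted above (capped after two failures, held at the lower limit after three), and the function is ours, not CVODE's internal logic:

```python
def eta_after_error_failure(delta_norm, eps, q, nef,
                            eta_max_ef=0.2, eta_min_ef=0.1):
    """Tentative h'/h from (h'/h)^{q+1} * ||Delta_n|| = eps/6, then bounded
    after repeated error test failures (nef = failures so far this step)."""
    eta = (eps / (6.0 * delta_norm)) ** (1.0 / (q + 1))
    if nef >= 2:
        eta = min(eta, eta_max_ef)
    if nef >= 3:
        eta = max(eta, eta_min_ef)
    return eta
```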

In addition to adjusting the step size to meet the local error test, CVODE periodically adjusts the order, with the goal of maximizing the step size. The integration starts out at order 1 and varies the order dynamically after that. The basic idea is to pick the order $$q$$ for which a polynomial of order $$q$$ best fits the discrete data involved in the multistep method. However, if either a convergence failure or an error test failure occurred on the step just completed, no change in step size or order is done. At the current order $$q$$, selecting a new step size is done exactly as when the error test fails, giving a tentative step size ratio

$h'/h = (\epsilon / 6 \|\Delta_n\| )^{1/(q+1)} \equiv \eta_q \, .$

We consider changing order only after taking $$q+1$$ steps at order $$q$$, and then we consider only orders $$q' = q - 1$$ (if $$q > 1$$) or $$q' = q + 1$$ (if $$q < 5$$). The local truncation error at order $$q'$$ is estimated using the history data. Then a tentative step size ratio is computed on the basis that this error, LTE$$(q')$$, behaves asymptotically as $$h^{q'+1}$$. With safety factors of 1/6 and 1/10 respectively, these ratios are:

$h'/h = [1 / 6 \|\mbox{LTE}(q-1)\| ]^{1/q} \equiv \eta_{q-1}$

and

$h'/h = [1 / 10 \|\mbox{LTE}(q+1)\| ]^{1/(q+2)} \equiv \eta_{q+1} \, .$

The new order and step size are then set according to

$\eta = \max\{\eta_{q-1},\eta_q,\eta_{q+1}\} \, ,$

with $$q'$$ set to the index achieving the above maximum. However, if we find that $$\eta < \eta_{\mathrm{max\_fx}}$$ (default 1.5), we do not bother with the change. Also, $$\eta$$ is always limited to $$\eta_{\mathrm{max\_gs}}$$ (default 10), except on the first step, when it is limited to $$\eta_{\mathrm{max\_fs}} = 10^4$$.
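The selection rule can be summarized in a short sketch. This is a hypothetical helper: the tentative ratios $$\eta_{q-1}$$, $$\eta_q$$, $$\eta_{q+1}$$ are assumed already computed from the formulas above, and the first-step limit $$\eta_{\mathrm{max\_fs}}$$ is omitted:

```python
def choose_order_and_eta(q, eta_qm1, eta_q, eta_qp1,
                         eta_max_fx=1.5, eta_max_gs=10.0):
    """Pick the order whose tentative step ratio is largest; keep the current
    order and step size (eta = 1) if the best ratio is below eta_max_fx."""
    candidates = {q: eta_q}
    if q > 1:
        candidates[q - 1] = eta_qm1
    if q < 5:
        candidates[q + 1] = eta_qp1
    q_new = max(candidates, key=candidates.get)
    eta = candidates[q_new]
    if eta < eta_max_fx:
        return q, 1.0           # do not bother with the change
    return q_new, min(eta, eta_max_gs)
```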

The various algorithmic features of CVODE described above, as inherited from VODE and VODPK, are documented in [21, 27, 63]. They are also summarized in [64].

Normally, CVODE takes steps until a user-defined output value $$t = t_{\text{out}}$$ is overtaken, and then it computes $$y(t_{\text{out}})$$ by interpolation. However, a “one step” mode option is available, where control returns to the calling program after each step. There are also options to force CVODE not to integrate past a given stopping point $$t = t_{\text{stop}}$$.

### 4.2.1.4. Inequality Constraints

CVODE permits the user to impose optional inequality constraints on individual components of the solution vector $$y$$. Any of the following four constraints can be imposed: $$y_i > 0$$, $$y_i < 0$$, $$y_i \geq 0$$, or $$y_i \leq 0$$. The constraint satisfaction is tested after a successful nonlinear system solution. If any constraint fails, we declare a convergence failure of the Newton iteration and reduce the step size. Rather than cutting the step size by some arbitrary factor, CVODE estimates a new step size $$h'$$ using a linear approximation of the components in $$y$$ that failed the constraint test (including a safety factor of $$0.9$$ to cover the strict inequality case). If a step fails to satisfy the constraints repeatedly within a step attempt or fails with the minimum step size then the integration is halted and an error is returned. In this case the user may need to employ other strategies as discussed in §4.4.5.2 to satisfy the inequality constraints.
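One plausible reading of the linear-approximation strategy just described, restricted for brevity to $$y_i > 0$$ constraints, is sketched below; the helper and its details are ours and are not CVODE's actual logic:

```python
def constraint_step_ratio(y_prev, y_new, positive, safety=0.9):
    """For each component required to stay positive that ended the step at or
    below zero, locate the zero crossing of the line from y_prev to y_new and
    shrink the step to stop short of the first crossing."""
    eta = 1.0
    for y_p, y_n, required in zip(y_prev, y_new, positive):
        if required and y_n <= 0.0 < y_p:
            theta = y_p / (y_p - y_n)     # crossing fraction along the step
            eta = min(eta, safety * theta)
    return eta
```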

## 4.2.2. IVPs with constraints

For IVPs whose analytical solutions implicitly satisfy constraints as in (4.2), CVODE ensures that the solution satisfies the constraint equation by projecting a successfully computed time step onto the invariant manifold. As discussed in [46] and [99], this approach reduces the error in the solution and retains the order of convergence of the numerical method. Therefore, in an attempt to advance the solution to a new point in time (i.e., taking a new integration step), CVODE performs the following operations:

1. predict solution

2. solve nonlinear system and correct solution

3. project solution

4. test error

5. select order and step size for next step

and includes several recovery attempts in case there are convergence failures (or difficulties) in the nonlinear solver or in the projection step, or if the solution fails to satisfy the error test. Note that at this time projection is only supported with BDF methods and the projection function must be user-defined. See §4.4.5.8 and CVodeSetProjFn() for more information on providing a projection function to CVODE.

When using a coordinate projection method the solution $$y_n$$ is obtained by projecting (orthogonally or otherwise) the solution $$\tilde{y}_n$$ from step 2 above onto the manifold given by the constraint. As such $$y_n$$ is computed as the solution of the nonlinear constrained least squares problem

(4.10)$\begin{split}\begin{split} \text{minimize} &\quad \| y_n - \tilde{y}_n \| \\ \text{subject to} &\quad g(t_n,y_n) = 0. \end{split}\end{split}$

The solution of (4.10) can be computed iteratively with a Newton method. Given an initial guess $$y_n^{(0)}$$ the iterations are computed as

$y_n^{(i+1)} = y_n^{(i)} + \delta y_n^{(i)}$

where the increment $$\delta y_n^{(i)}$$ is the solution of the least-norm problem

(4.11)$\begin{split}\begin{split} \text{minimize} &\quad \| \delta y_n^{(i)} \| \\ \text{subject to} &\quad G(t_n,y_n^{(i)}) \; \delta y_n^{(i)} = -g(t_n,y_n^{(i)}) \end{split}\end{split}$

where $$G(t,y) = \partial g(t,y) / \partial y$$.
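For the Euclidean norm, and assuming $$G$$ has full row rank, the least-norm problem (4.11) has the standard closed-form solution via the Moore-Penrose pseudoinverse (a textbook fact, not stated in the text itself):

$\delta y_n^{(i)} = -G^T \left( G\, G^T \right)^{-1} g(t_n, y_n^{(i)}) \, , \quad G = G(t_n, y_n^{(i)}) \, .$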

If the projected solution satisfies the error test then the step is accepted and the correction to the unprojected solution, $$\Delta_p = y_n - \tilde{y}_n$$, is used to update the Nordsieck history array for the next step.

## 4.2.3. Preconditioning

When using a nonlinear solver that requires the solution of the linear system, e.g., the default Newton iteration (§12.7), CVODE makes repeated use of a linear solver to solve linear systems of the form $$M x = - r$$, where $$x$$ is a correction vector and $$r$$ is a residual vector. If this linear system solve is done with one of the scaled preconditioned iterative linear solvers supplied with SUNDIALS, these solvers are rarely successful if used without preconditioning; it is generally necessary to precondition the system in order to obtain acceptable efficiency. A system $$A x = b$$ can be preconditioned on the left, as $$(P^{-1}A) x = P^{-1} b$$; on the right, as $$(A P^{-1}) P x = b$$; or on both sides, as $$(P_L^{-1} A P_R^{-1}) P_R x = P_L^{-1}b$$. The Krylov method is then applied to a system with the matrix $$P^{-1}A$$, or $$AP^{-1}$$, or $$P_L^{-1} A P_R^{-1}$$, instead of $$A$$. In order to improve the convergence of the Krylov iteration, the preconditioner matrix $$P$$, or the product $$P_L P_R$$ in the last case, should in some sense approximate the system matrix $$A$$. Yet at the same time, in order to be cost-effective, the matrix $$P$$, or matrices $$P_L$$ and $$P_R$$, should be reasonably efficient to evaluate and solve. Finding a good point in this tradeoff between rapid convergence and low cost can be very difficult. Good choices are often problem-dependent (for example, see [22] for an extensive study of preconditioners for reaction-transport systems).

Most of the iterative linear solvers supplied with SUNDIALS allow for preconditioning on the left, on the right, or on both sides, although we know of no situation where preconditioning on both sides is clearly superior to preconditioning on one side only (with the product $$P_L P_R$$). Moreover, for a given preconditioner matrix, the merits of left vs. right preconditioning are unclear in general, and the user should experiment with both choices. Performance will differ because the inverse of the left preconditioner is included in the linear system residual whose norm is being tested in the Krylov algorithm. As a rule, however, if the preconditioner is the product of two matrices, we recommend that preconditioning be done either on the left only or the right only, rather than using one factor on each side.

Typical preconditioners used with CVODE are based on approximations to the system Jacobian, $$J = \partial f / \partial y$$. Since the matrix involved is $$M = I - \gamma J$$, any approximation $$\bar{J}$$ to $$J$$ yields a matrix that is of potential use as a preconditioner, namely $$P = I - \gamma \bar{J}$$. Because the Krylov iteration occurs within a nonlinear solver iteration and further also within a time integration, and since each of these iterations has its own test for convergence, the preconditioner may use a very crude approximation, as long as it captures the dominant numerical feature(s) of the system. We have found that the combination of a preconditioner with the Newton-Krylov iteration, using even a fairly poor approximation to the Jacobian, can be surprisingly superior to using the same matrix without Krylov acceleration (i.e., a modified Newton iteration), as well as to using the Newton-Krylov method with no preconditioning.

## 4.2.4. BDF stability limit detection

CVODE includes an algorithm, STALD (STAbility Limit Detection), which provides protection against potentially unstable behavior of the BDF multistep integration methods in certain situations, as described below.

When the BDF option is selected, CVODE uses Backward Differentiation Formula methods of orders 1 to 5. At order 1 or 2, the BDF method is A-stable, meaning that for any complex constant $$\lambda$$ in the open left half-plane, the method is unconditionally stable (for any step size) for the standard scalar model problem $$\dot{y} = \lambda y$$. For an ODE system, this means that, roughly speaking, as long as all modes in the system are stable, the method is also stable for any choice of step size, at least in the sense of a local linear stability analysis.

At orders 3 to 5, the BDF methods are not A-stable, although they are stiffly stable. In each case, in order for the method to be stable at step size $$h$$ on the scalar model problem, the product $$h\lambda$$ must lie within a region of absolute stability. That region excludes a portion of the left half-plane that is concentrated near the imaginary axis. The size of that region of instability grows as the order increases from 3 to 5. What this means is that, when running BDF at any of these orders, if an eigenvalue $$\lambda$$ of the system lies close enough to the imaginary axis, the step sizes $$h$$ for which the method is stable are limited (at least according to the linear stability theory) to a set that prevents $$h\lambda$$ from leaving the stability region. The meaning of “close enough” depends on the order. At order 3, the unstable region is much narrower than at order 5, so the potential for unstable behavior grows with order.

System eigenvalues that are likely to run into this instability are ones that correspond to weakly damped oscillations. A pure undamped oscillation corresponds to an eigenvalue on the imaginary axis. Problems with modes of that kind call for different considerations, since the oscillation generally must be followed by the solver, and this requires step sizes ($$h \sim 1/\nu$$, where $$\nu$$ is the frequency) that are stable for BDF anyway. But for a weakly damped oscillatory mode, the oscillation in the solution is eventually damped to the noise level, and at that time it is important that the solver not be restricted to step sizes on the order of $$1/\nu$$. It is in this situation that the new option may be of great value.

In terms of partial differential equations, the typical problems for which the stability limit detection option is appropriate are ODE systems resulting from semi-discretized PDEs (i.e., PDEs discretized in space) with advection and diffusion, but with advection dominating over diffusion. Diffusion alone produces pure decay modes, while advection tends to produce undamped oscillatory modes. A mix of the two with advection dominant will have weakly damped oscillatory modes.

The STALD algorithm attempts to detect, in a direct manner, the presence of a stability region boundary that is limiting the step sizes in the presence of a weakly damped oscillation [61]. The algorithm supplements (but differs greatly from) the existing algorithms in CVODE for choosing step size and order based on estimated local truncation errors. The STALD algorithm works directly with history data that is readily available in CVODE. If it concludes that the step size is in fact stability-limited, it dictates a reduction in the method order, regardless of the outcome of the error-based algorithm. The STALD algorithm has been tested in combination with the VODE solver on linear advection-dominated advection-diffusion problems [62], where it works well. The implementation in CVODE has been successfully tested on linear and nonlinear advection-diffusion problems, among others.

This stability limit detection option adds some computational overhead to the CVODE solution. (In timing tests, these overhead costs have ranged from 2% to 7% of the total, depending on the size and complexity of the problem, with lower relative costs for larger problems.) Therefore, it should be activated only when there is reasonable expectation of modes in the user’s system for which it is appropriate. In particular, if a CVODE solution with this option turned off appears to take an inordinately large number of steps at orders 3-5 for no apparent reason in terms of the solution time scale, then there is a good chance that step sizes are being limited by stability, and that turning on the option will improve the efficiency of the solution.

## 4.2.5. Rootfinding

The CVODE solver has been augmented to include a rootfinding feature. This means that, while integrating the Initial Value Problem (4.1), CVODE can also find the roots of a set of user-defined functions $$g_i(t,y)$$ that depend both on $$t$$ and on the solution vector $$y = y(t)$$. The number of these root functions is arbitrary, and if more than one $$g_i$$ is found to have a root in any given interval, the various root locations are found and reported in the order that they occur on the $$t$$ axis, in the direction of integration.

Generally, this rootfinding feature finds only roots of odd multiplicity, corresponding to changes in sign of $$g_i(t,y(t))$$, denoted $$g_i(t)$$ for short. If a user root function has a root of even multiplicity (no sign change), it will probably be missed by CVODE. If such a root is desired, the user should reformulate the root function so that it changes sign at the desired root.

The basic scheme used is to check for sign changes of any $$g_i(t)$$ over each time step taken, and then (when a sign change is found) to home in on the root(s) with a modified secant method [60]. In addition, each time $$g$$ is computed, CVODE checks to see if $$g_i(t) = 0$$ exactly, and if so it reports this as a root. However, if an exact zero of any $$g_i$$ is found at a point $$t$$, CVODE computes $$g$$ at $$t + \delta$$ for a small increment $$\delta$$, slightly further in the direction of integration, and if any $$g_i(t + \delta)=0$$ also, CVODE stops and reports an error. This way, each time CVODE takes a time step, it is guaranteed that the values of all $$g_i$$ are nonzero at some past value of $$t$$, beyond which a search for roots is to be done.

At any given time in the course of the time-stepping, after suitable checking and adjusting has been done, CVODE has an interval $$(t_{lo},t_{hi}]$$ in which roots of the $$g_i(t)$$ are to be sought, such that $$t_{hi}$$ is further ahead in the direction of integration, and all $$g_i(t_{lo}) \neq 0$$. The endpoint $$t_{hi}$$ is either $$t_n$$, the end of the time step last taken, or the next requested output time $$t_{\text{out}}$$ if this comes sooner. The endpoint $$t_{lo}$$ is either $$t_{n-1}$$, the last output time $$t_{\text{out}}$$ (if this occurred within the last step), or the last root location (if a root was just located within this step), possibly adjusted slightly toward $$t_n$$ if an exact zero was found. The algorithm checks $$g_i$$ at $$t_{hi}$$ for zeros and for sign changes in $$(t_{lo},t_{hi})$$. If no sign changes were found, then either a root is reported (if some $$g_i(t_{hi}) = 0$$) or we proceed to the next time interval (starting at $$t_{hi}$$). If one or more sign changes were found, then a loop is entered to locate the root to within a rather tight tolerance, given by

$\tau = 100 \cdot U \cdot (|t_n| + |h|) \quad (U = \mbox{unit roundoff}) \, .$

Whenever sign changes are seen in two or more root functions, the one deemed most likely to have its root occur first is the one with the largest value of $$|g_i(t_{hi})|/|g_i(t_{hi}) - g_i(t_{lo})|$$, corresponding to the closest to $$t_{lo}$$ of the secant method values. At each pass through the loop, a new value $$t_{mid}$$ is set, strictly within the search interval, and the values of $$g_i(t_{mid})$$ are checked. Then either $$t_{lo}$$ or $$t_{hi}$$ is reset to $$t_{mid}$$ according to which subinterval is found to include the sign change. If there is none in $$(t_{lo},t_{mid})$$ but some $$g_i(t_{mid}) = 0$$, then that root is reported. The loop continues until $$|t_{hi}-t_{lo}| < \tau$$, and then the reported root location is $$t_{hi}$$.

In the loop to locate the root of $$g_i(t)$$, the formula for $$t_{mid}$$ is

$t_{mid} = t_{hi} - (t_{hi} - t_{lo}) g_i(t_{hi}) / [g_i(t_{hi}) - \alpha g_i(t_{lo})] ~,$

where $$\alpha$$ is a weight parameter. On the first two passes through the loop, $$\alpha$$ is set to $$1$$, making $$t_{mid}$$ the secant method value. Thereafter, $$\alpha$$ is reset according to the side of the subinterval (low vs. high, i.e., toward $$t_{lo}$$ vs. toward $$t_{hi}$$) in which the sign change was found in the previous two passes. If the two sides were opposite, $$\alpha$$ is set to 1. If the two sides were the same, $$\alpha$$ is halved (if on the low side) or doubled (if on the high side). The value of $$t_{mid}$$ is closer to $$t_{lo}$$ when $$\alpha < 1$$ and closer to $$t_{hi}$$ when $$\alpha > 1$$. If the above value of $$t_{mid}$$ is within $$\tau/2$$ of $$t_{lo}$$ or $$t_{hi}$$, it is adjusted inward, such that its fractional distance from the endpoint (relative to the interval size) is between 0.1 and 0.5 (0.5 being the midpoint), and the actual distance from the endpoint is at least $$\tau/2$$.
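The midpoint formula and its inward adjustment can be sketched as follows. The endpoint guard here is a simplification of the adjustment described above (enforcing a fractional distance of at least 0.1 and an absolute distance of at least $$\tau/2$$ from each endpoint), and the function name is ours:

```python
def root_midpoint(t_lo, t_hi, g_lo, g_hi, alpha, tau):
    """Modified secant point
    t_mid = t_hi - (t_hi - t_lo)*g_hi/(g_hi - alpha*g_lo),
    pulled inward if it falls too close to either endpoint."""
    t_mid = t_hi - (t_hi - t_lo) * g_hi / (g_hi - alpha * g_lo)
    margin = max(tau / 2.0, 0.1 * (t_hi - t_lo))
    return min(max(t_mid, t_lo + margin), t_hi - margin)
```

With $$\alpha = 1$$ this is the plain secant point; $$\alpha < 1$$ biases $$t_{mid}$$ toward $$t_{lo}$$ and $$\alpha > 1$$ toward $$t_{hi}$$.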