9.9. The NVECTOR_SERIAL Module
The serial implementation of the NVECTOR module provided with SUNDIALS, NVECTOR_SERIAL, defines the content field of an N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, and a boolean flag own_data which specifies the ownership of data.
struct _N_VectorContent_Serial {
sunindextype length;
booleantype own_data;
realtype *data;
};
The header file to be included when using this module is nvector_serial.h.
The installed module library to link to is libsundials_nvecserial.lib, where .lib is typically .so for shared libraries and .a for static libraries.
9.9.1. NVECTOR_SERIAL accessor macros
The following five macros are provided to access the content of an NVECTOR_SERIAL vector. The suffix _S in the names denotes the serial version.

NV_CONTENT_S(v)
This macro gives access to the contents of the serial N_Vector v.
The assignment v_cont = NV_CONTENT_S(v) sets v_cont to be a pointer to the serial N_Vector content structure.
Implementation:
#define NV_CONTENT_S(v) ( (N_VectorContent_Serial)(v->content) )

NV_OWN_DATA_S(v)
Access the own_data component of the serial N_Vector v.
Implementation:
#define NV_OWN_DATA_S(v) ( NV_CONTENT_S(v)->own_data )

NV_DATA_S(v)
The assignment v_data = NV_DATA_S(v) sets v_data to be a pointer to the first component of the data for the N_Vector v.
Similarly, the assignment NV_DATA_S(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.
Implementation:
#define NV_DATA_S(v) ( NV_CONTENT_S(v)->data )

NV_LENGTH_S(v)
Access the length component of the serial N_Vector v.
The assignment v_len = NV_LENGTH_S(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_S(v) = len_v sets the length of v to be len_v.
Implementation:
#define NV_LENGTH_S(v) ( NV_CONTENT_S(v)->length )

NV_Ith_S(v, i)
This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing.
The assignment r = NV_Ith_S(v,i) sets r to be the value of the i-th component of v. The assignment NV_Ith_S(v,i) = r sets the value of the i-th component of v to be r.
Here i ranges from 0 to \(n-1\) for a vector of length \(n\).
Implementation:
#define NV_Ith_S(v,i) ( NV_DATA_S(v)[i] )
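The accessor macros can be exercised without SUNDIALS installed by reproducing the content structure and macro definitions from above in a standalone file. In this sketch the typedefs for realtype, sunindextype and booleantype, and the one-field _generic_N_Vector struct, are simplified stand-ins (assumptions, not the real headers); a real program would include nvector_serial.h and create v with N_VNew_Serial(). The function fill_squares_and_sum is hypothetical and only shows NV_Ith_S used on both sides of an assignment.

```c
/* Simplified stand-ins so the sketch compiles without SUNDIALS; the real
   types come from the SUNDIALS headers and may differ in width. */
typedef double realtype;
typedef long sunindextype;
typedef int booleantype;
#define SUNTRUE 1
#define SUNFALSE 0

struct _N_VectorContent_Serial {
  sunindextype length;
  booleantype own_data;
  realtype *data;
};
typedef struct _N_VectorContent_Serial *N_VectorContent_Serial;

/* Stand-in generic vector: only the content pointer is modeled here; the
   real struct also carries an ops table and a context. */
struct _generic_N_Vector { void *content; };
typedef struct _generic_N_Vector *N_Vector;

/* Accessor macros as listed in the implementations above. */
#define NV_CONTENT_S(v)  ( (N_VectorContent_Serial)(v->content) )
#define NV_OWN_DATA_S(v) ( NV_CONTENT_S(v)->own_data )
#define NV_DATA_S(v)     ( NV_CONTENT_S(v)->data )
#define NV_LENGTH_S(v)   ( NV_CONTENT_S(v)->length )
#define NV_Ith_S(v,i)    ( NV_DATA_S(v)[i] )

/* Hypothetical helper: set component i to i*i through NV_Ith_S, then
   return the sum of all components read back through the same macro. */
realtype fill_squares_and_sum(N_Vector v)
{
  realtype sum = 0.0;
  for (sunindextype i = 0; i < NV_LENGTH_S(v); i++) {
    NV_Ith_S(v, i) = (realtype)(i * i);  /* write through the macro */
    sum += NV_Ith_S(v, i);               /* read through the macro  */
  }
  return sum;
}
```

Because the macros expand to plain lvalue expressions, the same name serves for reading and for assignment.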
9.9.2. NVECTOR_SERIAL functions
The NVECTOR_SERIAL module defines serial implementations of all vector operations listed in §9.2.1, §9.2.2, §9.2.3, and §9.2.4. Their names are obtained from those in those sections by appending the suffix _Serial (e.g. N_VDestroy_Serial). All the standard vector operations listed in §9.2.1 with the suffix _Serial appended are callable via the Fortran 2003 interface by prepending an F (e.g. FN_VDestroy_Serial).
The module NVECTOR_SERIAL provides the following additional user-callable routines:

N_Vector N_VNew_Serial(sunindextype vec_length, SUNContext sunctx)
This function creates and allocates memory for a serial N_Vector. Its arguments are the vector length and the SUNContext object sunctx.

N_Vector N_VNewEmpty_Serial(sunindextype vec_length, SUNContext sunctx)
This function creates a new serial N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Serial(sunindextype vec_length, realtype *v_data, SUNContext sunctx)
This function creates and allocates memory for a serial vector with user-provided data array, v_data. (This function does not allocate memory for v_data itself.)
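The practical difference between N_VNew_Serial() and N_VMake_Serial() is who owns the data array, which the own_data flag records. The sketch below models only that bookkeeping with a hypothetical MockSerial type and hypothetical mock_new, mock_make and mock_destroy functions; they mimic the ownership rules described in this section and are not the SUNDIALS implementation.

```c
#include <stdlib.h>

/* Simplified stand-in types (assumptions, not the SUNDIALS headers). */
typedef double realtype;
typedef long sunindextype;
typedef int booleantype;
#define SUNTRUE 1
#define SUNFALSE 0

/* Hypothetical stand-in for the serial content structure. */
typedef struct {
  sunindextype length;
  booleantype own_data;
  realtype *data;
} MockSerial;

/* Analogue of N_VNew_Serial: allocate a zeroed data array that the vector
   owns. Returns NULL if either allocation fails. */
MockSerial *mock_new(sunindextype n)
{
  MockSerial *v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  v->data = calloc((size_t)n, sizeof(realtype));
  if (v->data == NULL) { free(v); return NULL; }
  v->length = n;
  v->own_data = SUNTRUE;
  return v;
}

/* Analogue of N_VMake_Serial: wrap a caller-supplied array without copying
   it; the caller keeps ownership of v_data. */
MockSerial *mock_make(sunindextype n, realtype *v_data)
{
  MockSerial *v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  v->length = n;
  v->own_data = SUNFALSE;
  v->data = v_data;
  return v;
}

/* Analogue of N_VDestroy_Serial: free the data array only when the vector
   owns it, then free the vector itself. */
void mock_destroy(MockSerial *v)
{
  if (v == NULL) return;
  if (v->own_data && v->data != NULL) free(v->data);
  free(v);
}
```

Since mock_make leaves own_data false, mock_destroy releases only the wrapper and the caller's array remains valid afterwards.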

void N_VPrintFile_Serial(N_Vector v, FILE *outfile)
This function prints the content of a serial vector to outfile.
By default all fused and vector array operations are disabled in the NVECTOR_SERIAL module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VNew_Serial(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, as cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Serial() will have the default settings for the NVECTOR_SERIAL module.
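The recommended create, configure, then clone workflow relies on clones copying the enable/disable choices of their template. The toy model below illustrates only that inheritance rule; ToyOps, ToyVec, toy_new, toy_enable_fused and toy_clone are hypothetical names, and the real vector presumably records these choices in its ops structure rather than in plain boolean flags.

```c
typedef int booleantype;
#define SUNTRUE 1
#define SUNFALSE 0

/* Hypothetical per-vector flags for two of the optional operations. */
typedef struct {
  booleantype linearcombination;
  booleantype dotprodmulti;
} ToyOps;

typedef struct {
  ToyOps ops;
} ToyVec;

/* Like N_VNew_Serial: every new vector starts from the module defaults,
   with all fused and vector array operations disabled. */
ToyVec toy_new(void)
{
  ToyVec v = { { SUNFALSE, SUNFALSE } };
  return v;
}

/* Like N_VEnableFusedOps_Serial: toggle every modeled operation at once. */
void toy_enable_fused(ToyVec *v, booleantype tf)
{
  v->ops.linearcombination = tf;
  v->ops.dotprodmulti = tf;
}

/* Like N_VClone: the clone inherits the template's flags. */
ToyVec toy_clone(const ToyVec *w)
{
  ToyVec v;
  v.ops = w->ops;
  return v;
}
```

A vector obtained from toy_clone carries the template's settings, whereas a second call to toy_new starts again from the defaults, which is why cloning is the recommended route.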

int N_VEnableFusedOps_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Serial(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
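As an illustration of what the linear combination fused operation computes, the sketch below forms \(z = \sum_j c_j x_j\) over nvec vectors in a single pass over z. Without fusion the same result needs a chain of scale and linear-sum calls, each of which sweeps the full arrays. Plain C arrays stand in for N_Vector data, and linear_combination is a hypothetical name, not the SUNDIALS routine.

```c
typedef double realtype;   /* stand-in typedefs (assumptions) */
typedef long sunindextype;

/* Computes z[i] = sum_j c[j] * X[j][i] for nvec input arrays of length n,
   visiting each output entry exactly once. */
void linear_combination(int nvec, const realtype *c, realtype **X,
                        sunindextype n, realtype *z)
{
  for (sunindextype i = 0; i < n; i++) {
    realtype acc = 0.0;
    for (int j = 0; j < nvec; j++)
      acc += c[j] * X[j][i];
    z[i] = acc;
  }
}
```

For example, with coefficients 2 and 0.5 applied to (1, 2, 3) and (10, 20, 30), z becomes (7, 14, 21).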
Notes
When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = NV_DATA_S(v), or equivalently v_data = N_VGetArrayPointer(v), and then access v_data[i] within the loop than it is to use NV_Ith_S(v,i) within the loop.

N_VNewEmpty_Serial(), N_VMake_Serial(), and N_VCloneVectorArrayEmpty_Serial() set the field own_data to SUNFALSE. The functions N_VDestroy_Serial() and N_VDestroyVectorArray_Serial() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.

To maximize efficiency, vector operations in the NVECTOR_SERIAL implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same length.
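The first note can be seen by comparing two equivalent loops. The sketch reuses stand-in definitions of the serial content structure and macros (simplified types, not the real headers), and scale_with_macro and scale_with_pointer are hypothetical names. Both give the same result; the pointer version fetches the array and length once, while the macro version goes through the content structure on every access, and whether a compiler hoists those loads out of the loop is not guaranteed.

```c
typedef double realtype;   /* stand-in typedefs (assumptions) */
typedef long sunindextype;
typedef int booleantype;

struct _N_VectorContent_Serial {
  sunindextype length;
  booleantype own_data;
  realtype *data;
};
typedef struct _N_VectorContent_Serial *N_VectorContent_Serial;
struct _generic_N_Vector { void *content; };  /* simplified stand-in */
typedef struct _generic_N_Vector *N_Vector;

#define NV_CONTENT_S(v) ( (N_VectorContent_Serial)(v->content) )
#define NV_DATA_S(v)    ( NV_CONTENT_S(v)->data )
#define NV_LENGTH_S(v)  ( NV_CONTENT_S(v)->length )
#define NV_Ith_S(v,i)   ( NV_DATA_S(v)[i] )

/* Macro form: each access expands to v->content->data[i]. */
void scale_with_macro(N_Vector v, realtype a)
{
  for (sunindextype i = 0; i < NV_LENGTH_S(v); i++)
    NV_Ith_S(v, i) *= a;
}

/* Recommended form: obtain the array pointer and length once, then index
   the raw array inside the loop. */
void scale_with_pointer(N_Vector v, realtype a)
{
  realtype *v_data = NV_DATA_S(v);
  sunindextype n = NV_LENGTH_S(v);
  for (sunindextype i = 0; i < n; i++)
    v_data[i] *= a;
}
```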
9.9.3. NVECTOR_SERIAL Fortran Interface
The NVECTOR_SERIAL module provides a Fortran 2003 module for use from Fortran applications.
The fnvector_serial_mod Fortran module defines interfaces to all NVECTOR_SERIAL C functions using the intrinsic iso_c_binding module, which provides a standardized mechanism for interoperating with C. As noted in the C function descriptions above, the interface functions are named after the corresponding C function, but with a leading F. For example, the function N_VNew_Serial is interfaced as FN_VNew_Serial.
The Fortran 2003 NVECTOR_SERIAL interface module can be accessed with the use statement, i.e. use fnvector_serial_mod, and by linking to the library libsundials_fnvectorserial_mod.lib in addition to the C library. For details on where the library and module file fnvector_serial_mod.mod are installed see §14.
We note that the module is accessible from the Fortran 2003 SUNDIALS integrators without separately linking to the libsundials_fnvectorserial_mod library.
9.10. The NVECTOR_PARALLEL Module
The NVECTOR_PARALLEL implementation of the NVECTOR module provided with SUNDIALS is based on MPI. It defines the content field of an N_Vector to be a structure containing the global and local lengths of the vector, a pointer to the beginning of a contiguous local data array, an MPI communicator, and a boolean flag own_data indicating ownership of the data array data.
struct _N_VectorContent_Parallel {
sunindextype local_length;
sunindextype global_length;
booleantype own_data;
realtype *data;
MPI_Comm comm;
};
The header file to be included when using this module is nvector_parallel.h. The installed module library to link to is libsundials_nvecparallel.lib, where .lib is typically .so for shared libraries and .a for static libraries.
9.10.1. NVECTOR_PARALLEL accessor macros
The following seven macros are provided to access the content of an NVECTOR_PARALLEL vector. The suffix _P in the names denotes the distributed memory parallel version.

NV_CONTENT_P(v)
This macro gives access to the contents of the parallel N_Vector v.
The assignment v_cont = NV_CONTENT_P(v) sets v_cont to be a pointer to the N_Vector content structure of type struct _N_VectorContent_Parallel.
Implementation:
#define NV_CONTENT_P(v) ( (N_VectorContent_Parallel)(v->content) )

NV_OWN_DATA_P(v)
Access the own_data component of the parallel N_Vector v.
Implementation:
#define NV_OWN_DATA_P(v) ( NV_CONTENT_P(v)->own_data )

NV_DATA_P(v)
The assignment v_data = NV_DATA_P(v) sets v_data to be a pointer to the first component of the local data for the N_Vector v.
The assignment NV_DATA_P(v) = v_data sets the component array of v to be v_data by storing the pointer v_data into data.
Implementation:
#define NV_DATA_P(v) ( NV_CONTENT_P(v)->data )

NV_LOCLENGTH_P(v)
The assignment v_llen = NV_LOCLENGTH_P(v) sets v_llen to be the length of the local part of v.
The call NV_LOCLENGTH_P(v) = llen_v sets the local_length of v to be llen_v.
Implementation:
#define NV_LOCLENGTH_P(v) ( NV_CONTENT_P(v)->local_length )

NV_GLOBLENGTH_P(v)
The assignment v_glen = NV_GLOBLENGTH_P(v) sets v_glen to be the global_length of the vector v.
The call NV_GLOBLENGTH_P(v) = glen_v sets the global_length of v to be glen_v.
Implementation:
#define NV_GLOBLENGTH_P(v) ( NV_CONTENT_P(v)->global_length )

NV_COMM_P(v)
This macro provides access to the MPI communicator used by the parallel N_Vector v.
Implementation:
#define NV_COMM_P(v) ( NV_CONTENT_P(v)->comm )

NV_Ith_P(v, i)
This macro gives access to the individual components of the local data array of an N_Vector.
The assignment r = NV_Ith_P(v,i) sets r to be the value of the i-th component of the local part of v. The assignment NV_Ith_P(v,i) = r sets the value of the i-th component of the local part of v to be r.
Here i ranges from 0 to \(n-1\), where \(n\) is the local_length.
Implementation:
#define NV_Ith_P(v,i) ( NV_DATA_P(v)[i] )
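NV_Ith_P(v,i) indexes only the local part of a parallel vector, so relating a local index to a position in the global vector depends on how the user partitioned the data. Assuming a contiguous block partition in rank order (an assumption made for this sketch, not something NVECTOR_PARALLEL enforces), local component i on rank r is global component offset_r + i, where offset_r is the sum of the local lengths on lower ranks, and global_length is the sum of all local lengths. The helper block_layout is hypothetical and uses no MPI calls.

```c
typedef long sunindextype;  /* stand-in typedef (assumption) */

/* Given the local lengths of nprocs ranks in rank order, return the global
   length and store in *offset the global index of local element 0 on the
   given rank, assuming a contiguous block partition. */
sunindextype block_layout(const sunindextype *local_lengths, int nprocs,
                          int rank, sunindextype *offset)
{
  sunindextype global = 0;
  *offset = 0;
  for (int p = 0; p < nprocs; p++) {
    if (p < rank) *offset += local_lengths[p];  /* lower ranks come first */
    global += local_lengths[p];
  }
  return global;
}
```

With local lengths 4, 3 and 5, the global length is 12 and rank 2 starts at global index 7.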
9.10.2. NVECTOR_PARALLEL functions
The NVECTOR_PARALLEL module defines parallel implementations of all vector operations listed in §9.2. Their names are obtained from the generic names by appending the suffix _Parallel (e.g. N_VDestroy_Parallel). The module NVECTOR_PARALLEL provides the following additional user-callable routines:

N_Vector N_VNew_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length, SUNContext sunctx)
This function creates and allocates memory for a parallel vector having global length global_length, having processor-local length local_length, and using the MPI communicator comm.

N_Vector N_VNewEmpty_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length, SUNContext sunctx)
This function creates a new parallel N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length, realtype *v_data, SUNContext sunctx)
This function creates and allocates memory for a parallel vector with user-provided data array, v_data. (This function does not allocate memory for v_data itself.)

sunindextype N_VGetLocalLength_Parallel(N_Vector v)
This function returns the local vector length.

void N_VPrint_Parallel(N_Vector v)
This function prints the local content of a parallel vector to stdout.

void N_VPrintFile_Parallel(N_Vector v, FILE *outfile)
This function prints the local content of a parallel vector to outfile.
By default all fused and vector array operations are disabled in the NVECTOR_PARALLEL module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VNew_Parallel(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, as cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Parallel() will have the default settings for the NVECTOR_PARALLEL module.

int N_VEnableFusedOps_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Parallel(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
Notes
When looping over the components of an N_Vector v, it is more efficient to first obtain the local component array via v_data = N_VGetArrayPointer(v), or equivalently v_data = NV_DATA_P(v), and then access v_data[i] within the loop than it is to use NV_Ith_P(v,i) within the loop.

N_VNewEmpty_Parallel(), N_VMake_Parallel(), and N_VCloneVectorArrayEmpty_Parallel() set the field own_data to SUNFALSE. The routines N_VDestroy_Parallel() and N_VDestroyVectorArray_Parallel() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.

To maximize efficiency, vector operations in the NVECTOR_PARALLEL implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
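A distributed dot product shows why vectors passed to a multi-vector operation must share the same layout: each rank pairs x[i] with y[i] from its own local slice, and only the partial sums are combined globally. The sketch below simulates two ranks inside one process; local_dot and reduce_partials are hypothetical names, and in an MPI program the reduction step would be a collective over the vector's communicator (for example MPI_Allreduce with MPI_SUM) rather than a serial loop.

```c
typedef double realtype;   /* stand-in typedefs (assumptions) */
typedef long sunindextype;

/* Local stage: dot product of this rank's slices of x and y. Both slices
   must have the same local length n for the pairing to be meaningful. */
realtype local_dot(const realtype *x, const realtype *y, sunindextype n)
{
  realtype sum = 0.0;
  for (sunindextype i = 0; i < n; i++)
    sum += x[i] * y[i];
  return sum;
}

/* Reduction stage, simulated serially: add the partial sums of all ranks
   so that the total is the global dot product. */
realtype reduce_partials(const realtype *partials, int nprocs)
{
  realtype total = 0.0;
  for (int p = 0; p < nprocs; p++)
    total += partials[p];
  return total;
}
```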
9.10.3. NVECTOR_PARALLEL Fortran Interface
The NVECTOR_PARALLEL module provides a Fortran 2003 module for use from Fortran applications.
The fnvector_parallel_mod Fortran module defines interfaces to all NVECTOR_PARALLEL C functions using the intrinsic iso_c_binding module, which provides a standardized mechanism for interoperating with C. As noted in the C function descriptions above, the interface functions are named after the corresponding C function, but with a leading F. For example, the function N_VNew_Parallel is interfaced as FN_VNew_Parallel.
The Fortran 2003 NVECTOR_PARALLEL interface module can be accessed with the use statement, i.e. use fnvector_parallel_mod, and by linking to the library libsundials_fnvectorparallel_mod.lib in addition to the C library. For details on where the library and module file fnvector_parallel_mod.mod are installed see §14.
We note that the module is accessible from the Fortran 2003 SUNDIALS integrators without separately linking to the libsundials_fnvectorparallel_mod library.
9.11. The NVECTOR_OPENMP Module
In situations where a user has a multicore processing unit capable of running multiple parallel threads with shared memory, SUNDIALS provides an implementation of NVECTOR using OpenMP, called NVECTOR_OPENMP, and an implementation using Pthreads, called NVECTOR_PTHREADS. Testing has shown that vectors should be of length at least \(100,000\) before the overhead associated with creating and using the threads is made up by the parallelism in the vector calculations.
The OpenMP NVECTOR implementation provided with SUNDIALS, NVECTOR_OPENMP, defines the content field of N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, a boolean flag own_data which specifies the ownership of data, and the number of threads. Operations on the vector are threaded using OpenMP; the number of threads used is set by the num_threads argument supplied to the vector constructor.
struct _N_VectorContent_OpenMP {
sunindextype length;
booleantype own_data;
realtype *data;
int num_threads;
};
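The kind of threaded loop described above can be sketched with a single OpenMP worksharing directive whose num_threads clause plays the role of the thread count stored in the vector. linear_sum_omp is a hypothetical stand-in for an operation such as N_VLinearSum_OpenMP, not its actual source, and realtype and sunindextype are simplified typedefs.

```c
typedef double realtype;   /* stand-in typedefs (assumptions) */
typedef long sunindextype;

/* Computes z = a*x + b*y, dividing the iterations statically among
   num_threads OpenMP threads (num_threads must be positive). If the file
   is compiled without OpenMP support, the pragma is ignored and the loop
   runs serially with the same result. */
void linear_sum_omp(realtype a, const realtype *x, realtype b,
                    const realtype *y, realtype *z, sunindextype n,
                    int num_threads)
{
  sunindextype i;
#pragma omp parallel for num_threads(num_threads) schedule(static)
  for (i = 0; i < n; i++)
    z[i] = a * x[i] + b * y[i];
}
```

Each iteration writes a distinct z[i], so the loop needs no synchronization; reductions such as a dot product would additionally need a reduction clause.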
The header file to be included when using this module is nvector_openmp.h.
The installed module library to link to is libsundials_nvecopenmp.lib, where .lib is typically .so for shared libraries and .a for static libraries.
The Fortran module file to use when using the Fortran 2003 interface to this module is fnvector_openmp_mod.mod.
9.11.1. NVECTOR_OPENMP accessor macros
The following six macros are provided to access the content of an NVECTOR_OPENMP vector. The suffix _OMP in the names denotes the OpenMP version.

NV_CONTENT_OMP(v)
This macro gives access to the contents of the OpenMP vector N_Vector v.
The assignment v_cont = NV_CONTENT_OMP(v) sets v_cont to be a pointer to the OpenMP N_Vector content structure.
Implementation:
#define NV_CONTENT_OMP(v) ( (N_VectorContent_OpenMP)(v->content) )

NV_OWN_DATA_OMP(v)
Access the own_data component of the OpenMP N_Vector v.
Implementation:
#define NV_OWN_DATA_OMP(v) ( NV_CONTENT_OMP(v)->own_data )

NV_DATA_OMP(v)
The assignment v_data = NV_DATA_OMP(v) sets v_data to be a pointer to the first component of the data for the N_Vector v.
Similarly, the assignment NV_DATA_OMP(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.
Implementation:
#define NV_DATA_OMP(v) ( NV_CONTENT_OMP(v)->data )

NV_LENGTH_OMP(v)
Access the length component of the OpenMP N_Vector v.
The assignment v_len = NV_LENGTH_OMP(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_OMP(v) = len_v sets the length of v to be len_v.
Implementation:
#define NV_LENGTH_OMP(v) ( NV_CONTENT_OMP(v)->length )

NV_NUM_THREADS_OMP(v)
Access the num_threads component of the OpenMP N_Vector v.
The assignment v_threads = NV_NUM_THREADS_OMP(v) sets v_threads to be the num_threads of v. On the other hand, the call NV_NUM_THREADS_OMP(v) = num_threads_v sets the num_threads of v to be num_threads_v.
Implementation:
#define NV_NUM_THREADS_OMP(v) ( NV_CONTENT_OMP(v)->num_threads )

NV_Ith_OMP(v, i)
This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing.
The assignment r = NV_Ith_OMP(v,i) sets r to be the value of the i-th component of v. The assignment NV_Ith_OMP(v,i) = r sets the value of the i-th component of v to be r.
Here i ranges from 0 to \(n-1\) for a vector of length \(n\).
Implementation:
#define NV_Ith_OMP(v,i) ( NV_DATA_OMP(v)[i] )
9.11.2. NVECTOR_OPENMP functions
The NVECTOR_OPENMP module defines OpenMP implementations of all vector operations listed in §9.2.1, §9.2.2, §9.2.3, and §9.2.4. Their names are obtained from those in those sections by appending the suffix _OpenMP (e.g. N_VDestroy_OpenMP). All the standard vector operations listed in §9.2.1 with the suffix _OpenMP appended are callable via the Fortran 2003 interface by prepending an F (e.g. FN_VDestroy_OpenMP).
The module NVECTOR_OPENMP provides the following additional user-callable routines:

N_Vector N_VNew_OpenMP(sunindextype vec_length, int num_threads, SUNContext sunctx)
This function creates and allocates memory for an OpenMP N_Vector. Its arguments are the vector length, the number of threads, and the SUNContext object sunctx.

N_Vector N_VNewEmpty_OpenMP(sunindextype vec_length, int num_threads, SUNContext sunctx)
This function creates a new OpenMP N_Vector with an empty (NULL) data array.

N_Vector N_VMake_OpenMP(sunindextype vec_length, realtype *v_data, int num_threads, SUNContext sunctx)
This function creates and allocates memory for an OpenMP vector with user-provided data array, v_data. (This function does not allocate memory for v_data itself.)

void N_VPrintFile_OpenMP(N_Vector v, FILE *outfile)
This function prints the content of an OpenMP vector to outfile.
By default all fused and vector array operations are disabled in the NVECTOR_OPENMP module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VNew_OpenMP(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, as cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_OpenMP() will have the default settings for the NVECTOR_OPENMP module.

int N_VEnableFusedOps_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_OpenMP(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
Notes
When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = N_VGetArrayPointer(v), or equivalently v_data = NV_DATA_OMP(v), and then access v_data[i] within the loop than it is to use NV_Ith_OMP(v,i) within the loop.

N_VNewEmpty_OpenMP(), N_VMake_OpenMP(), and N_VCloneVectorArrayEmpty_OpenMP() set the field own_data to SUNFALSE. The functions N_VDestroy_OpenMP() and N_VDestroyVectorArray_OpenMP() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.

To maximize efficiency, vector operations in the NVECTOR_OPENMP implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
9.11.3. NVECTOR_OPENMP Fortran Interface
The NVECTOR_OPENMP module provides a Fortran 2003 module for use from Fortran applications.
The fnvector_openmp_mod Fortran module defines interfaces to all NVECTOR_OPENMP C functions using the intrinsic iso_c_binding module, which provides a standardized mechanism for interoperating with C. As noted in the C function descriptions above, the interface functions are named after the corresponding C function, but with a leading F. For example, the function N_VNew_OpenMP is interfaced as FN_VNew_OpenMP.
The Fortran 2003 NVECTOR_OPENMP interface module can be accessed with the use statement, i.e. use fnvector_openmp_mod, and by linking to the library libsundials_fnvectoropenmp_mod.lib in addition to the C library. For details on where the library and module file fnvector_openmp_mod.mod are installed see §14.
9.12. The NVECTOR_PTHREADS Module
In situations where a user has a multicore processing unit capable of running multiple parallel threads with shared memory, SUNDIALS provides an implementation of NVECTOR using OpenMP, called NVECTOR_OPENMP, and an implementation using Pthreads, called NVECTOR_PTHREADS. Testing has shown that vectors should be of length at least \(100,000\) before the overhead associated with creating and using the threads is made up by the parallelism in the vector calculations.
The Pthreads NVECTOR implementation provided with SUNDIALS, denoted NVECTOR_PTHREADS, defines the content field of N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, a boolean flag own_data which specifies the ownership of data, and the number of threads. Operations on the vector are threaded using POSIX threads (Pthreads).
struct _N_VectorContent_Pthreads {
sunindextype length;
booleantype own_data;
realtype *data;
int num_threads;
};
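Each Pthreads operation has to divide the index range of the vector among num_threads worker threads. The helper below is a hypothetical sketch of one contiguous partition in which chunk sizes differ by at most one; split_range is an illustrative name, and the split actually used inside NVECTOR_PTHREADS may differ.

```c
typedef long sunindextype;  /* stand-in typedef (assumption) */

/* Divide indices [0, n) into nthreads contiguous chunks whose sizes differ
   by at most one, and return in [*start, *end) the half-open range handled
   by thread tid. The first n % nthreads threads take one extra element. */
void split_range(sunindextype n, int nthreads, int tid,
                 sunindextype *start, sunindextype *end)
{
  sunindextype q = n / nthreads;  /* base chunk size   */
  sunindextype r = n % nthreads;  /* leftover elements */
  if (tid < r) {
    *start = tid * (q + 1);
    *end = *start + q + 1;
  } else {
    *start = r * (q + 1) + (tid - r) * q;
    *end = *start + q;
  }
}
```

Each worker started with pthread_create would then loop over its own range, for example indices 0 to 3, 4 to 6 and 7 to 9 for a vector of length 10 split among three threads; when there are more threads than components, the surplus threads receive empty ranges.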
The header file to be included when using this module is nvector_pthreads.h.
The installed module library to link to is libsundials_nvecpthreads.lib, where .lib is typically .so for shared libraries and .a for static libraries.
9.12.1. NVECTOR_PTHREADS accessor macros
The following six macros are provided to access the content of an NVECTOR_PTHREADS vector. The suffix _PT in the names denotes the Pthreads version.

NV_CONTENT_PT(v)
This macro gives access to the contents of the Pthreads vector N_Vector v.
The assignment v_cont = NV_CONTENT_PT(v) sets v_cont to be a pointer to the Pthreads N_Vector content structure.
Implementation:
#define NV_CONTENT_PT(v) ( (N_VectorContent_Pthreads)(v->content) )

NV_OWN_DATA_PT(v)
Access the own_data component of the Pthreads N_Vector v.
Implementation:
#define NV_OWN_DATA_PT(v) ( NV_CONTENT_PT(v)->own_data )

NV_DATA_PT(v)
The assignment v_data = NV_DATA_PT(v) sets v_data to be a pointer to the first component of the data for the N_Vector v.
Similarly, the assignment NV_DATA_PT(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.
Implementation:
#define NV_DATA_PT(v) ( NV_CONTENT_PT(v)->data )

NV_LENGTH_PT(v)
Access the length component of the Pthreads N_Vector v.
The assignment v_len = NV_LENGTH_PT(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_PT(v) = len_v sets the length of v to be len_v.
Implementation:
#define NV_LENGTH_PT(v) ( NV_CONTENT_PT(v)->length )

NV_NUM_THREADS_PT(v)
Access the num_threads component of the Pthreads N_Vector v.
The assignment v_threads = NV_NUM_THREADS_PT(v) sets v_threads to be the number of threads of v. On the other hand, the call NV_NUM_THREADS_PT(v) = num_threads_v sets the number of threads of v to be num_threads_v.
Implementation:
#define NV_NUM_THREADS_PT(v) ( NV_CONTENT_PT(v)->num_threads )

NV_Ith_PT(v, i)
This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing.
The assignment r = NV_Ith_PT(v,i) sets r to be the value of the i-th component of v.
The assignment NV_Ith_PT(v,i) = r sets the value of the i-th component of v to be r.
Here i ranges from 0 to \(n-1\) for a vector of length \(n\).
Implementation:
#define NV_Ith_PT(v,i) ( NV_DATA_PT(v)[i] )
9.12.2. NVECTOR_PTHREADS functions
The NVECTOR_PTHREADS module defines Pthreads implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4. Their names are obtained from those in those sections by appending the suffix _Pthreads (e.g. N_VDestroy_Pthreads). All the standard vector operations listed in §9.2 are callable via the Fortran 2003 interface by prepending an "F" (e.g. FN_VDestroy_Pthreads).
The module NVECTOR_PTHREADS provides the following additional user-callable routines:

N_Vector N_VNew_Pthreads(sunindextype vec_length, int num_threads, SUNContext sunctx)
This function creates and allocates memory for a Pthreads N_Vector. Arguments are the vector length, the number of threads, and the SUNContext object.

N_Vector N_VNewEmpty_Pthreads(sunindextype vec_length, int num_threads, SUNContext sunctx)
This function creates a new Pthreads N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Pthreads(sunindextype vec_length, realtype *v_data, int num_threads, SUNContext sunctx)
This function creates and allocates memory for a Pthreads vector with a user-provided data array, v_data. (This function does not allocate memory for v_data itself.)

void N_VPrint_Pthreads(N_Vector v)
This function prints the content of a Pthreads vector to stdout.

void N_VPrintFile_Pthreads(N_Vector v, FILE *outfile)
This function prints the content of a Pthreads vector to outfile.
By default all fused and vector array operations are disabled in the NVECTOR_PTHREADS module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VNew_Pthreads(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Pthreads() will have the default settings for the NVECTOR_PTHREADS module.

int N_VEnableFusedOps_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Pthreads(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.
Notes
When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = N_VGetArrayPointer(v), or equivalently v_data = NV_DATA_PT(v), and then access v_data[i] within the loop than it is to use NV_Ith_PT(v,i) within the loop.
N_VNewEmpty_Pthreads(), N_VMake_Pthreads(), and N_VCloneVectorArrayEmpty_Pthreads() set the field own_data to SUNFALSE. The functions N_VDestroy_Pthreads() and N_VDestroyVectorArray_Pthreads() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user's responsibility to deallocate the data pointer.
To maximize efficiency, vector operations in the NVECTOR_PTHREADS implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user's responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
9.12.3. NVECTOR_PTHREADS Fortran Interface
The NVECTOR_PTHREADS module provides a Fortran 2003 module for use from Fortran applications.
The fnvector_pthreads_mod Fortran module defines interfaces to all NVECTOR_PTHREADS C functions using the intrinsic iso_c_binding module, which provides a standardized mechanism for interoperating with C. As noted in the C function descriptions above, the interface functions are named after the corresponding C function, but with a leading F. For example, the function N_VNew_Pthreads is interfaced as FN_VNew_Pthreads.
The Fortran 2003 NVECTOR_PTHREADS interface module can be accessed with the use statement, i.e. use fnvector_pthreads_mod, and linking to the library libsundials_fnvectorpthreads_mod.lib in addition to the C library. For details on where the library and module file fnvector_pthreads_mod.mod are installed, see §14.
9.13. The NVECTOR_PARHYP Module
The NVECTOR_PARHYP implementation of the NVECTOR module provided with SUNDIALS is a wrapper around HYPRE's ParVector class. Most of the vector kernels simply call HYPRE vector operations. The implementation defines the content field of N_Vector to be a structure containing the global and local lengths of the vector, a pointer to an object of type hypre_ParVector, an MPI communicator, and a boolean flag own_parvector indicating ownership of the HYPRE parallel vector object x.
struct _N_VectorContent_ParHyp {
sunindextype local_length;
sunindextype global_length;
booleantype own_data;
booleantype own_parvector;
realtype *data;
MPI_Comm comm;
hypre_ParVector *x;
};
The header file to be included when using this module is nvector_parhyp.h.
The installed module library to link to is libsundials_nvecparhyp.lib, where .lib is typically .so for shared libraries and .a for static libraries.
Unlike native SUNDIALS vector types, NVECTOR_PARHYP does not provide macros to access its member variables. Note that NVECTOR_PARHYP requires SUNDIALS to be built with MPI support.
9.13.1. NVECTOR_PARHYP functions
The NVECTOR_PARHYP module defines implementations of all vector operations listed in §9.2 except for N_VSetArrayPointer() and N_VGetArrayPointer(), because accessing raw vector data is handled by low-level HYPRE functions. As such, this vector is not available for use with SUNDIALS Fortran interfaces. When access to raw vector data is needed, one should extract the HYPRE vector first and then use HYPRE methods to access the data. Usage examples of NVECTOR_PARHYP are provided in the cvAdvDiff_non_ph.c example program for CVODE and the ark_diurnal_kry_ph.c example program for ARKODE.
The names of parhyp methods are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _ParHyp (e.g. N_VDestroy_ParHyp). The module NVECTOR_PARHYP provides the following additional user-callable routines:

N_Vector N_VNewEmpty_ParHyp(MPI_Comm comm, sunindextype local_length, sunindextype global_length, SUNContext sunctx)
This function creates a new parhyp N_Vector with the pointer to the HYPRE vector set to NULL.

N_Vector N_VMake_ParHyp(hypre_ParVector *x, SUNContext sunctx)
This function creates an N_Vector wrapper around an existing HYPRE parallel vector. It does not allocate memory for x itself.

hypre_ParVector *N_VGetVector_ParHyp(N_Vector v)
This function returns a pointer to the underlying HYPRE vector.

void N_VPrint_ParHyp(N_Vector v)
This function prints the local content of a parhyp vector to stdout.

void N_VPrintFile_ParHyp(N_Vector v, FILE *outfile)
This function prints the local content of a parhyp vector to outfile.
By default all fused and vector array operations are disabled in the NVECTOR_PARHYP module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VMake_ParHyp(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VMake_ParHyp() will have the default settings for the NVECTOR_PARHYP module.

int N_VEnableFusedOps_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_ParHyp(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.
Notes
When there is a need to access components of an N_Vector_ParHyp v, it is recommended to extract the HYPRE vector via x_vec = N_VGetVector_ParHyp(v) and then access components using appropriate HYPRE functions.
N_VNewEmpty_ParHyp(), N_VMake_ParHyp(), and N_VCloneVectorArrayEmpty_ParHyp() set the field own_parvector to SUNFALSE. The functions N_VDestroy_ParHyp() and N_VDestroyVectorArray_ParHyp() will not attempt to delete an underlying HYPRE vector for any N_Vector with own_parvector set to SUNFALSE. In such a case, it is the user's responsibility to delete the underlying vector.
To maximize efficiency, vector operations in the NVECTOR_PARHYP implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user's responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
9.14. The NVECTOR_PETSC Module
The NVECTOR_PETSC module is an NVECTOR wrapper around the PETSc vector. It defines the content field of an N_Vector to be a structure containing the global and local lengths of the vector, a pointer to the PETSc vector, an MPI communicator, and a boolean flag own_data indicating ownership of the wrapped PETSc vector.
struct _N_VectorContent_Petsc {
sunindextype local_length;
sunindextype global_length;
booleantype own_data;
Vec *pvec;
MPI_Comm comm;
};
The header file to be included when using this module is nvector_petsc.h. The installed module library to link to is libsundials_nvecpetsc.lib, where .lib is typically .so for shared libraries and .a for static libraries.
Unlike native SUNDIALS vector types, NVECTOR_PETSC does not provide macros to access its member variables. Note that NVECTOR_PETSC requires SUNDIALS to be built with MPI support.
9.14.1. NVECTOR_PETSC functions
The NVECTOR_PETSC module defines implementations of all vector operations listed in §9.2 except for N_VGetArrayPointer() and N_VSetArrayPointer(). As such, this vector cannot be used with SUNDIALS Fortran interfaces. When access to raw vector data is needed, it is recommended to extract the PETSc vector first and then use PETSc methods to access the data. Usage examples of NVECTOR_PETSC are provided in example programs for IDA.
The names of vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _Petsc (e.g. N_VDestroy_Petsc). The module NVECTOR_PETSC provides the following additional user-callable routines:

N_Vector N_VNewEmpty_Petsc(MPI_Comm comm, sunindextype local_length, sunindextype global_length, SUNContext sunctx)
This function creates a new PETSc N_Vector with the pointer to the wrapped PETSc vector set to NULL. It is used by the N_VMake_Petsc and N_VClone_Petsc implementations. It should be used only with great caution.

N_Vector N_VMake_Petsc(Vec *pvec, SUNContext sunctx)
This function creates and allocates memory for an NVECTOR_PETSC wrapper with a user-provided PETSc vector. It does not allocate memory for the vector pvec itself.

Vec *N_VGetVector_Petsc(N_Vector v)
This function returns a pointer to the underlying PETSc vector.

void N_VPrint_Petsc(N_Vector v)
This function prints the global content of a wrapped PETSc vector to stdout.

void N_VPrintFile_Petsc(N_Vector v, const char fname[])
This function prints the global content of a wrapped PETSc vector to the file named fname.
By default all fused and vector array operations are disabled in the NVECTOR_PETSC module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VMake_Petsc(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VMake_Petsc() will have the default settings for the NVECTOR_PETSC module.

int N_VEnableFusedOps_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Petsc(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.
Notes
When there is a need to access components of an N_Vector_Petsc v, it is recommended to extract the PETSc vector via x_vec = N_VGetVector_Petsc(v) and then access components using appropriate PETSc functions.
The functions N_VNewEmpty_Petsc(), N_VMake_Petsc(), and N_VCloneVectorArrayEmpty_Petsc() set the field own_data to SUNFALSE. The routines N_VDestroy_Petsc() and N_VDestroyVectorArray_Petsc() will not attempt to free the pointer pvec for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user's responsibility to deallocate the pvec pointer.
To maximize efficiency, vector operations in the NVECTOR_PETSC implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user's responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
9.15. The NVECTOR_CUDA Module
The NVECTOR_CUDA module is an NVECTOR implementation in the CUDA language. The module allows for SUNDIALS vector kernels to run on NVIDIA GPU devices. It is intended for users who are already familiar with CUDA and GPU programming. Building this vector module requires a CUDA compiler and, by extension, a C++ compiler. The vector content layout is as follows:
struct _N_VectorContent_Cuda
{
sunindextype length;
booleantype own_helper;
SUNMemory host_data;
SUNMemory device_data;
SUNCudaExecPolicy* stream_exec_policy;
SUNCudaExecPolicy* reduce_exec_policy;
SUNMemoryHelper mem_helper;
void* priv; /* 'private' data */
};
typedef struct _N_VectorContent_Cuda *N_VectorContent_Cuda;
The content members are the vector length (size), boolean flags that indicate if the vector owns the execution policies and memory helper objects (i.e., it is in charge of freeing the objects), SUNMemory objects for the vector data on the host and device, pointers to execution policies that control how streaming and reduction kernels are launched, a SUNMemoryHelper for performing memory operations, and a private data structure which holds additional members that should not be accessed directly.
When instantiated with N_VNew_Cuda(), the underlying data will be allocated on both the host and the device. Alternatively, a user can provide host and device data arrays by using the N_VMake_Cuda() constructor. To use CUDA managed memory, the constructors N_VNewManaged_Cuda() and N_VMakeManaged_Cuda() are provided. Additionally, a user-defined SUNMemoryHelper for allocating/freeing data can be provided with the constructor N_VNewWithMemHelp_Cuda(). Details on each of these constructors are provided below.
To use the NVECTOR_CUDA module, include nvector_cuda.h and link to the library libsundials_nveccuda.lib. The extension, .lib, is typically .so for shared libraries and .a for static libraries.
9.15.1. NVECTOR_CUDA functions
Unlike other native SUNDIALS vector types, the NVECTOR_CUDA module does not provide macros to access its member variables. Instead, users should use the accessor functions:

realtype *N_VGetHostArrayPointer_Cuda(N_Vector v)
This function returns a pointer to the vector data on the host.

realtype *N_VGetDeviceArrayPointer_Cuda(N_Vector v)
This function returns a pointer to the vector data on the device.

booleantype N_VIsManagedMemory_Cuda(N_Vector v)
This function returns a boolean flag indicating whether or not the vector data array is in managed memory.
The NVECTOR_CUDA module defines implementations of all standard vector operations defined in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VSetArrayPointer() and, if using unmanaged memory, N_VGetArrayPointer(). As such, this vector can only be used with SUNDIALS direct solvers and preconditioners when using managed memory.
The NVECTOR_CUDA module provides separate functions to access data on the host
and on the device for the unmanaged memory use case. It also provides methods for
copying from the host to the device and vice versa. Usage examples of NVECTOR_CUDA
are provided in example programs for CVODE [66].
The names of vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _Cuda (e.g. N_VDestroy_Cuda). The module NVECTOR_CUDA provides the following additional user-callable routines:

N_Vector N_VNew_Cuda(sunindextype length, SUNContext sunctx)
This function creates and allocates memory for a CUDA N_Vector. The vector data array is allocated on both the host and device.

N_Vector N_VNewManaged_Cuda(sunindextype vec_length, SUNContext sunctx)
This function creates and allocates memory for a CUDA N_Vector. The vector data array is allocated in managed memory.

N_Vector N_VNewWithMemHelp_Cuda(sunindextype length, booleantype use_managed_mem, SUNMemoryHelper helper, SUNContext sunctx)
This function creates a new CUDA N_Vector with a user-supplied SUNMemoryHelper for allocating/freeing memory.

N_Vector N_VNewEmpty_Cuda(sunindextype vec_length, SUNContext sunctx)
This function creates a new CUDA N_Vector where the members of the content structure have not been allocated. This utility function is used by the other constructors to create a new vector.

N_Vector N_VMake_Cuda(sunindextype vec_length, realtype *h_vdata, realtype *d_vdata, SUNContext sunctx)
This function creates a CUDA N_Vector with user-supplied vector data arrays for the host and the device.

N_Vector N_VMakeManaged_Cuda(sunindextype vec_length, realtype *vdata, SUNContext sunctx)
This function creates a CUDA N_Vector with a user-supplied managed memory data array.

N_Vector N_VMakeWithManagedAllocator_Cuda(sunindextype length, void *(*allocfn)(size_t size), void (*freefn)(void *ptr))
This function creates a CUDA N_Vector with a user-supplied memory allocator. It requires the user to provide a corresponding free function as well. The memory allocated by the allocator function must behave like CUDA managed memory.
The module NVECTOR_CUDA also provides the following user-callable routines:

void N_VSetKernelExecPolicy_Cuda(N_Vector v, SUNCudaExecPolicy *stream_exec_policy, SUNCudaExecPolicy *reduce_exec_policy)
This function sets the execution policies which control the kernel parameters utilized when launching the streaming and reduction CUDA kernels. By default the vector is set up to use the SUNCudaThreadDirectExecPolicy() and SUNCudaBlockReduceAtomicExecPolicy(). Any custom execution policy for reductions must ensure that the grid dimensions (number of thread blocks) are a multiple of the CUDA warp size (32). See §9.15.2 below for more information about the SUNCudaExecPolicy class.
The input execution policies are cloned and, as such, may be freed after being attached to the desired vectors. A NULL input policy will reset the execution policy to the default setting.
Note: All vectors used in a single instance of a SUNDIALS package must use the same execution policy. It is strongly recommended that this function is called immediately after constructing the vector, and that any subsequent vector be created by cloning, to ensure consistent execution policies across vectors.

realtype *N_VCopyFromDevice_Cuda(N_Vector v)
This function copies vector data from the device to the host.

void N_VPrintFile_Cuda(N_Vector v, FILE *outfile)
This function prints the content of a CUDA vector to outfile.
By default all fused and vector array operations are disabled in the NVECTOR_CUDA module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VNew_Cuda(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Cuda() will have the default settings for the NVECTOR_CUDA module.

int N_VEnableFusedOps_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (
SUNTRUE
) or disables (SUNFALSE
) the const operation for vector arrays in the CUDA vector. The return value is0
for success and1
if the input vector or itsops
structure areNULL
.

int N_VEnableWrmsNormVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (
SUNTRUE
) or disables (SUNFALSE
) the WRMS norm operation for vector arrays in the CUDA vector. The return value is0
for success and1
if the input vector or itsops
structure areNULL
.

int N_VEnableWrmsNormMaskVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (
SUNTRUE
) or disables (SUNFALSE
) the masked WRMS norm operation for vector arrays in the CUDA vector. The return value is0
for success and1
if the input vector or itsops
structure areNULL
.

int N_VEnableScaleAddMultiVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (
SUNTRUE
) or disables (SUNFALSE
) the scale and add a vector array to multiple vector arrays operation in the CUDA vector. The return value is0
for success and1
if the input vector or itsops
structure areNULL
.

int N_VEnableLinearCombinationVectorArray_Cuda(N_Vector v, booleantype tf)
This function enables (
SUNTRUE
) or disables (SUNFALSE
) the linear combination operation for vector arrays in the CUDA vector. The return value is0
for success and1
if the input vector or itsops
structure areNULL
.
Notes
When there is a need to access components of an N_Vector_Cuda, v, it is recommended to use the functions N_VGetDeviceArrayPointer_Cuda() or N_VGetHostArrayPointer_Cuda(). However, when using managed memory, the function N_VGetArrayPointer() may also be used.
To maximize efficiency, vector operations in the NVECTOR_CUDA implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
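Earlier in this section, the recommended workflow is to configure one vector and then clone the rest. The reason it works is that a clone copies the enable/disable settings, while a freshly constructed vector starts from the defaults. The sketch below models this without SUNDIALS. The `ToyOps` and `ToyVector` types and their members are hypothetical stand-ins for an N_Vector and its ops table, chosen only to illustrate the inheritance rule. They are not the library's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vector's table of optional operations.
// Every flag defaults to false, mirroring "all fused and vector array
// operations are disabled by default".
struct ToyOps {
  bool linear_combination = false;
  bool scale_add_multi = false;
  bool dot_prod_multi = false;
};

// Hypothetical stand-in for an N_Vector: its own data plus its own ops table.
struct ToyVector {
  std::vector<double> data;
  ToyOps ops;

  // Plays the role of a constructor such as N_VNew_Cuda: default settings.
  explicit ToyVector(std::size_t n) : data(n, 0.0) {}

  // Plays the role of N_VClone: fresh storage of the same length (values
  // are not copied), but the enable/disable settings are inherited.
  ToyVector clone() const {
    ToyVector v(data.size());
    v.ops = ops;
    return v;
  }

  // Plays the role of an N_VEnableFusedOps_* call: toggle all flags at once.
  void enable_fused_ops(bool tf) {
    ops.linear_combination = tf;
    ops.scale_add_multi = tf;
    ops.dot_prod_multi = tf;
  }
};

// The recommended workflow: configure one vector, then clone the others.
void demo_clone_inheritance() {
  ToyVector x(10);
  x.enable_fused_ops(true);
  ToyVector y = x.clone();  // inherits the enabled operations
  ToyVector z(10);          // freshly constructed: defaults only
  assert(y.ops.linear_combination && y.ops.dot_prod_multi);
  assert(!z.ops.linear_combination);
}
```

In this model each clone receives its own copy of the ops table, so later changes to the original vector do not affect vectors that were already cloned from it.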
9.15.2. The SUNCudaExecPolicy Class
In order to provide maximum flexibility to users, the CUDA kernel execution parameters used by kernels within SUNDIALS are defined by objects of the sundials::cuda::ExecPolicy abstract class type (this class can be accessed in the global namespace as SUNCudaExecPolicy). Thus, users may provide custom execution policies that fit the needs of their problem. The SUNCudaExecPolicy class is defined as

typedef sundials::cuda::ExecPolicy SUNCudaExecPolicy
where the sundials::cuda::ExecPolicy class is defined in the header file sundials_cuda_policies.hpp, as follows:
class ExecPolicy
{
public:
  ExecPolicy(cudaStream_t stream = 0) : stream_(stream) { }
  virtual size_t gridSize(size_t numWorkUnits = 0, size_t blockDim = 0) const = 0;
  virtual size_t blockSize(size_t numWorkUnits = 0, size_t gridDim = 0) const = 0;
  virtual const cudaStream_t* stream() const { return (&stream_); }
  virtual ExecPolicy* clone() const = 0;
  ExecPolicy* clone_new_stream(cudaStream_t stream) const {
    ExecPolicy* ex = clone();
    ex->stream_ = stream;
    return ex;
  }
  virtual bool atomic() const { return false; }
  virtual ~ExecPolicy() {}
protected:
  cudaStream_t stream_;
};
To define a custom execution policy, a user simply needs to create a class that inherits from the abstract class and implements the methods. The SUNDIALS-provided sundials::cuda::ThreadDirectExecPolicy class (available in the global namespace as SUNCudaThreadDirectExecPolicy) is a good example of what a custom execution policy may look like:
class ThreadDirectExecPolicy : public ExecPolicy
{
public:
  ThreadDirectExecPolicy(const size_t blockDim, cudaStream_t stream = 0)
    : blockDim_(blockDim), ExecPolicy(stream)
  {}
  ThreadDirectExecPolicy(const ThreadDirectExecPolicy& ex)
    : blockDim_(ex.blockDim_), ExecPolicy(ex.stream_)
  {}
  virtual size_t gridSize(size_t numWorkUnits = 0, size_t /*blockDim*/ = 0) const
  {
    /* ceil(n/m) = floor((n + m - 1) / m) */
    return (numWorkUnits + blockSize() - 1) / blockSize();
  }
  virtual size_t blockSize(size_t /*numWorkUnits*/ = 0, size_t /*gridDim*/ = 0) const
  {
    return blockDim_;
  }
  virtual ExecPolicy* clone() const
  {
    return static_cast<ExecPolicy*>(new ThreadDirectExecPolicy(*this));
  }
private:
  const size_t blockDim_;
};
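The gridSize() method above relies on the integer identity ceil(n/m) = floor((n + m - 1)/m). The plain C++ sketch below needs no CUDA. It reproduces that arithmetic in two hypothetical helpers, `thread_direct_grid_size` and `idle_threads`, and shows that a launch sized this way can include a few surplus threads. That is why a thread-direct kernel typically guards its work with an index check such as `if (i < n)`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring ThreadDirectExecPolicy::gridSize():
// the number of blocks needed so that every work unit gets one thread,
// i.e. ceil(numWorkUnits / blockDim) computed with integer arithmetic.
std::size_t thread_direct_grid_size(std::size_t numWorkUnits, std::size_t blockDim) {
  assert(blockDim > 0);  // a zero block size would divide by zero
  return (numWorkUnits + blockDim - 1) / blockDim;
}

// Number of launched threads that have no work unit and must be skipped
// by a bounds check inside the kernel.
std::size_t idle_threads(std::size_t numWorkUnits, std::size_t blockDim) {
  return thread_direct_grid_size(numWorkUnits, blockDim) * blockDim - numWorkUnits;
}
```

For example, 1000 work units with 128 threads per block need 8 blocks (1024 threads), which leaves 24 idle threads. With 1024 work units, exactly 8 blocks are needed and no threads are idle.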
In total, SUNDIALS provides 4 execution policies:
- SUNCudaThreadDirectExecPolicy(const size_t blockDim, const cudaStream_t stream = 0)
  Maps each CUDA thread to a work unit. The number of threads per block (blockDim) can be set to anything. The grid size will be calculated so that there are enough threads for one thread per element. If a CUDA stream is provided, it will be used to execute the kernel.
- SUNCudaGridStrideExecPolicy(const size_t blockDim, const size_t gridDim, const cudaStream_t stream = 0)
  For kernels that use grid-stride loops. The number of threads per block (blockDim) can be set to anything. The number of blocks (gridDim) can be set to anything. If a CUDA stream is provided, it will be used to execute the kernel.
- SUNCudaBlockReduceExecPolicy(const size_t blockDim, const cudaStream_t stream = 0)
  For kernels performing a reduction across individual thread blocks. The number of threads per block (blockDim) can be set to any valid multiple of the CUDA warp size. The grid size (gridDim) can be set to any value greater than 0; if it is set to 0, then the grid size will be chosen so that there are enough threads for one thread per work unit. If a CUDA stream is provided, it will be used to execute the kernel.
- SUNCudaBlockReduceAtomicExecPolicy(const size_t blockDim, const cudaStream_t stream = 0)
  For kernels performing a reduction across individual thread blocks using atomic operations. The number of threads per block (blockDim) can be set to any valid multiple of the CUDA warp size. The grid size (gridDim) can be set to any value greater than 0; if it is set to 0, then the grid size will be chosen so that there are enough threads for one thread per work unit. If a CUDA stream is provided, it will be used to execute the kernel.
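The block-reduce policies pair a launch shape with a multi-level reduction. In the general pattern, each thread first accumulates its share of the work units, each block then combines its threads' values into one partial result, and the partials are finally combined. The serial C++ sketch below emulates that pattern on the CPU for a sum. It is an illustration under these assumptions, not the SUNDIALS kernel, and the function name `block_reduce_sum` and its `warpSize` parameter are hypothetical. An atomic variant would replace the final loop with each block atomically adding its partial into a single accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical CPU emulation of a block-reduce launch computing sum(x).
// blockDim and gridDim follow the rules listed for the block-reduce
// policies; warpSize defaults to 32 (the CUDA warp size).
double block_reduce_sum(const std::vector<double>& x, std::size_t blockDim,
                        std::size_t gridDim, std::size_t warpSize = 32) {
  if (warpSize == 0 || blockDim == 0 || blockDim % warpSize != 0)
    throw std::invalid_argument("blockDim must be a positive multiple of the warp size");
  const std::size_t n = x.size();
  // gridDim == 0: pick enough blocks for one thread per work unit.
  if (gridDim == 0) gridDim = (n + blockDim - 1) / blockDim;
  if (gridDim == 0) return 0.0;  // empty input: nothing to reduce
  const std::size_t totalThreads = gridDim * blockDim;

  std::vector<double> blockPartials(gridDim, 0.0);
  for (std::size_t b = 0; b < gridDim; ++b) {
    // Stage 1: each thread accumulates elements strided by totalThreads.
    std::vector<double> threadSums(blockDim, 0.0);
    for (std::size_t t = 0; t < blockDim; ++t)
      for (std::size_t i = b * blockDim + t; i < n; i += totalThreads)
        threadSums[t] += x[i];
    // Stage 2: tree reduction inside the block, halving the active threads.
    for (std::size_t active = blockDim; active > 1;) {
      const std::size_t half = (active + 1) / 2;
      for (std::size_t t = 0; t + half < active; ++t)
        threadSums[t] += threadSums[t + half];
      active = half;
    }
    blockPartials[b] = threadSums[0];
  }
  // Stage 3: combine the per-block partial results (a second pass).
  double total = 0.0;
  for (double p : blockPartials) total += p;
  return total;
}
```

To mimic an AMD GPU, pass `warpSize = 64`. A block size of 32 is then rejected.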
For example, a policy that uses 128 threads per block and a user-provided stream can be created as follows:
cudaStream_t stream;
cudaStreamCreate(&stream);
SUNCudaThreadDirectExecPolicy thread_direct(128, stream);
These default policy objects can be reused for multiple SUNDIALS data structures (e.g., a SUNMatrix and an N_Vector) since they do not hold any modifiable state information.
9.16. The NVECTOR_HIP Module
The NVECTOR_HIP module is an NVECTOR implementation using the AMD ROCm HIP library [2]. The module allows for SUNDIALS vector kernels to run on AMD or NVIDIA GPU devices. It is intended for users who are already familiar with HIP and GPU programming. Building this vector module requires the HIP-clang compiler. The vector content layout is as follows:
struct _N_VectorContent_Hip
{
  sunindextype length;
  booleantype own_helper;
  SUNMemory host_data;
  SUNMemory device_data;
  SUNHipExecPolicy* stream_exec_policy;
  SUNHipExecPolicy* reduce_exec_policy;
  SUNMemoryHelper mem_helper;
  void* priv; /* 'private' data */
};
typedef struct _N_VectorContent_Hip *N_VectorContent_Hip;
The content members are the vector length (size), a boolean flag that signals if the vector owns the data (i.e., it is in charge of freeing the data), pointers to vector data on the host and the device, pointers to SUNHipExecPolicy implementations that control how the HIP kernels are launched for streaming and reduction vector kernels, and a private data structure which holds additional members that should not be accessed directly.
When instantiated with N_VNew_Hip(), the underlying data will be allocated on both the host and the device. Alternatively, a user can provide host and device data arrays by using the N_VMake_Hip() constructor.
To use managed memory, the constructors N_VNewManaged_Hip() and N_VMakeManaged_Hip() are provided. Additionally, a user-defined SUNMemoryHelper for allocating/freeing data can be provided with the constructor N_VNewWithMemHelp_Hip(). Details on each of these constructors are provided below.
To use the NVECTOR_HIP module, include nvector_hip.h and link to the library libsundials_nvechip.lib. The extension, .lib, is typically .so for shared libraries and .a for static libraries.
9.16.1. NVECTOR_HIP functions
Unlike other native SUNDIALS vector types, the NVECTOR_HIP module does not provide macros to access its member variables. Instead, users should use the accessor functions:

realtype *N_VGetHostArrayPointer_Hip(N_Vector v)
This function returns a pointer to the vector data on the host.

realtype *N_VGetDeviceArrayPointer_Hip(N_Vector v)
This function returns a pointer to the vector data on the device.

booleantype N_VIsManagedMemory_Hip(N_Vector v)
This function returns a boolean flag indicating whether or not the vector data array is in managed memory.
The NVECTOR_HIP module defines implementations of all standard vector operations defined in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VSetArrayPointer(). The names of vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _Hip (e.g., N_VDestroy_Hip()). The module NVECTOR_HIP provides the following additional user-callable routines:

N_Vector N_VNew_Hip(sunindextype length, SUNContext sunctx)
This function creates and allocates memory for a HIP N_Vector. The vector data array is allocated on both the host and device.

N_Vector N_VNewManaged_Hip(sunindextype vec_length, SUNContext sunctx)
This function creates and allocates memory for a HIP N_Vector. The vector data array is allocated in managed memory.

N_Vector N_VNewWithMemHelp_Hip(sunindextype length, booleantype use_managed_mem, SUNMemoryHelper helper, SUNContext sunctx)
This function creates a new HIP N_Vector with a user-supplied SUNMemoryHelper for allocating/freeing memory.

N_Vector N_VNewEmpty_Hip(sunindextype vec_length, SUNContext sunctx)
This function creates a new HIP N_Vector where the members of the content structure have not been allocated. This utility function is used by the other constructors to create a new vector.

N_Vector N_VMake_Hip(sunindextype vec_length, realtype *h_vdata, realtype *d_vdata, SUNContext sunctx)
This function creates a HIP N_Vector with user-supplied vector data arrays for the host and the device.

N_Vector N_VMakeManaged_Hip(sunindextype vec_length, realtype *vdata, SUNContext sunctx)
This function creates a HIP N_Vector with a user-supplied managed memory data array.
The module NVECTOR_HIP also provides the following user-callable routines:

void N_VSetKernelExecPolicy_Hip(N_Vector v, SUNHipExecPolicy *stream_exec_policy, SUNHipExecPolicy *reduce_exec_policy)
This function sets the execution policies which control the kernel parameters utilized when launching the streaming and reduction HIP kernels. By default the vector is set up to use the SUNHipThreadDirectExecPolicy() and SUNHipBlockReduceExecPolicy(). Any custom execution policy for reductions must ensure that the block dimension (number of threads per block) is a multiple of the HIP warp size (32 for NVIDIA GPUs, 64 for AMD GPUs). See §9.16.2 below for more information about the SUNHipExecPolicy class. The input execution policies are cloned and, as such, may be freed after being attached to the desired vectors. A NULL input policy will reset the execution policy to the default setting.
Note
All vectors used in a single instance of a SUNDIALS package must use the same execution policy. It is strongly recommended that this function is called immediately after constructing the vector, and any subsequent vector be created by cloning to ensure consistent execution policies across vectors.

realtype *N_VCopyFromDevice_Hip(N_Vector v)
This function copies vector data from the device to the host.

void N_VPrintFile_Hip(N_Vector v, FILE *outfile)
This function prints the content of a HIP vector to outfile.
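N_VSetKernelExecPolicy_Hip stores clones of the policies it receives, so the caller keeps ownership of the originals, and a NULL argument restores the default. The sketch below models that ownership rule in plain C++ without the HIP runtime. It uses a stripped-down, stream-free `Policy` interface with `clone()`, a `FixedBlockPolicy`, and a `ToyHipVector` that holds a single policy (instead of a streaming and a reduction policy) in a `std::unique_ptr`. All three names, and the default block size of 256, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical, stream-free reduction of the ExecPolicy interface.
struct Policy {
  virtual std::size_t blockSize() const = 0;
  virtual Policy* clone() const = 0;
  virtual ~Policy() {}
};

// Hypothetical policy with a fixed block size.
class FixedBlockPolicy : public Policy {
 public:
  explicit FixedBlockPolicy(std::size_t blockDim) : blockDim_(blockDim) {}
  std::size_t blockSize() const override { return blockDim_; }
  Policy* clone() const override { return new FixedBlockPolicy(*this); }
 private:
  std::size_t blockDim_;
};

// Hypothetical vector that keeps its own clone of the policy it is given.
class ToyHipVector {
 public:
  static constexpr std::size_t kDefaultBlock = 256;  // hypothetical default
  ToyHipVector() { setPolicy(nullptr); }
  // nullptr restores the default; otherwise keep a private clone, so the
  // caller is free to destroy its own policy object afterwards.
  void setPolicy(const Policy* p) {
    policy_.reset(p ? p->clone() : new FixedBlockPolicy(kDefaultBlock));
  }
  std::size_t blockSize() const { return policy_->blockSize(); }
 private:
  std::unique_ptr<Policy> policy_;
};
```

Because the vector owns a clone, a policy declared in a narrower scope (for example, a local variable) can go out of scope right after it has been attached.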
By default all fused and vector array operations are disabled in the NVECTOR_HIP module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Hip(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Hip() will have the default settings for the NVECTOR_HIP module.

int N_VEnableFusedOps_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Hip(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
Notes
When there is a need to access components of an N_Vector_Hip, v, it is recommended to use the functions N_VGetDeviceArrayPointer_Hip() or N_VGetHostArrayPointer_Hip(). However, when using managed memory, the function N_VGetArrayPointer() may also be used.
To maximize efficiency, vector operations in the NVECTOR_HIP implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
9.16.2. The SUNHipExecPolicy Class
In order to provide maximum flexibility to users, the HIP kernel execution parameters used by kernels within SUNDIALS are defined by objects of the sundials::hip::ExecPolicy abstract class type (this class can be accessed in the global namespace as SUNHipExecPolicy). Thus, users may provide custom execution policies that fit the needs of their problem. The SUNHipExecPolicy class is defined as

typedef sundials::hip::ExecPolicy SUNHipExecPolicy
where the sundials::hip::ExecPolicy class is defined in the header file sundials_hip_policies.hpp, as follows:
class ExecPolicy
{
public:
  ExecPolicy(hipStream_t stream = 0) : stream_(stream) { }
  virtual size_t gridSize(size_t numWorkUnits = 0, size_t blockDim = 0) const = 0;
  virtual size_t blockSize(size_t numWorkUnits = 0, size_t gridDim = 0) const = 0;
  virtual const hipStream_t* stream() const { return (&stream_); }
  virtual ExecPolicy* clone() const = 0;
  ExecPolicy* clone_new_stream(hipStream_t stream) const {
    ExecPolicy* ex = clone();
    ex->stream_ = stream;
    return ex;
  }
  virtual bool atomic() const { return false; }
  virtual ~ExecPolicy() {}
protected:
  hipStream_t stream_;
};
To define a custom execution policy, a user simply needs to create a class that inherits from the abstract class and implements the methods. The SUNDIALS-provided sundials::hip::ThreadDirectExecPolicy class (available in the global namespace as SUNHipThreadDirectExecPolicy) is a good example of what a custom execution policy may look like:
class ThreadDirectExecPolicy : public ExecPolicy
{
public:
  ThreadDirectExecPolicy(const size_t blockDim, hipStream_t stream = 0)
    : blockDim_(blockDim), ExecPolicy(stream)
  {}
  ThreadDirectExecPolicy(const ThreadDirectExecPolicy& ex)
    : blockDim_(ex.blockDim_), ExecPolicy(ex.stream_)
  {}
  virtual size_t gridSize(size_t numWorkUnits = 0, size_t /*blockDim*/ = 0) const
  {
    /* ceil(n/m) = floor((n + m - 1) / m) */
    return (numWorkUnits + blockSize() - 1) / blockSize();
  }
  virtual size_t blockSize(size_t /*numWorkUnits*/ = 0, size_t /*gridDim*/ = 0) const
  {
    return blockDim_;
  }
  virtual ExecPolicy* clone() const
  {
    return static_cast<ExecPolicy*>(new ThreadDirectExecPolicy(*this));
  }
private:
  const size_t blockDim_;
};
In total, SUNDIALS provides 4 execution policies:
- SUNHipThreadDirectExecPolicy(const size_t blockDim, const hipStream_t stream = 0)
  Maps each HIP thread to a work unit. The number of threads per block (blockDim) can be set to anything. The grid size will be calculated so that there are enough threads for one thread per element. If a HIP stream is provided, it will be used to execute the kernel.
- SUNHipGridStrideExecPolicy(const size_t blockDim, const size_t gridDim, const hipStream_t stream = 0)
  For kernels that use grid-stride loops. The number of threads per block (blockDim) can be set to anything. The number of blocks (gridDim) can be set to anything. If a HIP stream is provided, it will be used to execute the kernel.
- SUNHipBlockReduceExecPolicy(const size_t blockDim, const hipStream_t stream = 0)
  For kernels performing a reduction across individual thread blocks. The number of threads per block (blockDim) can be set to any valid multiple of the HIP warp size. The grid size (gridDim) can be set to any value greater than 0; if it is set to 0, then the grid size will be chosen so that there are enough threads for one thread per work unit. If a HIP stream is provided, it will be used to execute the kernel.
- SUNHipBlockReduceAtomicExecPolicy(const size_t blockDim, const hipStream_t stream = 0)
  For kernels performing a reduction across individual thread blocks using atomic operations. The number of threads per block (blockDim) can be set to any valid multiple of the HIP warp size. The grid size (gridDim) can be set to any value greater than 0; if it is set to 0, then the grid size will be chosen so that there are enough threads for one thread per work unit. If a HIP stream is provided, it will be used to execute the kernel.
For example, a policy that uses 128 threads per block and a user-provided stream can be created as follows:
hipStream_t stream;
hipStreamCreate(&stream);
SUNHipThreadDirectExecPolicy thread_direct(128, stream);
These default policy objects can be reused for multiple SUNDIALS data structures (e.g., a SUNMatrix and an N_Vector) since they do not hold any modifiable state information.
9.17. The NVECTOR_SYCL Module
The NVECTOR_SYCL module is an experimental NVECTOR implementation using the SYCL abstraction layer. At present the only supported SYCL compiler is the DPC++ (Intel oneAPI) compiler. This module allows for SUNDIALS vector kernels to run on Intel GPU devices. The module is intended for users who are already familiar with SYCL and GPU programming.
The vector content layout is as follows:
struct _N_VectorContent_Sycl
{
  sunindextype length;
  booleantype own_helper;
  SUNMemory host_data;
  SUNMemory device_data;
  SUNSyclExecPolicy* stream_exec_policy;
  SUNSyclExecPolicy* reduce_exec_policy;
  SUNMemoryHelper mem_helper;
  sycl::queue* queue;
  void* priv; /* 'private' data */
};
typedef struct _N_VectorContent_Sycl *N_VectorContent_Sycl;
The content members are the vector length (size), boolean flags that indicate if the vector owns the execution policies and memory helper objects (i.e., it is in charge of freeing the objects), SUNMemory objects for the vector data on the host and device, pointers to execution policies that control how streaming and reduction kernels are launched, a SUNMemoryHelper for performing memory operations, the SYCL queue, and a private data structure which holds additional members that should not be accessed directly.
When instantiated with N_VNew_Sycl(), the underlying data will be allocated on both the host and the device. Alternatively, a user can provide host and device data arrays by using the N_VMake_Sycl() constructor.
To use managed (shared) memory, the constructors N_VNewManaged_Sycl() and N_VMakeManaged_Sycl() are provided. Additionally, a user-defined SUNMemoryHelper for allocating/freeing data can be provided with the constructor N_VNewWithMemHelp_Sycl(). Details on each of these constructors are provided below.
The header file to include when using this module is nvector_sycl.h. The installed module library to link to is libsundials_nvecsycl.lib. The extension .lib is typically .so for shared libraries and .a for static libraries.
9.17.1. NVECTOR_SYCL functions
The NVECTOR_SYCL module provides implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VDotProdMulti(), N_VWrmsNormVectorArray(), and N_VWrmsNormMaskVectorArray(), as arrays of reduction vectors are not yet supported. These functions will be added to the NVECTOR_SYCL implementation in the future. The names of vector operations are obtained from those in the aforementioned sections by appending the suffix _Sycl (e.g., N_VDestroy_Sycl).
Additionally, the NVECTOR_SYCL module provides the following user-callable constructors for creating a new NVECTOR_SYCL:

N_Vector N_VNew_Sycl(sunindextype vec_length, sycl::queue *Q, SUNContext sunctx)
This function creates and allocates memory for an NVECTOR_SYCL. Vector data arrays are allocated on both the host and the device associated with the input queue. All operations are launched in the provided queue.

N_Vector N_VNewManaged_Sycl(sunindextype vec_length, sycl::queue *Q, SUNContext sunctx)
This function creates and allocates memory for an NVECTOR_SYCL. The vector data array is allocated in managed (shared) memory using the input queue. All operations are launched in the provided queue.

N_Vector N_VMake_Sycl(sunindextype length, realtype *h_vdata, realtype *d_vdata, sycl::queue *Q, SUNContext sunctx)
This function creates an NVECTOR_SYCL with user-supplied host and device data arrays. This function does not allocate memory for data itself. All operations are launched in the provided queue.

N_Vector N_VMakeManaged_Sycl(sunindextype length, realtype *vdata, sycl::queue *Q, SUNContext sunctx)
This function creates an NVECTOR_SYCL with a user-supplied managed (shared) data array. This function does not allocate memory for data itself. All operations are launched in the provided queue.

N_Vector N_VNewWithMemHelp_Sycl(sunindextype length, booleantype use_managed_mem, SUNMemoryHelper helper, sycl::queue *Q, SUNContext sunctx)
This function creates an NVECTOR_SYCL with a user-supplied SUNMemoryHelper for allocating/freeing memory. All operations are launched in the provided queue.

N_Vector N_VNewEmpty_Sycl()
This function creates a new N_Vector where the members of the content structure have not been allocated. This utility function is used by the other constructors to create a new vector.
The following user-callable functions are provided for accessing the vector data arrays on the host and device and copying data between the two memory spaces. Note that the generic NVECTOR operations N_VGetArrayPointer() and N_VSetArrayPointer() are mapped to the corresponding HostArray functions given below. To ensure memory coherency, a user will need to call the CopyTo or CopyFrom functions as necessary to transfer data between the host and device, unless managed (shared) memory is used.

realtype *N_VGetHostArrayPointer_Sycl(N_Vector v)
This function returns a pointer to the vector host data array.

realtype *N_VGetDeviceArrayPointer_Sycl(N_Vector v)
This function returns a pointer to the vector device data array.

void N_VSetHostArrayPointer_Sycl(realtype *h_vdata, N_Vector v)
This function sets the host array pointer in the vector v.

void N_VSetDeviceArrayPointer_Sycl(realtype *d_vdata, N_Vector v)
This function sets the device array pointer in the vector v.

void N_VCopyToDevice_Sycl(N_Vector v)
This function copies host vector data to the device.

void N_VCopyFromDevice_Sycl(N_Vector v)
This function copies vector data from the device to the host.

booleantype N_VIsManagedMemory_Sycl(N_Vector v)
This function returns SUNTRUE if the vector data is allocated as managed (shared) memory; otherwise it returns SUNFALSE.
The following user-callable function is provided to set the execution policies for how SYCL kernels are launched on a device.

int N_VSetKernelExecPolicy_Sycl(N_Vector v, SUNSyclExecPolicy *stream_exec_policy, SUNSyclExecPolicy *reduce_exec_policy)
This function sets the execution policies which control the kernel parameters utilized when launching the streaming and reduction kernels. By default the vector is set up to use the SUNSyclThreadDirectExecPolicy() and SUNSyclBlockReduceExecPolicy(). See §9.17.2 below for more information about the SUNSyclExecPolicy class. The input execution policies are cloned and, as such, may be freed after being attached to the desired vectors. A NULL input policy will reset the execution policy to the default setting.
Note
All vectors used in a single instance of a SUNDIALS package must use the same execution policy. It is strongly recommended that this function is called immediately after constructing the vector, and any subsequent vector be created by cloning to ensure consistent execution policies across vectors.
The following user-callable functions are provided to print the host vector data array. Unless managed memory is used, a user may need to call N_VCopyFromDevice_Sycl() to ensure consistency between the host and device array.

void N_VPrint_Sycl(N_Vector v)
This function prints the host data array to stdout.

void N_VPrintFile_Sycl(N_Vector v, FILE *outfile)
This function prints the host data array to outfile.
By default all fused and vector array operations are disabled in the NVECTOR_SYCL module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with one of the above constructors, enable/disable the desired operations on that vector with the functions below, and then use this vector in conjunction with N_VClone() to create any additional vectors. This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created by any of the constructors above will have the default settings for the NVECTOR_SYCL module.

int N_VEnableFusedOps_Sycl(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Sycl(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Sycl(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Sycl(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Sycl(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Sycl(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Sycl(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Sycl(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
Notes
When there is a need to access components of an NVECTOR_SYCL, v, it is recommended to use N_VGetDeviceArrayPointer() to access the device array or N_VGetArrayPointer() for the host array. When using managed (shared) memory, either function may be used. To ensure memory coherency, a user may need to call the CopyTo or CopyFrom functions as necessary to transfer data between the host and device, unless managed (shared) memory is used.
To maximize efficiency, vector operations in the NVECTOR_SYCL implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
9.17.2. The SUNSyclExecPolicy Class
In order to provide maximum flexibility to users, the SYCL kernel execution parameters used by kernels within SUNDIALS are defined by objects of the sundials::sycl::ExecPolicy abstract class type (this class can be accessed in the global namespace as SUNSyclExecPolicy). Thus, users may provide custom execution policies that fit the needs of their problem. The SUNSyclExecPolicy class is defined as

typedef sundials::sycl::ExecPolicy SUNSyclExecPolicy
where the sundials::sycl::ExecPolicy class is defined in the header file sundials_sycl_policies.hpp, as follows:
class ExecPolicy
{
public:
virtual size_t gridSize(size_t numWorkUnits = 0, size_t blockDim = 0) const = 0;
virtual size_t blockSize(size_t numWorkUnits = 0, size_t gridDim = 0) const = 0;
virtual ExecPolicy* clone() const = 0;
virtual ~ExecPolicy() {}
};
For consistency, the function names and behavior mirror the execution policies for the CUDA and HIP vectors. In the SYCL case, blockSize is the local work-group range in a one-dimensional nd_range (threads per group). The gridSize is the number of local work groups, so the global work-group range in a one-dimensional nd_range is blockSize * gridSize (total number of threads). All vector kernels are written with a many-to-one mapping where work units (vector elements) are mapped in a round-robin manner across the global range. As such, blockSize and gridSize can be set to any positive value.
To define a custom execution policy, a user simply needs to create a class that inherits from the abstract class and implements its methods. The SUNDIALS-provided sundials::sycl::ThreadDirectExecPolicy class (also available in the global namespace as SUNSyclThreadDirectExecPolicy) is a good example of what a custom execution policy may look like:
class ThreadDirectExecPolicy : public ExecPolicy
{
public:
ThreadDirectExecPolicy(const size_t blockDim)
: blockDim_(blockDim)
{}
ThreadDirectExecPolicy(const ThreadDirectExecPolicy& ex)
: blockDim_(ex.blockDim_)
{}
virtual size_t gridSize(size_t numWorkUnits = 0, size_t blockDim = 0) const
{
return (numWorkUnits + blockSize() - 1) / blockSize();
}
virtual size_t blockSize(size_t numWorkUnits = 0, size_t gridDim = 0) const
{
return blockDim_;
}
virtual ExecPolicy* clone() const
{
return static_cast<ExecPolicy*>(new ThreadDirectExecPolicy(*this));
}
private:
const size_t blockDim_;
};
SUNDIALS provides the following execution policies:
SUNSyclThreadDirectExecPolicy(const size_t blockDim)
Is for kernels performing streaming operations and maps each work unit (vector element) to a work-item (thread). Based on the local work-group range (number of threads per group, blockSize), the number of local work groups (gridSize) is computed so that there are enough work-items in the global work-group range (total number of threads, blockSize * gridSize) for one work unit per work-item (thread).
SUNSyclGridStrideExecPolicy(const size_t blockDim, const size_t gridDim)
Is for kernels performing streaming operations and maps each work unit (vector element) to a work-item (thread) in a round-robin manner, so the local work-group range (number of threads per group, blockSize) and the number of local work groups (gridSize) can be set to any positive value. In this case the global work-group range (total number of threads, blockSize * gridSize) may be less than the number of work units (vector elements).
SUNSyclBlockReduceExecPolicy(const size_t blockDim, const size_t gridDim = 0)
Is for kernels performing a reduction. The local work-group range (number of threads per group, blockSize) and the number of local work groups (gridSize) can be set to any positive value, or gridSize may be set to 0, in which case the global range is chosen so that there are enough threads for at most two work units per work-item.
By default, the NVECTOR_SYCL module uses the SUNSyclThreadDirectExecPolicy and SUNSyclBlockReduceExecPolicy, where the default blockDim is determined by querying the device for the max_work_group_size. Users may specify different policies by constructing a new SUNSyclExecPolicy and attaching it with N_VSetKernelExecPolicy_Sycl(). For example, a policy that uses 128 work-items (threads) per group can be created and attached like so:
// myQueue is an existing sycl::queue and sunctx an existing SUNContext
N_Vector v = N_VNew_Sycl(length, &myQueue, sunctx);
SUNSyclThreadDirectExecPolicy thread_direct(128);
SUNSyclBlockReduceExecPolicy block_reduce(128);
int flag = N_VSetKernelExecPolicy_Sycl(v, &thread_direct, &block_reduce);
These default policy objects can be reused for multiple SUNDIALS data structures (e.g., a SUNMatrix and an N_Vector) since they do not hold any modifiable state information.
9.18. The NVECTOR_RAJA Module
The NVECTOR_RAJA module is an experimental NVECTOR implementation using the RAJA hardware abstraction layer. In this implementation, RAJA allows SUNDIALS vector kernels to run on AMD, NVIDIA, or Intel GPU devices. The module is intended for users who are already familiar with RAJA and GPU programming. Building this vector module requires a C++11-compliant compiler and either the NVIDIA CUDA programming environment, the AMD ROCm HIP programming environment, or a compiler that supports the SYCL abstraction layer. When using the AMD ROCm HIP environment, the HIP-clang compiler must be used. Users can select which backend to compile with by setting the SUNDIALS_RAJA_BACKENDS CMake variable to either CUDA, HIP, or SYCL. Besides the CUDA, HIP, and SYCL backends, RAJA has other backends such as serial, OpenMP, and OpenACC. These backends are not used in this SUNDIALS release.
The vector content layout is as follows:
struct _N_VectorContent_Raja
{
sunindextype length;
booleantype own_data;
realtype* host_data;
realtype* device_data;
void* priv; /* 'private' data */
};
The content members are the vector length (size), a boolean flag that signals if the vector owns the data (i.e., it is in charge of freeing the data), pointers to vector data on the host and the device, and a private data structure which holds the memory management type, which should not be accessed directly.
When instantiated with N_VNew_Raja(), the underlying data will be allocated on both the host and the device. Alternatively, a user can provide host and device data arrays by using the N_VMake_Raja() constructor. To use managed memory, the constructors N_VNewManaged_Raja() and N_VMakeManaged_Raja() are provided. Details on each of these constructors are provided below.
The header file to include when using this module is nvector_raja.h. The installed module library to link to is libsundials_nveccudaraja.lib when using the CUDA backend, libsundials_nvechipraja.lib when using the HIP backend, and libsundials_nvecsyclraja.lib when using the SYCL backend. The extension .lib is typically .so for shared libraries and .a for static libraries.
9.18.1. NVECTOR_RAJA functions
Unlike other native SUNDIALS vector types, the NVECTOR_RAJA module does not provide macros to access its member variables. Instead, users should use the accessor functions:

realtype *N_VGetHostArrayPointer_Raja(N_Vector v)
This function returns a pointer to the vector data on the host.

realtype *N_VGetDeviceArrayPointer_Raja(N_Vector v)
This function returns a pointer to the vector data on the device.

booleantype N_VIsManagedMemory_Raja(N_Vector v)
This function returns a boolean flag indicating if the vector data is allocated in managed memory or not.
The NVECTOR_RAJA module defines the implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VDotProdMulti(), N_VWrmsNormVectorArray(), and N_VWrmsNormMaskVectorArray(), as arrays of reduction vectors are not yet supported in RAJA. These functions will be added to the NVECTOR_RAJA implementation in the future. Additionally, the operations N_VGetArrayPointer() and N_VSetArrayPointer() are not implemented by the RAJA vector. As such, this vector cannot be used with SUNDIALS direct solvers and preconditioners.
The NVECTOR_RAJA module provides separate functions to access data on
the host and on the device. It also provides methods for copying from
the host to the device and vice versa. Usage examples of NVECTOR_RAJA
are provided in some example programs for CVODE [66].
The names of vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _Raja (e.g., N_VDestroy_Raja). The module NVECTOR_RAJA provides the following additional user-callable routines:

N_Vector N_VNew_Raja(sunindextype vec_length, SUNContext sunctx)
This function creates and allocates memory for a RAJA N_Vector. The memory is allocated on both the host and the device. Its arguments are the vector length and the SUNDIALS simulation context.

N_Vector N_VNewManaged_Raja(sunindextype vec_length, SUNContext sunctx)
This function creates and allocates memory for a RAJA N_Vector. The vector data array is allocated in managed memory.

N_Vector N_VMake_Raja(sunindextype length, realtype *h_data, realtype *v_data, SUNContext sunctx)
This function creates an NVECTOR_RAJA with user-supplied host and device data arrays. This function does not allocate memory for data itself.

N_Vector N_VMakeManaged_Raja(sunindextype length, realtype *vdata, SUNContext sunctx)
This function creates an NVECTOR_RAJA with a user-supplied managed memory data array. This function does not allocate memory for data itself.

N_Vector N_VNewWithMemHelp_Raja(sunindextype length, booleantype use_managed_mem, SUNMemoryHelper helper, SUNContext sunctx)
This function creates an NVECTOR_RAJA with a user-supplied SUNMemoryHelper for allocating/freeing memory.

N_Vector N_VNewEmpty_Raja()
This function creates a new N_Vector where the members of the content structure have not been allocated. This utility function is used by the other constructors to create a new vector.

void N_VCopyFromDevice_Raja(N_Vector v)
This function copies vector data from the device to the host.

void N_VPrintFile_Raja(N_Vector v, FILE *outfile)
This function prints the content of a RAJA vector to outfile.
By default, all fused and vector array operations are disabled in the NVECTOR_RAJA module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VNew_Raja(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, as cloned vectors inherit the same enable/disable options as the vector they are cloned from, while vectors created with N_VNew_Raja() will have the default settings for the NVECTOR_RAJA module.

int N_VEnableFusedOps_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_Raja(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
Notes
When there is a need to access components of an NVECTOR_RAJA vector, it is recommended to use the functions N_VGetDeviceArrayPointer_Raja() or N_VGetHostArrayPointer_Raja(). However, when using managed memory, the function N_VGetArrayPointer() may also be used.
To maximize efficiency, vector operations in the NVECTOR_RAJA implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.
9.19. The NVECTOR_KOKKOS Module
New in version 6.4.0.
The NVECTOR_KOKKOS N_Vector implementation provides a vector data structure using Kokkos [45, 105] to support a variety of backends including serial, OpenMP, CUDA, HIP, and SYCL. Since Kokkos is a modern C++ library, the module is also written in modern C++ (it requires C++14) as a header-only library. To utilize this N_Vector, users will need to include nvector/nvector_kokkos.hpp. More instructions on building SUNDIALS with Kokkos enabled are given in §14.1.4. For instructions on building and using Kokkos, refer to the Kokkos documentation.
9.19.1. Using NVECTOR_KOKKOS
The NVECTOR_KOKKOS module is defined by the Vector templated class in the sundials::kokkos namespace:
template<class ExecutionSpace = Kokkos::DefaultExecutionSpace,
class MemorySpace = typename ExecutionSpace::memory_space>
class Vector : public sundials::impl::BaseNVector,
public sundials::ConvertibleTo<N_Vector>
To use the NVECTOR_KOKKOS module, we construct an instance of the Vector class, e.g.,
// Each snippet assumes length and sunctx already exist
// Vector with extent length using the default execution space
sundials::kokkos::Vector<> x{length, sunctx};
// Vector with extent length using the Cuda execution space
sundials::kokkos::Vector<Kokkos::Cuda> x_cuda{length, sunctx};
// Vector based on an existing Kokkos::View
Kokkos::View<realtype*> view{"a view", length};
sundials::kokkos::Vector<> x_view{view, sunctx};
// Vector based on an existing Kokkos::View for device and host
Kokkos::View<realtype*, Kokkos::CudaSpace> device_view{"a view", length};
auto host_view = Kokkos::create_mirror_view(device_view);
sundials::kokkos::Vector<Kokkos::Cuda> x_dev_host{device_view, host_view, sunctx};
Instances of the Vector class are implicitly or explicitly (using the Convert() method) convertible to an N_Vector, e.g.,
sundials::kokkos::Vector<> x{length, sunctx};
N_Vector x2 = x; // implicit conversion to N_Vector
N_Vector x3 = x.Convert(); // explicit conversion to N_Vector
No further interaction with a Vector is required from this point, and it is possible to use the N_Vector API to operate on x2 or x3.
Warning
N_VDestroy() should never be called on an N_Vector that was created via conversion from a sundials::kokkos::Vector. Doing so may result in a double free.
The underlying Vector can be extracted from an N_Vector using GetVec(), e.g.,
auto x_vec = GetVec<>(x3);
9.19.2. NVECTOR_KOKKOS API
In this section we list the public API of the sundials::kokkos::Vector class.

template<class ExecutionSpace = Kokkos::DefaultExecutionSpace, class MemorySpace = typename ExecutionSpace::memory_space>
class Vector : public sundials::impl::BaseNVector, public sundials::ConvertibleTo<N_Vector> 
Vector() = default
Default constructor – the vector must be copied or moved to.

Vector(size_type length, SUNContext sunctx)
Constructs a single Vector which is based on a 1D Kokkos::View with the ExecutionSpace and MemorySpace provided as template arguments.
Parameters
length – length of the vector (i.e., the extent of the View)
sunctx – the SUNDIALS simulation context object (SUNContext)

Vector(view_type view, SUNContext sunctx)
Constructs a single Vector from an existing Kokkos::View. The View ExecutionSpace and MemorySpace must match the ExecutionSpace and MemorySpace provided as template arguments.
Parameters
view – a 1D Kokkos::View
sunctx – the SUNDIALS simulation context object (SUNContext)

Vector(view_type view, host_view_type host_view, SUNContext sunctx)
Constructs a single Vector from an existing Kokkos::View for the device and the host. The ExecutionSpace and MemorySpace of the device View must match the ExecutionSpace and MemorySpace provided as template arguments.
Parameters
view – a 1D Kokkos::View for the device
host_view – a 1D Kokkos::View that is a Kokkos::HostMirror for the device view
sunctx – the SUNDIALS simulation context object (SUNContext)

Vector(const Vector &that_vector)
Copy constructor. This creates a clone of the Vector, i.e., it creates a new Vector with the same properties, such as length, but it does not copy the data.

Vector &operator=(const Vector &rhs)
Copy assignment. This creates a clone of the Vector, i.e., it creates a new Vector with the same properties, such as length, but it does not copy the data.

virtual ~Vector() = default;
Default destructor.

size_type Length()
Get the vector length, i.e., extent(0).

view_type View()
Get the underlying Kokkos::View for the device.

host_view_type HostView()
Get the underlying Kokkos::View for the host.


template<class VectorType>
inline VectorType *GetVec(N_Vector v)
Get the Vector wrapped by an N_Vector.

void CopyToDevice(N_Vector v)
Copy the data from the host view to the device view with Kokkos::deep_copy.

void CopyFromDevice(N_Vector v)
Copy the data to the host view from the device view with Kokkos::deep_copy.

template<class VectorType>
void CopyToDevice(VectorType &v)
Copy the data from the host view to the device view with Kokkos::deep_copy.

template<class VectorType>
void CopyFromDevice(VectorType &v)
Copy the data to the host view from the device view with Kokkos::deep_copy.
9.20. The NVECTOR_OPENMPDEV Module
In situations where a user has access to a device such as a GPU for offloading computation, SUNDIALS provides an NVECTOR implementation using OpenMP device offloading, called NVECTOR_OPENMPDEV.
The NVECTOR_OPENMPDEV implementation defines the content field of the N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array on the host, a pointer to the beginning of a contiguous data array on the device, and a boolean flag own_data which specifies the ownership of the host and device data arrays.
struct _N_VectorContent_OpenMPDEV
{
sunindextype length;
booleantype own_data;
realtype *host_data;
realtype *dev_data;
};
The header file to include when using this module is nvector_openmpdev.h. The installed module library to link to is libsundials_nvecopenmpdev.lib where .lib is typically .so for shared libraries and .a for static libraries.
9.20.1. NVECTOR_OPENMPDEV accessor macros
The following macros are provided to access the content of an NVECTOR_OPENMPDEV vector.

NV_CONTENT_OMPDEV(v)
This macro gives access to the contents of the NVECTOR_OPENMPDEV N_Vector v.
The assignment v_cont = NV_CONTENT_OMPDEV(v) sets v_cont to be a pointer to the NVECTOR_OPENMPDEV content structure.
Implementation:
#define NV_CONTENT_OMPDEV(v) ( (N_VectorContent_OpenMPDEV)(v->content) )

NV_OWN_DATA_OMPDEV(v)
Access the own_data component of the OpenMPDEV N_Vector v.
Implementation:
#define NV_OWN_DATA_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->own_data )

NV_DATA_HOST_OMPDEV(v)
The assignment v_data = NV_DATA_HOST_OMPDEV(v) sets v_data to be a pointer to the first component of the data on the host for the N_Vector v. The assignment NV_DATA_HOST_OMPDEV(v) = v_data sets the host component array of v to be v_data by storing the pointer v_data.
Implementation:
#define NV_DATA_HOST_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->host_data )

NV_DATA_DEV_OMPDEV(v)
The assignment v_dev_data = NV_DATA_DEV_OMPDEV(v) sets v_dev_data to be a pointer to the first component of the data on the device for the N_Vector v. The assignment NV_DATA_DEV_OMPDEV(v) = v_dev_data sets the device component array of v to be v_dev_data by storing the pointer v_dev_data.
Implementation:
#define NV_DATA_DEV_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->dev_data )

NV_LENGTH_OMPDEV(v)
Access the length component of the OpenMPDEV N_Vector v.
The assignment v_len = NV_LENGTH_OMPDEV(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_OMPDEV(v) = len_v sets the length of v to be len_v.
Implementation:
#define NV_LENGTH_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->length )
9.20.2. NVECTOR_OPENMPDEV functions
The NVECTOR_OPENMPDEV module defines OpenMP device offloading implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VSetArrayPointer(). As such, this vector cannot be used with the SUNDIALS direct solvers and preconditioners. It also provides methods for copying from the host to the device and vice versa.
The names of the vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _OpenMPDEV (e.g., N_VDestroy_OpenMPDEV). The module NVECTOR_OPENMPDEV provides the following additional user-callable routines:

N_Vector N_VNew_OpenMPDEV(sunindextype vec_length, SUNContext sunctx)
This function creates and allocates memory for an NVECTOR_OPENMPDEV N_Vector.

N_Vector N_VNewEmpty_OpenMPDEV(sunindextype vec_length, SUNContext sunctx)
This function creates a new NVECTOR_OPENMPDEV N_Vector with an empty (NULL) data array.

N_Vector N_VMake_OpenMPDEV(sunindextype vec_length, realtype *h_vdata, realtype *d_vdata, SUNContext sunctx)
This function creates an NVECTOR_OPENMPDEV vector with user-supplied vector data arrays h_vdata and d_vdata. This function does not allocate memory for data itself.

realtype *N_VGetHostArrayPointer_OpenMPDEV(N_Vector v)
This function returns a pointer to the host data array.

realtype *N_VGetDeviceArrayPointer_OpenMPDEV(N_Vector v)
This function returns a pointer to the device data array.

void N_VPrint_OpenMPDEV(N_Vector v)
This function prints the content of an NVECTOR_OPENMPDEV vector to stdout.

void N_VPrintFile_OpenMPDEV(N_Vector v, FILE *outfile)
This function prints the content of an NVECTOR_OPENMPDEV vector to outfile.

void N_VCopyToDevice_OpenMPDEV(N_Vector v)
This function copies the content of an NVECTOR_OPENMPDEV vector’s host data array to the device data array.

void N_VCopyFromDevice_OpenMPDEV(N_Vector v)
This function copies the content of an NVECTOR_OPENMPDEV vector’s device data array to the host data array.
By default, all fused and vector array operations are disabled in the NVECTOR_OPENMPDEV module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VNew_OpenMPDEV, enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone. This guarantees that the new vectors will have the same operations enabled/disabled, as cloned vectors inherit the same enable/disable options as the vector they are cloned from, while vectors created with N_VNew_OpenMPDEV will have the default settings for the NVECTOR_OPENMPDEV module.

int N_VEnableFusedOps_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMultiVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombinationVectorArray_OpenMPDEV(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
Notes
When looping over the components of an N_Vector v, it is most efficient to first obtain the component array via h_data = N_VGetArrayPointer(v) for the host array or v_data = N_VGetDeviceArrayPointer(v) for the device array, or equivalently to use the macros h_data = NV_DATA_HOST_OMPDEV(v) for the host array or v_data = NV_DATA_DEV_OMPDEV(v) for the device array, and then access h_data[i] or v_data[i] within the loop.
When accessing individual components of an N_Vector v on the host, remember to first copy the array back from the device with N_VCopyFromDevice_OpenMPDEV(v) to ensure the array is up to date.
N_VNewEmpty_OpenMPDEV(), N_VMake_OpenMPDEV(), and N_VCloneVectorArrayEmpty_OpenMPDEV() set the field own_data to SUNFALSE. The functions N_VDestroy_OpenMPDEV() and N_VDestroyVectorArray_OpenMPDEV() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointers.
To maximize efficiency, vector operations in the NVECTOR_OPENMPDEV implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same length.
9.21. The NVECTOR_TRILINOS Module
The NVECTOR_TRILINOS module is an NVECTOR wrapper around the Trilinos Tpetra vector. The interface to Tpetra is implemented in the sundials::trilinos::nvector_tpetra::TpetraVectorInterface class. This class simply stores a reference-counting pointer to a Tpetra vector and inherits from an empty structure
struct _N_VectorContent_Trilinos {};
to interface the C++ class with the NVECTOR C code. A pointer to an instance of this class is kept in the content field of the N_Vector object, to ensure that the Tpetra vector is not deleted for as long as the N_Vector object exists.
The Tpetra vector type in the sundials::trilinos::nvector_tpetra::TpetraVectorInterface class is defined as:

typedef Tpetra::Vector<realtype, int, sunindextype> vector_type;
The Tpetra vector will use the SUNDIALS-specified realtype as its scalar type, int as the local ordinal type, and sunindextype as the global ordinal type. This type definition will use Tpetra’s default node type. Available Kokkos node types as of the Trilinos 12.14 release are serial (single thread), OpenMP, Pthread, and CUDA. The default node type is selected when building the Kokkos package. For example, the Tpetra vector will use a CUDA node if Tpetra was built with CUDA support and the CUDA node was selected as the default when Tpetra was built.
The header file to include when using this module is nvector_trilinos.h. The installed module library to link to is libsundials_nvectrilinos.lib where .lib is typically .so for shared libraries and .a for static libraries.
9.21.1. NVECTOR_TRILINOS functions
The NVECTOR_TRILINOS module defines implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VGetArrayPointer() and N_VSetArrayPointer(). As such, this vector cannot be used with the SUNDIALS direct solvers and preconditioners. When access to raw vector data is needed, it is recommended to extract the Trilinos Tpetra vector first, and then use Tpetra vector methods to access the data. Usage examples of NVECTOR_TRILINOS are provided in example programs for IDA.
The names of vector operations are obtained from those in §9.2 by appending the suffix _Trilinos (e.g., N_VDestroy_Trilinos). Vector operations call existing Tpetra::Vector methods when available. Vector operations specific to SUNDIALS are implemented as standalone functions in the namespace sundials::trilinos::nvector_tpetra::TpetraVector, located in the file SundialsTpetraVectorKernels.hpp.
The module NVECTOR_TRILINOS provides the following additional user-callable routines:

Teuchos::RCP<vector_type> N_VGetVector_Trilinos(N_Vector v)
This C++ function takes an N_Vector as the argument and returns a reference-counting pointer to the underlying Tpetra vector. This is a standalone function defined in the global namespace.

N_Vector N_VMake_Trilinos(Teuchos::RCP<vector_type> v)
This C++ function creates and allocates memory for an NVECTOR_TRILINOS wrapper around a user-provided Tpetra vector. This is a standalone function defined in the global namespace.
Notes
The template parameter vector_type should be set as:
typedef sundials::trilinos::nvector_tpetra::TpetraVectorInterface::vector_type vector_type
This will ensure that the data types used in the Tpetra vector match those in SUNDIALS.
When there is a need to access components of an N_Vector_Trilinos v, it is recommended to extract the Trilinos vector object via x_vec = N_VGetVector_Trilinos(v) and then access components using the appropriate Trilinos functions.
The functions N_VDestroy_Trilinos and N_VDestroyVectorArray_Trilinos only delete the N_Vector wrapper. The underlying Tpetra vector object will exist for as long as there is at least one reference to it.
9.22. The NVECTOR_MANYVECTOR Module
The NVECTOR_MANYVECTOR module is designed to facilitate problems with an inherent data partitioning within a computational node for the solution vector. These data partitions are entirely user-defined, through construction of distinct NVECTOR modules for each component, which are then combined together to form the NVECTOR_MANYVECTOR. Two potential use cases for this flexibility include:
Heterogeneous computational architectures: for data partitioning between different computing resources on a node, architecture-specific subvectors may be created for each partition. For example, a user could create one GPU-accelerated component based on NVECTOR_CUDA, and another CPU-threaded component based on NVECTOR_OPENMP.
Structure of arrays (SOA) data layouts: for problems that require separate subvectors for each solution component. For example, in an incompressible Navier-Stokes simulation, separate subvectors may be used for velocities and pressure, which are combined together into a single NVECTOR_MANYVECTOR for the overall “solution”.
The above use cases are neither exhaustive nor mutually exclusive, and the NVECTOR_MANYVECTOR implementation should support arbitrary combinations of these cases.
The NVECTOR_MANYVECTOR implementation is designed to work with any NVECTOR subvectors that implement the minimum “standard” set of operations in §9.2.1. Additionally, NVECTOR_MANYVECTOR sets no limit on the number of subvectors that may be attached (aside from the limitations of using sunindextype for indexing, and standard per-node memory limitations). However, while this ostensibly supports subvectors with one entry each (i.e., one subvector for each solution entry), we anticipate that this extreme situation will hinder performance due to non-stride-one memory accesses and increased function call overhead. We therefore recommend a relatively coarse partitioning of the problem, although actual performance will likely be problem-dependent.
As a final note, in the coming years we plan to introduce additional algebraic solvers and time integration modules that will leverage the problem partitioning enabled by NVECTOR_MANYVECTOR. However, even at present we anticipate that users will be able to leverage such data partitioning in their problem-defining ODE right-hand side function, DAE or nonlinear solver residual function, preconditioners, or custom SUNLinearSolver or SUNNonlinearSolver modules.
9.22.1. NVECTOR_MANYVECTOR structure
The NVECTOR_MANYVECTOR implementation defines the content field of N_Vector to be a structure containing the number of subvectors comprising the ManyVector, the global length of the ManyVector (including all subvectors), a pointer to the beginning of the array of subvectors, and a boolean flag own_data indicating ownership of the subvectors that populate subvec_array.
struct _N_VectorContent_ManyVector {
sunindextype num_subvectors; /* number of vectors attached */
sunindextype global_length; /* overall manyvector length */
N_Vector* subvec_array; /* pointer to N_Vector array */
booleantype own_data; /* flag indicating data ownership */
};
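To make the layout concrete, the following C sketch models a ManyVector built from existing subvectors (for instance, velocity and pressure in the structure-of-arrays case) and applies an element-wise linear sum one subvector at a time. SubVec, ManyVec, mv_make, and mv_linear_sum are hypothetical stand-ins using plain arrays. They are not the SUNDIALS API: the real module delegates each operation to the subvectors' own NVECTOR operations rather than looping over raw data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one subvector: a length and contiguous data. */
typedef struct {
  long length;
  double *data;
} SubVec;

/* Hypothetical stand-in mirroring the fields of _N_VectorContent_ManyVector.
   For brevity it keeps the caller's pointer array; the real constructor
   copies the N_Vector pointers instead. */
typedef struct {
  long num_subvectors;
  long global_length;   /* sum of all subvector lengths */
  SubVec **subvec_array;
  int own_data;         /* 0: the caller still owns the subvectors */
} ManyVec;

/* Wrap existing subvectors; no subvector data is allocated or copied. */
static ManyVec mv_make(long n, SubVec **subvecs)
{
  ManyVec mv = {n, 0, subvecs, 0};
  assert(n > 0 && subvecs != NULL);
  for (long i = 0; i < n; i++) mv.global_length += subvecs[i]->length;
  return mv;
}

/* z = a*x + b*y, applied one subvector at a time. As in the real module,
   x, y and z are assumed (not checked) to share the same partitioning. */
static void mv_linear_sum(double a, const ManyVec *x, double b,
                          const ManyVec *y, ManyVec *z)
{
  for (long i = 0; i < x->num_subvectors; i++) {
    const SubVec *xi = x->subvec_array[i];
    const SubVec *yi = y->subvec_array[i];
    SubVec *zi = z->subvec_array[i];
    for (long j = 0; j < xi->length; j++)
      zi->data[j] = a * xi->data[j] + b * yi->data[j];
  }
}
```

The global length is simply the sum of the subvector lengths. Element-wise operations need no communication between subvectors, which is why a coarse partition mainly costs one extra loop level.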
The header file to include when using this module is nvector_manyvector.h. The installed module library to link against is libsundials_nvecmanyvector.lib where .lib is typically .so for shared libraries and .a for static libraries.
9.22.2. NVECTOR_MANYVECTOR functions
The NVECTOR_MANYVECTOR module implements all vector operations listed in §9.2 except for N_VGetArrayPointer(), N_VSetArrayPointer(), N_VScaleAddMultiVectorArray(), and N_VLinearCombinationVectorArray(). As such, this vector cannot be used with the SUNDIALS direct solvers and preconditioners. Instead, the NVECTOR_MANYVECTOR module provides functions to access subvectors, whose data may in turn be accessed according to their NVECTOR implementations.
The names of vector operations are obtained from those in §9.2 by appending the suffix _ManyVector (e.g. N_VDestroy_ManyVector). The module NVECTOR_MANYVECTOR provides the following additional user-callable routines:

N_Vector N_VNew_ManyVector(sunindextype num_subvectors, N_Vector *vec_array, SUNContext sunctx)
This function creates a ManyVector from a set of existing NVECTOR objects.
This routine will copy all N_Vector pointers from the input vec_array, so the user may modify/free that pointer array after calling this function. However, this routine does not allocate any new subvectors, so the underlying NVECTOR objects themselves should not be destroyed before the ManyVector that contains them.
Upon successful completion, the new ManyVector is returned; otherwise this routine returns NULL (e.g., if a memory allocation failure occurred).
Users of the Fortran 2003 interface to this function will first need to use the generic N_Vector utility functions N_VNewVectorArray() and N_VSetVecAtIndexVectorArray() to create the N_Vector* argument. This is further explained in §2.5.2.5, and the functions are documented in §9.1.1.

N_Vector N_VGetSubvector_ManyVector(N_Vector v, sunindextype vec_num)
This function returns the vec_num subvector from the NVECTOR array.

sunindextype N_VGetSubvectorLocalLength_ManyVector(N_Vector v, sunindextype vec_num)
This function returns the local length of the vec_num subvector from the NVECTOR array.
Usage:
local_length = N_VGetSubvectorLocalLength_ManyVector(v, 0);

realtype *N_VGetSubvectorArrayPointer_ManyVector(N_Vector v, sunindextype vec_num)
This function returns the data array pointer for the vec_num subvector from the NVECTOR array.
If the input vec_num is invalid, or if the subvector does not support the N_VGetArrayPointer operation, then NULL is returned.

int N_VSetSubvectorArrayPointer_ManyVector(realtype *v_data, N_Vector v, sunindextype vec_num)
This function sets the data array pointer for the vec_num subvector from the NVECTOR array.
If the input vec_num is invalid, or if the subvector does not support the N_VSetArrayPointer operation, then -1 is returned; otherwise it returns 0.

sunindextype N_VGetNumSubvectors_ManyVector(N_Vector v)
This function returns the overall number of subvectors in the ManyVector object.
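A typical pattern in a right-hand side or residual function is to loop over the subvectors, fetch each raw data array, and fall back to the subvector's own vector operations when no array is available. The sketch below models that loop in plain C. SubVec, ManyVec, mv_num_subvectors, mv_subvector_array, and scale_raw_subvectors are hypothetical stand-ins for N_VGetNumSubvectors_ManyVector() and N_VGetSubvectorArrayPointer_ManyVector(). They reproduce only the documented contract that an invalid index, or a subvector without array access, yields NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subvector stand-in; data == NULL models a subvector that
   does not support N_VGetArrayPointer. */
typedef struct {
  long length;
  double *data;
} SubVec;

typedef struct {
  long num_subvectors;
  SubVec **subvec_array;
} ManyVec;

/* Stand-in for N_VGetNumSubvectors_ManyVector(). */
static long mv_num_subvectors(const ManyVec *v)
{
  assert(v != NULL);
  return v->num_subvectors;
}

/* Stand-in for N_VGetSubvectorArrayPointer_ManyVector(): NULL for an
   invalid index or when the subvector exposes no raw data array. */
static double *mv_subvector_array(const ManyVec *v, long vec_num)
{
  if (vec_num < 0 || vec_num >= v->num_subvectors) return NULL;
  return v->subvec_array[vec_num]->data;
}

/* Multiply every raw-accessible entry by c. Returns how many subvectors were
   skipped; user code would handle those with the subvector's own N_VScale. */
static int scale_raw_subvectors(ManyVec *v, double c)
{
  int skipped = 0;
  for (long k = 0; k < mv_num_subvectors(v); k++) {
    double *d = mv_subvector_array(v, k);
    if (d == NULL) {
      skipped++;
      continue;
    }
    for (long j = 0; j < v->subvec_array[k]->length; j++) d[j] *= c;
  }
  return skipped;
}
```

In real code, the loop bound for each subvector would come from N_VGetSubvectorLocalLength_ManyVector() rather than from a struct field.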
By default all fused and vector array operations are disabled in the NVECTOR_MANYVECTOR module, except for N_VWrmsNormVectorArray() and N_VWrmsNormMaskVectorArray(), which are enabled by default. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_ManyVector(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit those configuration options from the vector they are cloned from, while vectors created with N_VNew_ManyVector() will have the default settings for the NVECTOR_MANYVECTOR module. We note that these routines do not call the corresponding routines on subvectors, so those should be set up as desired before attaching them to the ManyVector in N_VNew_ManyVector().
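The clone-based recommendation can be pictured with a small model in which each vector carries a table of enable flags. A freshly constructed vector starts from the module defaults, while a clone copies the table of the vector it was cloned from. OpsFlags, Vec, vec_new, vec_clone, and vec_enable_linear_combination are hypothetical stand-ins. The actual module records these settings as function pointers (or NULL) in the N_Vector ops structure, not as integer flags.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vector's optional-operation settings; the real
   module keeps function pointers (or NULL) in the N_Vector ops structure. */
typedef struct {
  int linear_combination;
  int dot_prod_multi;
  int wrms_norm_vector_array;
  int wrms_norm_mask_vector_array;
} OpsFlags;

typedef struct {
  OpsFlags ops;
} Vec;

/* Fresh vector: module defaults, where only the two WRMS-norm vector array
   operations start enabled (as described for NVECTOR_MANYVECTOR). */
static Vec *vec_new(void)
{
  Vec *v = calloc(1, sizeof *v);
  if (v == NULL) return NULL;
  v->ops.wrms_norm_vector_array = 1;
  v->ops.wrms_norm_mask_vector_array = 1;
  return v;
}

/* Clone: inherits the settings of the vector it is cloned from. */
static Vec *vec_clone(const Vec *w)
{
  assert(w != NULL);
  Vec *v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  v->ops = w->ops;
  return v;
}

/* Analogue of N_VEnableLinearCombination_ManyVector(): 0 on success,
   -1 if the vector is NULL. */
static int vec_enable_linear_combination(Vec *v, int tf)
{
  if (v == NULL) return -1;
  v->ops.linear_combination = tf ? 1 : 0;
  return 0;
}
```

A vector cloned after the settings change keeps them, while a second call to the constructor silently reverts to the defaults. This is the inconsistency the recommended workflow avoids.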

int N_VEnableFusedOps_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_ManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the ManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
Notes
N_VNew_ManyVector() sets the field own_data = SUNFALSE. N_VDestroy_ManyVector() will not attempt to call N_VDestroy() on any subvectors contained in the subvector array for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the subvectors.
To maximize efficiency, arithmetic vector operations in the NVECTOR_MANYVECTOR implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same subvector representations.
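The ownership rule can be illustrated with a minimal model. The container always owns its copy of the pointer array, but it destroys the subvectors themselves only when own_data is set. SubVec, ManyVec, mv_new, mv_destroy, subvec_destroy, and the destroyed_count counter are hypothetical stand-ins written for this sketch, not SUNDIALS code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical subvector stand-in that owns its data array. */
typedef struct {
  double *data;
} SubVec;

static int destroyed_count = 0; /* number of subvectors freed so far */

static void subvec_destroy(SubVec *s)
{
  free(s->data);
  free(s);
  destroyed_count++;
}

/* Hypothetical container: the pointer array always belongs to the
   container; the subvectors belong to it only when own_data is nonzero. */
typedef struct {
  long num_subvectors;
  SubVec **subvec_array;
  int own_data;
} ManyVec;

/* Copies the pointers (not the subvectors), so vec_array may be freed later. */
static ManyVec *mv_new(long n, SubVec **vec_array)
{
  assert(n > 0 && vec_array != NULL);
  ManyVec *v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  v->subvec_array = malloc((size_t)n * sizeof *v->subvec_array);
  if (v->subvec_array == NULL) {
    free(v);
    return NULL;
  }
  for (long i = 0; i < n; i++) v->subvec_array[i] = vec_array[i];
  v->num_subvectors = n;
  v->own_data = 0; /* subvectors remain the caller's responsibility */
  return v;
}

/* Frees the container; destroys the subvectors only if own_data is set. */
static void mv_destroy(ManyVec *v)
{
  if (v == NULL) return;
  if (v->own_data)
    for (long i = 0; i < v->num_subvectors; i++)
      subvec_destroy(v->subvec_array[i]);
  free(v->subvec_array);
  free(v);
}
```

After the container is destroyed, the caller must still destroy each subvector it created, just as a user must call N_VDestroy() on the subvectors passed to N_VNew_ManyVector().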
9.23. The NVECTOR_MPIMANYVECTOR Module
The NVECTOR_MPIMANYVECTOR module is designed to facilitate problems with an inherent data partitioning for the solution vector, including on distributed-memory parallel architectures. As such, this implementation supports all use cases allowed by the MPI-unaware NVECTOR_MANYVECTOR implementation, as well as partitioning data between nodes in a parallel environment. These data partitions are entirely user-defined, through construction of distinct NVECTOR modules for each component, which are then combined together to form the NVECTOR_MPIMANYVECTOR. Three potential use cases for this module include:
Heterogeneous computational architectures (single-node or multi-node): for data partitioning between different computing resources on a node, architecture-specific subvectors may be created for each partition. For example, a user could create one MPI-parallel component based on NVECTOR_PARALLEL, and another GPU-accelerated component based on NVECTOR_CUDA.
Process-based multiphysics decompositions (multi-node): for computations that combine separate MPI-based simulations together, each subvector may reside on a different MPI communicator, and the MPIManyVector combines these via an MPI intercommunicator that connects these distinct simulations together.
Structure of arrays (SOA) data layouts (single-node or multi-node): for problems that require separate subvectors for each solution component. For example, in an incompressible Navier-Stokes simulation, separate subvectors may be used for velocities and pressure, which are combined together into a single MPIManyVector for the overall “solution”.
The above use cases are neither exhaustive nor mutually exclusive, and the NVECTOR_MPIMANYVECTOR implementation should support arbitrary combinations of these cases.
The NVECTOR_MPIMANYVECTOR implementation is designed to work with any NVECTOR subvectors that implement the minimum “standard” set of operations in §9.2.1; however, significant performance benefits may be obtained when subvectors additionally implement the optional local reduction operations listed in §9.2.4.
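The weighted root-mean-square (WRMS) norm, sqrt((1/N) * sum_i (x_i w_i)^2) with N the global length, shows why local reductions can help. Each subvector can form its own sum of squares without any communication, so the partial results can be combined in a single global reduction rather than one per subvector. This is one plausible source of the performance benefit. The C sketch below shows only that structure. It uses hypothetical stand-ins (SubVec, wsqrsum_local, wrms_norm_squared), involves no MPI, and is not the module's implementation.

```c
#include <assert.h>

/* Hypothetical subvector stand-in holding values x and weights w. */
typedef struct {
  long length;
  const double *x;
  const double *w;
} SubVec;

/* Local reduction: sum of (x_j * w_j)^2 over one subvector only,
   requiring no communication. */
static double wsqrsum_local(const SubVec *s)
{
  double sum = 0.0;
  for (long j = 0; j < s->length; j++) {
    double t = s->x[j] * s->w[j];
    sum += t * t;
  }
  return sum;
}

/* Squared WRMS norm of the concatenated subvectors:
   (sum of the local sums) / (global length). Under MPI, each rank would
   add up its local sums, one MPI_Allreduce would combine them, and the
   division would use the stored global length. Apply sqrt() to obtain the
   norm itself (omitted here to keep the sketch free of libm). */
static double wrms_norm_squared(const SubVec *subs, long nsub)
{
  double sum = 0.0;
  long n = 0;
  assert(nsub > 0);
  for (long k = 0; k < nsub; k++) {
    sum += wsqrsum_local(&subs[k]);
    n += subs[k].length;
  }
  return sum / (double)n;
}
```

For example, with x = (2, 2, 2) and unit weights in one subvector, and x = (4) with weight 0.5 in another, the local sums are 12 and 4. The squared norm is therefore 16/4 = 4, giving a WRMS norm of 2.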
Additionally, NVECTOR_MPIMANYVECTOR sets no limit on the number of subvectors that may be attached (aside from the limitations of using sunindextype for indexing, and standard per-node memory limitations). However, while this ostensibly supports subvectors with one entry each (i.e., one subvector for each solution entry), we anticipate that this extreme situation will hinder performance due to non-stride-one memory accesses and increased function call overhead. We therefore recommend a relatively coarse partitioning of the problem, although actual performance will likely be problem-dependent.
As a final note, in the coming years we plan to introduce additional algebraic solvers and time integration modules that will leverage the problem partitioning enabled by NVECTOR_MPIMANYVECTOR. However, even at present we anticipate that users will be able to leverage such data partitioning in their problem-defining ODE right-hand side function, DAE or nonlinear solver residual function, preconditioners, or custom SUNLinearSolver or SUNNonlinearSolver modules.
9.23.1. NVECTOR_MPIMANYVECTOR structure
The NVECTOR_MPIMANYVECTOR implementation defines the content field of N_Vector to be a structure containing the MPI communicator (or MPI_COMM_NULL if running on a single node), the number of subvectors comprising the MPIManyVector, the global length of the MPIManyVector (including all subvectors on all MPI ranks), a pointer to the beginning of the array of subvectors, and a boolean flag own_data indicating ownership of the subvectors that populate subvec_array.
struct _N_VectorContent_MPIManyVector {
MPI_Comm comm; /* overall MPI communicator */
sunindextype num_subvectors; /* number of vectors attached */
sunindextype global_length; /* overall mpimanyvector length */
N_Vector* subvec_array; /* pointer to N_Vector array */
booleantype own_data; /* flag indicating data ownership */
};
The header file to include when using this module is nvector_mpimanyvector.h. The installed module library to link against is libsundials_nvecmpimanyvector.lib where .lib is typically .so for shared libraries and .a for static libraries.
Note
If SUNDIALS is configured with MPI disabled, then the MPIManyVector library will not be built. Furthermore, any user codes that include nvector_mpimanyvector.h must be compiled using an MPI-aware compiler (whether the specific user code utilizes MPI or not). We note that the NVECTOR_MANYVECTOR implementation is designed for ManyVector use cases in an MPI-unaware environment.
9.23.2. NVECTOR_MPIMANYVECTOR functions
The NVECTOR_MPIMANYVECTOR module implements all vector operations listed in §9.2, except for N_VGetArrayPointer(), N_VSetArrayPointer(), N_VScaleAddMultiVectorArray(), and N_VLinearCombinationVectorArray(). As such, this vector cannot be used with the SUNDIALS direct solvers and preconditioners. Instead, the NVECTOR_MPIMANYVECTOR module provides functions to access subvectors, whose data may in turn be accessed according to their NVECTOR implementations.
The names of vector operations are obtained from those in §9.2 by appending the suffix _MPIManyVector (e.g. N_VDestroy_MPIManyVector). The module NVECTOR_MPIMANYVECTOR provides the following additional user-callable routines:

N_Vector N_VNew_MPIManyVector(sunindextype num_subvectors, N_Vector *vec_array, SUNContext sunctx)
This function creates an MPIManyVector from a set of existing NVECTOR objects, under the requirement that all MPI-aware subvectors use the same MPI communicator (this is checked internally). If none of the subvectors are MPI-aware, then this may equivalently be used to describe data partitioning within a single node. We note that this routine is designed to support the first and third use cases above (heterogeneous computational architectures and structure-of-arrays data layouts).
This routine will copy all N_Vector pointers from the input vec_array, so the user may modify/free that pointer array after calling this function. However, this routine does not allocate any new subvectors, so the underlying NVECTOR objects themselves should not be destroyed before the MPIManyVector that contains them.
Upon successful completion, the new MPIManyVector is returned; otherwise this routine returns NULL (e.g., if two MPI-aware subvectors use different MPI communicators).
Users of the Fortran 2003 interface to this function will first need to use the generic N_Vector utility functions N_VNewVectorArray() and N_VSetVecAtIndexVectorArray() to create the N_Vector* argument. This is further explained in §2.5.2.5, and the functions are documented in §9.1.1.

N_Vector N_VMake_MPIManyVector(MPI_Comm comm, sunindextype num_subvectors, N_Vector *vec_array, SUNContext sunctx)
This function creates an MPIManyVector from a set of existing NVECTOR objects, and a user-created MPI communicator that “connects” these subvectors. Any MPI-aware subvectors may use different MPI communicators than the input comm. We note that this routine is designed to support any combination of the use cases above.
The input comm should be this user-created MPI communicator. This routine will internally call MPI_Comm_dup to create a copy of the input comm, so the user-supplied comm argument need not be retained after the call to N_VMake_MPIManyVector().
If all subvectors are MPI-unaware, then the input comm argument should be MPI_COMM_NULL, although in this case, it would be simpler to call N_VNew_MPIManyVector() instead, or to just use the NVECTOR_MANYVECTOR module.
This routine will copy all N_Vector pointers from the input vec_array, so the user may modify/free that pointer array after calling this function. However, this routine does not allocate any new subvectors, so the underlying NVECTOR objects themselves should not be destroyed before the MPIManyVector that contains them.
Upon successful completion, the new MPIManyVector is returned; otherwise this routine returns NULL (e.g., if the input vec_array is NULL).

N_Vector N_VGetSubvector_MPIManyVector(N_Vector v, sunindextype vec_num)
This function returns the vec_num subvector from the NVECTOR array.

sunindextype N_VGetSubvectorLocalLength_MPIManyVector(N_Vector v, sunindextype vec_num)
This function returns the local length of the vec_num subvector from the NVECTOR array.
Usage:
local_length = N_VGetSubvectorLocalLength_MPIManyVector(v, 0);

realtype *N_VGetSubvectorArrayPointer_MPIManyVector(N_Vector v, sunindextype vec_num)
This function returns the data array pointer for the vec_num subvector from the NVECTOR array.
If the input vec_num is invalid, or if the subvector does not support the N_VGetArrayPointer operation, then NULL is returned.

int N_VSetSubvectorArrayPointer_MPIManyVector(realtype *v_data, N_Vector v, sunindextype vec_num)
This function sets the data array pointer for the vec_num subvector from the NVECTOR array.
If the input vec_num is invalid, or if the subvector does not support the N_VSetArrayPointer operation, then -1 is returned; otherwise it returns 0.

sunindextype N_VGetNumSubvectors_MPIManyVector(N_Vector v)
This function returns the overall number of subvectors in the MPIManyVector object.
By default all fused and vector array operations are disabled in the NVECTOR_MPIMANYVECTOR module, except for N_VWrmsNormVectorArray() and N_VWrmsNormMaskVectorArray(), which are enabled by default. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_MPIManyVector() or N_VMake_MPIManyVector(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit those configuration options from the vector they are cloned from, while vectors created with N_VNew_MPIManyVector() and N_VMake_MPIManyVector() will have the default settings for the NVECTOR_MPIMANYVECTOR module. We note that these routines do not call the corresponding routines on subvectors, so those should be set up as desired before attaching them to the MPIManyVector in N_VNew_MPIManyVector() or N_VMake_MPIManyVector().

int N_VEnableFusedOps_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearCombination_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleAddMulti_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableDotProdMulti_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableLinearSumVectorArray_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableScaleVectorArray_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableConstVectorArray_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormVectorArray_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.

int N_VEnableWrmsNormMaskVectorArray_MPIManyVector(N_Vector v, booleantype tf)
This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the MPIManyVector. The return value is 0 for success and -1 if the input vector or its ops structure is NULL.
Notes
N_VNew_MPIManyVector() and N_VMake_MPIManyVector() set the field own_data = SUNFALSE. N_VDestroy_MPIManyVector() will not attempt to call N_VDestroy() on any subvectors contained in the subvector array for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the subvectors.
To maximize efficiency, arithmetic vector operations in the NVECTOR_MPIMANYVECTOR implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same subvector representations.
9.24. The NVECTOR_MPIPLUSX Module
The NVECTOR_MPIPLUSX module is designed to facilitate the MPI+X paradigm, where X is some form of on-node (local) parallelism (e.g. OpenMP, CUDA). This paradigm is becoming increasingly popular with the rise of heterogeneous computing architectures.
The NVECTOR_MPIPLUSX implementation is designed to work with any NVECTOR that implements the minimum “standard” set of operations in §9.2.1. However, it is not recommended to use the NVECTOR_PARALLEL, NVECTOR_PARHYP, NVECTOR_PETSC, or NVECTOR_TRILINOS implementations underneath the NVECTOR_MPIPLUSX module since they already provide MPI capabilities.
9.24.1. NVECTOR_MPIPLUSX structure
The NVECTOR_MPIPLUSX implementation is a thin wrapper around the NVECTOR_MPIMANYVECTOR. Accordingly, it adopts the same content structure as defined in §9.23.1.
The header file to include when using this module is nvector_mpiplusx.h. The installed module library to link against is libsundials_nvecmpiplusx.lib where .lib is typically .so for shared libraries and .a for static libraries.
Note
If SUNDIALS is configured with MPI disabled, then the MPIPlusX library will not be built. Furthermore, any user codes that include nvector_mpiplusx.h must be compiled using an MPI-aware compiler.
9.24.2. NVECTOR_MPIPLUSX functions
The NVECTOR_MPIPLUSX module adopts all vector operations listed in §9.2 from the NVECTOR_MPIMANYVECTOR (see §9.23) except for N_VGetArrayPointer() and N_VSetArrayPointer(); the module provides its own implementation of these functions that call the local vector implementations. Therefore, the NVECTOR_MPIPLUSX module implements all of the operations listed in the referenced sections except for N_VScaleAddMultiVectorArray() and N_VLinearCombinationVectorArray(). Accordingly, its compatibility with the SUNDIALS direct solvers and preconditioners depends on the local vector implementation.
The module NVECTOR_MPIPLUSX provides the following additional user-callable routines:

N_Vector N_VMake_MPIPlusX(MPI_Comm comm, N_Vector *local_vector, SUNContext sunctx)
This function creates an MPIPlusX vector from an existing local (i.e. on-node) NVECTOR object, and a user-created MPI communicator.
The input comm should be this user-created MPI communicator. This routine will internally call MPI_Comm_dup to create a copy of the input comm, so the user-supplied comm argument need not be retained after the call to N_VMake_MPIPlusX().
This routine will copy the NVECTOR pointer to the input local_vector, so the underlying local NVECTOR object should not be destroyed before the MPIPlusX vector that contains it.
Upon successful completion, the new MPIPlusX vector is returned; otherwise this routine returns NULL (e.g., if the input local_vector is NULL).

N_Vector N_VGetLocalVector_MPIPlusX(N_Vector v)
This function returns the local vector underneath the MPIPlusX NVECTOR.

sunindextype N_VGetLocalLength_MPIPlusX(N_Vector v)
This function returns the local length of the vector underneath the MPIPlusX NVECTOR.
Usage:
local_length = N_VGetLocalLength_MPIPlusX(v);

realtype *N_VGetArrayPointer_MPIPlusX(N_Vector v)
This function returns the data array pointer for the local vector.
If the local vector does not support the N_VGetArrayPointer() operation, then NULL is returned.

void N_VSetArrayPointer_MPIPlusX(realtype *v_data, N_Vector v)
This function sets the data array pointer for the local vector if the local vector implements the N_VSetArrayPointer() operation.
The NVECTOR_MPIPLUSX module does not implement any fused or vector array operations. Instead, users should enable/disable fused operations on the local vector.
Notes
N_VMake_MPIPlusX() sets the field own_data = SUNFALSE and N_VDestroy_MPIPlusX() will not call N_VDestroy() on the local vector. In this case, it is the user’s responsibility to deallocate the local vector.
To maximize efficiency, arithmetic vector operations in the NVECTOR_MPIPLUSX implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same local vector representations.
9.25. NVECTOR Examples
There are NVECTOR examples that may be installed for each implementation. Each implementation makes use of the functions in test_nvector.c. These example functions show simple usage of the NVECTOR family of functions. The inputs to the examples are the vector length, number of threads (if threaded implementation), and a print timing flag.
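Judging from the descriptions in the list, many of these tests follow the same pattern: fill vectors with known constants, apply one operation, and compare every entry with the expected answer. The plain-array C sketch below illustrates that pattern for Case 1b of Test_N_VLinearSum (y = -x + y). The functions linear_sum, all_close_to, and test_linear_sum_case1b are hypothetical names written for illustration and are not the code in test_nvector.c.

```c
#include <assert.h>

/* Hypothetical plain-array stand-in for N_VLinearSum: z = a*x + b*y.
   z may alias x or y, since each entry is read before it is written. */
static void linear_sum(long n, double a, const double *x, double b,
                       const double *y, double *z)
{
  for (long i = 0; i < n; i++) z[i] = a * x[i] + b * y[i];
}

/* Returns 0 if every entry of v is within tol of ans, 1 otherwise. */
static int all_close_to(double ans, const double *v, long n, double tol)
{
  for (long i = 0; i < n; i++) {
    double d = v[i] - ans;
    if (d > tol || d < -tol) return 1;
  }
  return 0;
}

/* Case 1b pattern: y = -x + y with x = 1 and y = -2, so every entry
   of the result should equal -3. Returns 0 on success. */
static int test_linear_sum_case1b(long n)
{
  double x[16], y[16];
  assert(n > 0 && n <= 16);
  for (long i = 0; i < n; i++) {
    x[i] = 1.0;
    y[i] = -2.0;
  }
  linear_sum(n, -1.0, x, 1.0, y, y);
  return all_close_to(-3.0, y, n, 1e-12);
}
```

Choosing constants whose result is exactly representable keeps the comparison independent of rounding. The tolerance then only guards against implementations that reorder arithmetic.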
The following is a list of the example functions in test_nvector.c:
Test_N_VClone: Creates clone of vector and checks validity of clone.
Test_N_VCloneEmpty: Creates clone of empty vector and checks validity of clone.
Test_N_VCloneVectorArray: Creates clone of vector array and checks validity of cloned array.
Test_N_VCloneEmptyVectorArray: Creates clone of empty vector array and checks validity of cloned array.
Test_N_VGetArrayPointer: Get array pointer.
Test_N_VSetArrayPointer: Allocate new vector, set pointer to new vector array, and check values.
Test_N_VGetLength: Compares self-reported length to calculated length.
Test_N_VGetCommunicator: Compares self-reported communicator to the one used in constructor; or for MPI-unaware vectors it ensures that NULL is reported.
Test_N_VLinearSum Case 1a: Test y = x + y
Test_N_VLinearSum Case 1b: Test y = -x + y
Test_N_VLinearSum Case 1c: Test y = ax + y
Test_N_VLinearSum Case 2a: Test x = x + y
Test_N_VLinearSum Case 2b: Test x = x - y
Test_N_VLinearSum Case 2c: Test x = x + by
Test_N_VLinearSum Case 3: Test z = x + y
Test_N_VLinearSum Case 4a: Test z = x - y
Test_N_VLinearSum Case 4b: Test z = -x + y
Test_N_VLinearSum Case 5a: Test z = x + by
Test_N_VLinearSum Case 5b: Test z = ax + y
Test_N_VLinearSum Case 6a: Test z = -x + by
Test_N_VLinearSum Case 6b: Test z = ax - y
Test_N_VLinearSum Case 7: Test z = a(x + y)
Test_N_VLinearSum Case 8: Test z = a(x - y)
Test_N_VLinearSum Case 9: Test z = ax + by
Test_N_VConst: Fill vector with constant and check result.
Test_N_VProd: Test vector multiply: z = x * y
Test_N_VDiv: Test vector division: z = x / y
Test_N_VScale: Case 1: scale: x = cx
Test_N_VScale: Case 2: copy: z = x
Test_N_VScale: Case 3: negate: z = -x
Test_N_VScale: Case 4: combination: z = cx
Test_N_VAbs: Create absolute value of vector.
Test_N_VInv: Compute z[i] = 1 / x[i]
Test_N_VAddConst: Add constant vector: z = c + x
Test_N_VDotProd: Calculate dot product of two vectors.
Test_N_VMaxNorm: Create vector with known values, find and validate the max norm.
Test_N_VWrmsNorm: Create vector of known values, find and validate the weighted root mean square.
Test_N_VWrmsNormMask: Create vector of known values, find and validate the weighted root mean square using all elements except one.
Test_N_VMin: Create vector, find and validate the min.
Test_N_VWL2Norm: Create vector, find and validate the weighted Euclidean L2 norm.
Test_N_VL1Norm: Create vector, find and validate the L1 norm.
Test_N_VCompare: Compare vector with constant returning and validating comparison vector.
Test_N_VInvTest: Test z[i] = 1 / x[i]
Test_N_VConstrMask: Test mask of vector x with vector c.
Test_N_VMinQuotient: Fill two vectors with known values. Calculate and validate minimum quotient.
Test_N_VLinearCombination: Case 1a: Test x = a x
Test_N_VLinearCombination: Case 1b: Test z = a x
Test_N_VLinearCombination: Case 2a: Test x = a x + b y
Test_N_VLinearCombination: Case 2b: Test z = a x + b y
Test_N_VLinearCombination: Case 3a: Test x = x + a y + b z
Test_N_VLinearCombination: Case 3b: Test x = a x + b y + c z
Test_N_VLinearCombination: Case 3c: Test w = a x + b y + c z
Test_N_VScaleAddMulti: Case 1a: y = a x + y
Test_N_VScaleAddMulti: Case 1b: z = a x + y
Test_N_VScaleAddMulti: Case 2a: Y[i] = c[i] x + Y[i], i = 1,2,3
Test_N_VScaleAddMulti: Case 2b: Z[i] = c[i] x + Y[i], i = 1,2,3
Test_N_VDotProdMulti: Case 1: Calculate the dot product of two vectors
Test_N_VDotProdMulti: Case 2: Calculate the dot product of one vector with three other vectors in a vector array.
Test_N_VLinearSumVectorArray: Case 1: z = a x + b y
Test_N_VLinearSumVectorArray: Case 2a: Z[i] = a X[i] + b Y[i]
Test_N_VLinearSumVectorArray: Case 2b: X[i] = a X[i] + b Y[i]
Test_N_VLinearSumVectorArray: Case 2c: Y[i] = a X[i] + b Y[i]
Test_N_VScaleVectorArray: Case 1a: y = c y
Test_N_VScaleVectorArray: Case 1b: z = c y
Test_N_VScaleVectorArray: Case 2a: Y[i] = c[i] Y[i]
Test_N_VScaleVectorArray: Case 2b: Z[i] = c[i] Y[i]
Test_N_VConstVectorArray: Case 1a: z = c
Test_N_VConstVectorArray: Case 1b: Z[i] = c
Test_N_VWrmsNormVectorArray: Case 1a: Create a vector of known values, find and validate the weighted root mean square norm.
Test_N_VWrmsNormVectorArray: Case 1b: Create a vector array of three vectors of known values, find and validate the weighted root mean square norm of each.
Test_N_VWrmsNormMaskVectorArray: Case 1a: Create a vector of known values, find and validate the weighted root mean square norm using all elements except one.
Test_N_VWrmsNormMaskVectorArray: Case 1b: Create a vector array of three vectors of known values, find and validate the weighted root mean square norm of each using all elements except one.
Test_N_VScaleAddMultiVectorArray: Case 1a: y = a x + y
Test_N_VScaleAddMultiVectorArray: Case 1b: z = a x + y
Test_N_VScaleAddMultiVectorArray: Case 2a: Y[j][0] = a[j] X[0] + Y[j][0]
Test_N_VScaleAddMultiVectorArray: Case 2b: Z[j][0] = a[j] X[0] + Y[j][0]
Test_N_VScaleAddMultiVectorArray: Case 3a: Y[0][i] = a[0] X[i] + Y[0][i]
Test_N_VScaleAddMultiVectorArray: Case 3b: Z[0][i] = a[0] X[i] + Y[0][i]
Test_N_VScaleAddMultiVectorArray: Case 4a: Y[j][i] = a[j] X[i] + Y[j][i]
Test_N_VScaleAddMultiVectorArray: Case 4b: Z[j][i] = a[j] X[i] + Y[j][i]
Test_N_VLinearCombinationVectorArray: Case 1a: x = a x
Test_N_VLinearCombinationVectorArray: Case 1b: z = a x
Test_N_VLinearCombinationVectorArray: Case 2a: x = a x + b y
Test_N_VLinearCombinationVectorArray: Case 2b: z = a x + b y
Test_N_VLinearCombinationVectorArray
: Case 3a: x = a x + b y + c zTest_N_VLinearCombinationVectorArray
: Case 3b: w = a x + b y + c zTest_N_VLinearCombinationVectorArray
: Case 4a: X[0][i] = c[0] X[0][i]Test_N_VLinearCombinationVectorArray
: Case 4b: Z[i] = c[0] X[0][i]Test_N_VLinearCombinationVectorArray
: Case 5a: X[0][i] = c[0] X[0][i] + c[1] X[1][i]Test_N_VLinearCombinationVectorArray
: Case 5b: Z[i] = c[0] X[0][i] + c[1] X[1][i]Test_N_VLinearCombinationVectorArray
: Case 6a: X[0][i] = X[0][i] + c[1] X[1][i] + c[2] X[2][i]Test_N_VLinearCombinationVectorArray
: Case 6b: X[0][i] = c[0] X[0][i] + c[1] X[1][i] + c[2] X[2][i]Test_N_VLinearCombinationVectorArray
: Case 6c: Z[i] = c[0] X[0][i] + c[1] X[1][i] + c[2] X[2][i]Test_N_VDotProdLocal
: Calculate MPI tasklocal portion of the dot product of two vectors.Test_N_VMaxNormLocal
: Create vector with known values, find and validate the MPI tasklocal portion of the max norm.Test_N_VMinLocal
: Create vector, find and validate the MPI tasklocal min.Test_N_VL1NormLocal
: Create vector, find and validate the MPI tasklocal portion of the L1 norm.Test_N_VWSqrSumLocal
: Create vector of known values, find and validate the MPI tasklocal portion of the weighted squared sum of two vectors.Test_N_VWSqrSumMaskLocal
: Create vector of known values, find and validate the MPI tasklocal portion of the weighted squared sum of two vectors, using all elements except one.Test_N_VInvTestLocal
: Test the MPI tasklocal portion of z[i] = 1 / x[i]Test_N_VConstrMaskLocal
: Test the MPI tasklocal portion of the mask of vector x with vector c.Test_N_VMinQuotientLocal
: Fill two vectors with known values. Calculate and validate the MPI tasklocal minimum quotient.Test_N_VMBufSize
: Tests for accuracy in the reported buffer size.Test_N_VMBufPack
: Tests for accuracy in the buffer packing routine.Test_N_VMBufUnpack
: Tests for accuracy in the buffer unpacking routine.