# 9.9. The NVECTOR_SERIAL Module

The serial implementation of the NVECTOR module provided with SUNDIALS, NVECTOR_SERIAL, defines the content field of an N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, and a boolean flag own_data which specifies the ownership of data.

struct _N_VectorContent_Serial {
  sunindextype length;
  booleantype own_data;
  realtype *data;
};


The header file to be included when using this module is nvector_serial.h. The installed module library to link to is libsundials_nvecserial.lib where .lib is typically .so for shared libraries and .a for static libraries.

## 9.9.1. NVECTOR_SERIAL accessor macros

The following five macros are provided to access the content of an NVECTOR_SERIAL vector. The suffix _S in the names denotes the serial version.

NV_CONTENT_S(v)

This macro gives access to the contents of the serial vector N_Vector v.

The assignment v_cont = NV_CONTENT_S(v) sets v_cont to be a pointer to the serial N_Vector content structure.

Implementation:

#define NV_CONTENT_S(v) ( (N_VectorContent_Serial)(v->content) )

NV_OWN_DATA_S(v)

Access the own_data component of the serial N_Vector v.

Implementation:

#define NV_OWN_DATA_S(v) ( NV_CONTENT_S(v)->own_data )

NV_DATA_S(v)

The assignment v_data = NV_DATA_S(v) sets v_data to be a pointer to the first component of the data for the N_Vector v.

Similarly, the assignment NV_DATA_S(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.

Implementation:

#define NV_DATA_S(v) ( NV_CONTENT_S(v)->data )

NV_LENGTH_S(v)

Access the length component of the serial N_Vector v.

The assignment v_len = NV_LENGTH_S(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_S(v) = len_v sets the length of v to be len_v.

Implementation:

#define NV_LENGTH_S(v) ( NV_CONTENT_S(v)->length )

NV_Ith_S(v, i)

This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing.

The assignment r = NV_Ith_S(v,i) sets r to be the value of the i-th component of v.

The assignment NV_Ith_S(v,i) = r sets the value of the i-th component of v to be r.

Here i ranges from 0 to $$n-1$$ for a vector of length $$n$$.

Implementation:

#define NV_Ith_S(v,i) ( NV_DATA_S(v)[i] )


## 9.9.2. NVECTOR_SERIAL functions

The NVECTOR_SERIAL module defines serial implementations of all vector operations listed in §9.2.1, §9.2.2, §9.2.3, and §9.2.4. Their names are obtained from those in those sections by appending the suffix _Serial (e.g. N_VDestroy_Serial). All the standard vector operations listed in §9.2.1 with the suffix _Serial appended are callable via the Fortran 2003 interface by prepending an F (e.g. FN_VDestroy_Serial).

The module NVECTOR_SERIAL provides the following additional user-callable routines:

N_Vector N_VNew_Serial(sunindextype vec_length, SUNContext sunctx)

This function creates and allocates memory for a serial N_Vector. Its arguments are the vector length and the SUNDIALS context object sunctx.

N_Vector N_VNewEmpty_Serial(sunindextype vec_length, SUNContext sunctx)

This function creates a new serial N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Serial(sunindextype vec_length, realtype *v_data, SUNContext sunctx)

This function creates and allocates memory for a serial vector with user-provided data array, v_data.

(This function does not allocate memory for v_data itself.)

void N_VPrint_Serial(N_Vector v)

This function prints the content of a serial vector to stdout.

void N_VPrintFile_Serial(N_Vector v, FILE *outfile)

This function prints the content of a serial vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_SERIAL module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Serial(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Serial() will have the default settings for the NVECTOR_SERIAL module.

int N_VEnableFusedOps_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Serial(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the serial vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = NV_DATA_S(v), or equivalently v_data = N_VGetArrayPointer(v), and then access v_data[i] within the loop than it is to use NV_Ith_S(v,i) within the loop.

• N_VNewEmpty_Serial(), N_VMake_Serial(), and N_VCloneVectorArrayEmpty_Serial() set the field own_data to SUNFALSE. The functions N_VDestroy_Serial() and N_VDestroyVectorArray_Serial() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.

• To maximize efficiency, vector operations in the NVECTOR_SERIAL implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same length.

## 9.9.3. NVECTOR_SERIAL Fortran Interface

The NVECTOR_SERIAL module provides a Fortran 2003 module for use from Fortran applications.

The fnvector_serial_mod Fortran module defines interfaces to all NVECTOR_SERIAL C functions using the intrinsic iso_c_binding module which provides a standardized mechanism for interoperating with C. As noted in the C function descriptions above, the interface functions are named after the corresponding C function, but with a leading F. For example, the function N_VNew_Serial is interfaced as FN_VNew_Serial.

The Fortran 2003 NVECTOR_SERIAL interface module can be accessed with the use statement, i.e. use fnvector_serial_mod, and by linking to the library libsundials_fnvectorserial_mod.lib in addition to the C library. For details on where the library and module file fnvector_serial_mod.mod are installed see §14. We note that the module is accessible from the Fortran 2003 SUNDIALS integrators without separately linking to the libsundials_fnvectorserial_mod library.

# 9.10. The NVECTOR_PARALLEL Module

The NVECTOR_PARALLEL implementation of the NVECTOR module provided with SUNDIALS is based on MPI. It defines the content field of an N_Vector to be a structure containing the global and local lengths of the vector, a pointer to the beginning of a contiguous local data array, an MPI communicator, and a boolean flag own_data indicating ownership of the data array data.

struct _N_VectorContent_Parallel {
  sunindextype local_length;
  sunindextype global_length;
  booleantype own_data;
  realtype *data;
  MPI_Comm comm;
};


The header file to be included when using this module is nvector_parallel.h. The installed module library to link to is libsundials_nvecparallel.lib where .lib is typically .so for shared libraries and .a for static libraries.

## 9.10.1. NVECTOR_PARALLEL accessor macros

The following seven macros are provided to access the content of an NVECTOR_PARALLEL vector. The suffix _P in the names denotes the distributed memory parallel version.

NV_CONTENT_P(v)

This macro gives access to the contents of the parallel N_Vector v.

The assignment v_cont = NV_CONTENT_P(v) sets v_cont to be a pointer to the N_Vector content structure of type struct _N_VectorContent_Parallel.

Implementation:

#define NV_CONTENT_P(v) ( (N_VectorContent_Parallel)(v->content) )

NV_OWN_DATA_P(v)

Access the own_data component of the parallel N_Vector v.

Implementation:

#define NV_OWN_DATA_P(v)   ( NV_CONTENT_P(v)->own_data )

NV_DATA_P(v)

The assignment v_data = NV_DATA_P(v) sets v_data to be a pointer to the first component of the local_data for the N_Vector v.

The assignment NV_DATA_P(v) = v_data sets the component array of v to be v_data by storing the pointer v_data into data.

Implementation:

#define NV_DATA_P(v)       ( NV_CONTENT_P(v)->data )

NV_LOCLENGTH_P(v)

The assignment v_llen = NV_LOCLENGTH_P(v) sets v_llen to be the length of the local part of v.

The call NV_LOCLENGTH_P(v) = llen_v sets the local_length of v to be llen_v.

Implementation:

#define NV_LOCLENGTH_P(v)  ( NV_CONTENT_P(v)->local_length )

NV_GLOBLENGTH_P(v)

The assignment v_glen = NV_GLOBLENGTH_P(v) sets v_glen to be the global_length of the vector v.

The call NV_GLOBLENGTH_P(v) = glen_v sets the global_length of v to be glen_v.

Implementation:

#define NV_GLOBLENGTH_P(v) ( NV_CONTENT_P(v)->global_length )

NV_COMM_P(v)

This macro provides access to the MPI communicator used by the parallel N_Vector v.

Implementation:

#define NV_COMM_P(v) ( NV_CONTENT_P(v)->comm )

NV_Ith_P(v, i)

This macro gives access to the individual components of the local_data array of an N_Vector.

The assignment r = NV_Ith_P(v,i) sets r to be the value of the i-th component of the local part of v.

The assignment NV_Ith_P(v,i) = r sets the value of the i-th component of the local part of v to be r.

Here i ranges from 0 to $$n-1$$, where $$n$$ is the local_length.

Implementation:

#define NV_Ith_P(v,i) ( NV_DATA_P(v)[i] )


## 9.10.2. NVECTOR_PARALLEL functions

The NVECTOR_PARALLEL module defines parallel implementations of all vector operations listed in §9.2. Their names are obtained from the generic names by appending the suffix _Parallel (e.g. N_VDestroy_Parallel). The module NVECTOR_PARALLEL provides the following additional user-callable routines:

N_Vector N_VNew_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length, SUNContext sunctx)

This function creates and allocates memory for a parallel vector having global length global_length, having processor-local length local_length, and using the MPI communicator comm.

N_Vector N_VNewEmpty_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length, SUNContext sunctx)

This function creates a new parallel N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Parallel(MPI_Comm comm, sunindextype local_length, sunindextype global_length, realtype *v_data, SUNContext sunctx)

This function creates and allocates memory for a parallel vector with user-provided data array.

(This function does not allocate memory for v_data itself.)

sunindextype N_VGetLocalLength_Parallel(N_Vector v)

This function returns the local vector length.

void N_VPrint_Parallel(N_Vector v)

This function prints the local content of a parallel vector to stdout.

void N_VPrintFile_Parallel(N_Vector v, FILE *outfile)

This function prints the local content of a parallel vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_PARALLEL module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Parallel(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_Parallel() will have the default settings for the NVECTOR_PARALLEL module.

int N_VEnableFusedOps_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Parallel(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the parallel vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When looping over the components of an N_Vector v, it is more efficient to first obtain the local component array via v_data = N_VGetArrayPointer(v), or equivalently v_data = NV_DATA_P(v), and then access v_data[i] within the loop than it is to use NV_Ith_P(v,i) within the loop.

• N_VNewEmpty_Parallel(), N_VMake_Parallel(), and N_VCloneVectorArrayEmpty_Parallel() set the field own_data to SUNFALSE. The routines N_VDestroy_Parallel() and N_VDestroyVectorArray_Parallel() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.

• To maximize efficiency, vector operations in the NVECTOR_PARALLEL implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

## 9.10.3. NVECTOR_PARALLEL Fortran Interface

The NVECTOR_PARALLEL module provides a Fortran 2003 module for use from Fortran applications.

The fnvector_parallel_mod Fortran module defines interfaces to all NVECTOR_PARALLEL C functions using the intrinsic iso_c_binding module which provides a standardized mechanism for interoperating with C. As noted in the C function descriptions above, the interface functions are named after the corresponding C function, but with a leading F. For example, the function N_VNew_Parallel is interfaced as FN_VNew_Parallel.

The Fortran 2003 NVECTOR_PARALLEL interface module can be accessed with the use statement, i.e. use fnvector_parallel_mod, and by linking to the library libsundials_fnvectorparallel_mod.lib in addition to the C library. For details on where the library and module file fnvector_parallel_mod.mod are installed see §14. We note that the module is accessible from the Fortran 2003 SUNDIALS integrators without separately linking to the libsundials_fnvectorparallel_mod library.

# 9.11. The NVECTOR_OPENMP Module

In situations where a user has a multi-core processing unit capable of running multiple parallel threads with shared memory, SUNDIALS provides an implementation of NVECTOR using OpenMP, called NVECTOR_OPENMP, and an implementation using Pthreads, called NVECTOR_PTHREADS. Testing has shown that vectors should be of length at least $$100,000$$ before the overhead associated with creating and using the threads is made up by the parallelism in the vector calculations.

The OpenMP NVECTOR implementation provided with SUNDIALS, NVECTOR_OPENMP, defines the content field of N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, a boolean flag own_data which specifies the ownership of data, and the number of threads. Operations on the vector are threaded using OpenMP; the number of threads used is based on the argument supplied to the vector constructor.

struct _N_VectorContent_OpenMP {
  sunindextype length;
  booleantype own_data;
  realtype *data;
  int num_threads;
};


The header file to be included when using this module is nvector_openmp.h. The installed module library to link to is libsundials_nvecopenmp.lib where .lib is typically .so for shared libraries and .a for static libraries. The Fortran module file to use when using the Fortran 2003 interface to this module is fnvector_openmp_mod.mod.

## 9.11.1. NVECTOR_OPENMP accessor macros

The following six macros are provided to access the content of an NVECTOR_OPENMP vector. The suffix _OMP in the names denotes the OpenMP version.

NV_CONTENT_OMP(v)

This macro gives access to the contents of the OpenMP vector N_Vector v.

The assignment v_cont = NV_CONTENT_OMP(v) sets v_cont to be a pointer to the OpenMP N_Vector content structure.

Implementation:

#define NV_CONTENT_OMP(v) ( (N_VectorContent_OpenMP)(v->content) )

NV_OWN_DATA_OMP(v)

Access the own_data component of the OpenMP N_Vector v.

Implementation:

#define NV_OWN_DATA_OMP(v) ( NV_CONTENT_OMP(v)->own_data )

NV_DATA_OMP(v)

The assignment v_data = NV_DATA_OMP(v) sets v_data to be a pointer to the first component of the data for the N_Vector v.

Similarly, the assignment NV_DATA_OMP(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.

Implementation:

#define NV_DATA_OMP(v) ( NV_CONTENT_OMP(v)->data )

NV_LENGTH_OMP(v)

Access the length component of the OpenMP N_Vector v.

The assignment v_len = NV_LENGTH_OMP(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_OMP(v) = len_v sets the length of v to be len_v.

Implementation:

#define NV_LENGTH_OMP(v) ( NV_CONTENT_OMP(v)->length )


NV_NUM_THREADS_OMP(v)

Access the num_threads component of the OpenMP N_Vector v.

The assignment v_threads = NV_NUM_THREADS_OMP(v) sets v_threads to be the num_threads of v. On the other hand, the call NV_NUM_THREADS_OMP(v) = num_threads_v sets the num_threads of v to be num_threads_v.

Implementation:

#define NV_NUM_THREADS_OMP(v) ( NV_CONTENT_OMP(v)->num_threads )

NV_Ith_OMP(v, i)

This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing.

The assignment r = NV_Ith_OMP(v,i) sets r to be the value of the i-th component of v.

The assignment NV_Ith_OMP(v,i) = r sets the value of the i-th component of v to be r.

Here i ranges from 0 to $$n-1$$ for a vector of length $$n$$.

Implementation:

#define NV_Ith_OMP(v,i) ( NV_DATA_OMP(v)[i] )


## 9.11.2. NVECTOR_OPENMP functions

The NVECTOR_OPENMP module defines OpenMP implementations of all vector operations listed in §9.2.1, §9.2.2, §9.2.3, and §9.2.4. Their names are obtained from those in those sections by appending the suffix _OpenMP (e.g. N_VDestroy_OpenMP). All the standard vector operations listed in §9.2.1 with the suffix _OpenMP appended are callable via the Fortran 2003 interface by prepending an F (e.g. FN_VDestroy_OpenMP).

The module NVECTOR_OPENMP provides the following additional user-callable routines:

N_Vector N_VNew_OpenMP(sunindextype vec_length, int num_threads, SUNContext sunctx)

This function creates and allocates memory for an OpenMP N_Vector. Its arguments are the vector length and the number of threads.

N_Vector N_VNewEmpty_OpenMP(sunindextype vec_length, int num_threads, SUNContext sunctx)

This function creates a new OpenMP N_Vector with an empty (NULL) data array.

N_Vector N_VMake_OpenMP(sunindextype vec_length, realtype *v_data, int num_threads, SUNContext sunctx)

This function creates and allocates memory for an OpenMP vector with user-provided data array, v_data.

(This function does not allocate memory for v_data itself.)

void N_VPrint_OpenMP(N_Vector v)

This function prints the content of an OpenMP vector to stdout.

void N_VPrintFile_OpenMP(N_Vector v, FILE *outfile)

This function prints the content of an OpenMP vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_OPENMP module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_OpenMP(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, while vectors created with N_VNew_OpenMP() will have the default settings for the NVECTOR_OPENMP module.

int N_VEnableFusedOps_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_OpenMP(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the OpenMP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = N_VGetArrayPointer(v), or equivalently v_data = NV_DATA_OMP(v) and then access v_data[i] within the loop than it is to use NV_Ith_OMP(v,i) within the loop.

• N_VNewEmpty_OpenMP(), N_VMake_OpenMP(), and N_VCloneVectorArrayEmpty_OpenMP() set the field own_data to SUNFALSE. The functions N_VDestroy_OpenMP() and N_VDestroyVectorArray_OpenMP() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.

• To maximize efficiency, vector operations in the NVECTOR_OPENMP implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

## 9.11.3. NVECTOR_OPENMP Fortran Interface

The NVECTOR_OPENMP module provides a Fortran 2003 module for use from Fortran applications.

The fnvector_openmp_mod Fortran module defines interfaces to all NVECTOR_OPENMP C functions using the intrinsic iso_c_binding module which provides a standardized mechanism for interoperating with C. As noted in the C function descriptions above, the interface functions are named after the corresponding C function, but with a leading F. For example, the function N_VNew_OpenMP is interfaced as FN_VNew_OpenMP.

The Fortran 2003 NVECTOR_OPENMP interface module can be accessed with the use statement, i.e. use fnvector_openmp_mod, and by linking to the library libsundials_fnvectoropenmp_mod.lib in addition to the C library. For details on where the library and module file fnvector_openmp_mod.mod are installed see §14.

# 9.12. The NVECTOR_PTHREADS Module

In situations where a user has a multi-core processing unit capable of running multiple parallel threads with shared memory, SUNDIALS provides an implementation of NVECTOR using OpenMP, called NVECTOR_OPENMP, and an implementation using Pthreads, called NVECTOR_PTHREADS. Testing has shown that vectors should be of length at least $$100,000$$ before the overhead associated with creating and using the threads is made up by the parallelism in the vector calculations.

The Pthreads NVECTOR implementation provided with SUNDIALS, denoted NVECTOR_PTHREADS, defines the content field of N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array, a boolean flag own_data which specifies the ownership of data, and the number of threads. Operations on the vector are threaded using POSIX threads (Pthreads).

struct _N_VectorContent_Pthreads {
  sunindextype length;
  booleantype own_data;
  realtype *data;
  int num_threads;
};


The header file to be included when using this module is nvector_pthreads.h. The installed module library to link to is libsundials_nvecpthreads.lib where .lib is typically .so for shared libraries and .a for static libraries.

## 9.12.1. NVECTOR_PTHREADS accessor macros

The following six macros are provided to access the content of an NVECTOR_PTHREADS vector. The suffix _PT in the names denotes the Pthreads version.

NV_CONTENT_PT(v)

This macro gives access to the contents of the Pthreads vector N_Vector v.

The assignment v_cont = NV_CONTENT_PT(v) sets v_cont to be a pointer to the Pthreads N_Vector content structure.

Implementation:

#define NV_CONTENT_PT(v) ( (N_VectorContent_Pthreads)(v->content) )

NV_OWN_DATA_PT(v)

Access the own_data component of the Pthreads N_Vector v.

Implementation:

#define NV_OWN_DATA_PT(v) ( NV_CONTENT_PT(v)->own_data )

NV_DATA_PT(v)

The assignment v_data = NV_DATA_PT(v) sets v_data to be a pointer to the first component of the data for the N_Vector v.

Similarly, the assignment NV_DATA_PT(v) = v_data sets the component array of v to be v_data by storing the pointer v_data.

Implementation:

#define NV_DATA_PT(v) ( NV_CONTENT_PT(v)->data )

NV_LENGTH_PT(v)

Access the length component of the Pthreads N_Vector v.

The assignment v_len = NV_LENGTH_PT(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_PT(v) = len_v sets the length of v to be len_v.

Implementation:

#define NV_LENGTH_PT(v) ( NV_CONTENT_PT(v)->length )


NV_NUM_THREADS_PT(v)

Access the num_threads component of the Pthreads N_Vector v.

The assignment v_threads = NV_NUM_THREADS_PT(v) sets v_threads to be the num_threads of v. On the other hand, the call NV_NUM_THREADS_PT(v) = num_threads_v sets the num_threads of v to be num_threads_v.

Implementation:

#define NV_NUM_THREADS_PT(v) ( NV_CONTENT_PT(v)->num_threads )

NV_Ith_PT(v, i)

This macro gives access to the individual components of the data array of an N_Vector, using standard 0-based C indexing.

The assignment r = NV_Ith_PT(v,i) sets r to be the value of the i-th component of v.

The assignment NV_Ith_PT(v,i) = r sets the value of the i-th component of v to be r.

Here i ranges from 0 to $$n-1$$ for a vector of length $$n$$.

Implementation:

#define NV_Ith_PT(v,i) ( NV_DATA_PT(v)[i] )


## 9.12.2. NVECTOR_PTHREADS functions

The NVECTOR_PTHREADS module defines Pthreads implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4. Their names are obtained by appending the suffix _Pthreads to the names in those sections (e.g. N_VDestroy_Pthreads). All the standard vector operations listed in §9.2 are callable via the Fortran 2003 interface by prepending an F (e.g. FN_VDestroy_Pthreads). The module NVECTOR_PTHREADS provides the following additional user-callable routines:

N_Vector N_VNew_Pthreads(sunindextype vec_length, int num_threads, SUNContext sunctx)

This function creates and allocates memory for a Pthreads N_Vector. Its arguments are the vector length and the number of threads.

N_Vector N_VNewEmpty_Pthreads(sunindextype vec_length, int num_threads, SUNContext sunctx)

This function creates a new Pthreads N_Vector with an empty (NULL) data array.

N_Vector N_VMake_Pthreads(sunindextype vec_length, int num_threads, realtype *v_data, SUNContext sunctx)

This function creates and allocates memory for a Pthreads vector with a user-provided data array, v_data.

(This function does not allocate memory for v_data itself.)

void N_VPrint_Pthreads(N_Vector v)

This function prints the content of a Pthreads vector to stdout.

void N_VPrintFile_Pthreads(N_Vector v, FILE *outfile)

This function prints the content of a Pthreads vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_PTHREADS module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Pthreads(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled or disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, whereas vectors created with N_VNew_Pthreads() will have the default settings for the NVECTOR_PTHREADS module.

int N_VEnableFusedOps_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Pthreads(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the Pthreads vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When looping over the components of an N_Vector v, it is more efficient to first obtain the component array via v_data = N_VGetArrayPointer(v), or equivalently v_data = NV_DATA_PT(v), and then access v_data[i] within the loop, than it is to use NV_Ith_PT(v,i) within the loop.

• N_VNewEmpty_Pthreads(), N_VMake_Pthreads(), and N_VCloneVectorArrayEmpty_Pthreads() set the field own_data to SUNFALSE. The functions N_VDestroy_Pthreads() and N_VDestroyVectorArray_Pthreads() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointer.

• To maximize efficiency, vector operations in the NVECTOR_PTHREADS implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

The NVECTOR_PTHREADS module provides a Fortran 2003 module for use from Fortran applications.

The fnvector_pthreads_mod Fortran module defines interfaces to all NVECTOR_PTHREADS C functions using the intrinsic iso_c_binding module which provides a standardized mechanism for interoperating with C. As noted in the C function descriptions above, the interface functions are named after the corresponding C function, but with a leading F. For example, the function N_VNew_Pthreads is interfaced as FN_VNew_Pthreads.

The Fortran 2003 NVECTOR_PTHREADS interface module can be accessed with the use statement, i.e. use fnvector_pthreads_mod, and linking to the library libsundials_fnvectorpthreads_mod.lib in addition to the C library. For details on where the library and module file fnvector_pthreads_mod.mod are installed see §14.

# 9.13. The NVECTOR_PARHYP Module

The NVECTOR_PARHYP implementation of the NVECTOR module provided with SUNDIALS is a wrapper around HYPRE’s ParVector class. Most of the vector kernels simply call HYPRE vector operations. The implementation defines the content field of N_Vector to be a structure containing the global and local lengths of the vector, a pointer to an object of type hypre_ParVector, an MPI communicator, and a boolean flag own_parvector indicating ownership of the HYPRE parallel vector object x.

struct _N_VectorContent_ParHyp {
sunindextype local_length;
sunindextype global_length;
booleantype own_data;
booleantype own_parvector;
realtype *data;
MPI_Comm comm;
hypre_ParVector *x;
};


The header file to be included when using this module is nvector_parhyp.h. The installed module library to link to is libsundials_nvecparhyp.lib where .lib is typically .so for shared libraries and .a for static libraries.

Unlike native SUNDIALS vector types, NVECTOR_PARHYP does not provide macros to access its member variables. Note that NVECTOR_PARHYP requires SUNDIALS to be built with MPI support.

## 9.13.1. NVECTOR_PARHYP functions

The NVECTOR_PARHYP module defines implementations of all vector operations listed in §9.2 except for N_VSetArrayPointer() and N_VGetArrayPointer() because accessing raw vector data is handled by low-level HYPRE functions. As such, this vector is not available for use with SUNDIALS Fortran interfaces. When access to raw vector data is needed, one should extract the HYPRE vector first, and then use HYPRE methods to access the data. Usage examples of NVECTOR_PARHYP are provided in the cvAdvDiff_non_ph.c example program for CVODE and the ark_diurnal_kry_ph.c example program for ARKODE.

The names of parhyp methods are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _ParHyp (e.g. N_VDestroy_ParHyp). The module NVECTOR_PARHYP provides the following additional user-callable routines:

N_Vector N_VNewEmpty_ParHyp(MPI_Comm comm, sunindextype local_length, sunindextype global_length, SUNContext sunctx)

This function creates a new parhyp N_Vector with the pointer to the HYPRE vector set to NULL.

N_Vector N_VMake_ParHyp(hypre_ParVector *x, SUNContext sunctx)

This function creates an N_Vector wrapper around an existing HYPRE parallel vector. It does not allocate memory for x itself.

hypre_ParVector *N_VGetVector_ParHyp(N_Vector v)

This function returns a pointer to the underlying HYPRE vector.

void N_VPrint_ParHyp(N_Vector v)

This function prints the local content of a parhyp vector to stdout.

void N_VPrintFile_ParHyp(N_Vector v, FILE *outfile)

This function prints the local content of a parhyp vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_PARHYP module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VMake_ParHyp(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled or disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, whereas vectors created with N_VMake_ParHyp() will have the default settings for the NVECTOR_PARHYP module.

int N_VEnableFusedOps_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_ParHyp(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the parhyp vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When there is a need to access components of an N_Vector_ParHyp v, it is recommended to extract the HYPRE vector via x_vec = N_VGetVector_ParHyp(v) and then access components using appropriate HYPRE functions.

• N_VNewEmpty_ParHyp(), N_VMake_ParHyp(), and N_VCloneVectorArrayEmpty_ParHyp() set the field own_parvector to SUNFALSE. The functions N_VDestroy_ParHyp() and N_VDestroyVectorArray_ParHyp() will not attempt to delete an underlying HYPRE vector for any N_Vector with own_parvector set to SUNFALSE. In such a case, it is the user’s responsibility to delete the underlying vector.

• To maximize efficiency, vector operations in the NVECTOR_PARHYP implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

# 9.14. The NVECTOR_PETSC Module

The NVECTOR_PETSC module is an NVECTOR wrapper around the PETSc vector. It defines the content field of a N_Vector to be a structure containing the global and local lengths of the vector, a pointer to the PETSc vector, an MPI communicator, and a boolean flag own_data indicating ownership of the wrapped PETSc vector.

struct _N_VectorContent_Petsc {
sunindextype local_length;
sunindextype global_length;
booleantype own_data;
Vec *pvec;
MPI_Comm comm;
};


The header file to be included when using this module is nvector_petsc.h. The installed module library to link to is libsundials_nvecpetsc.lib where .lib is typically .so for shared libraries and .a for static libraries.

Unlike native SUNDIALS vector types, NVECTOR_PETSC does not provide macros to access its member variables. Note that NVECTOR_PETSC requires SUNDIALS to be built with MPI support.

## 9.14.1. NVECTOR_PETSC functions

The NVECTOR_PETSC module defines implementations of all vector operations listed in §9.2 except for N_VGetArrayPointer() and N_VSetArrayPointer(). As such, this vector cannot be used with SUNDIALS Fortran interfaces. When access to raw vector data is needed, it is recommended to extract the PETSc vector first, and then use PETSc methods to access the data. Usage examples of NVECTOR_PETSC are provided in example programs for IDA.

The names of vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _Petsc (e.g. N_VDestroy_Petsc). The module NVECTOR_PETSC provides the following additional user-callable routines:

N_Vector N_VNewEmpty_Petsc(MPI_Comm comm, sunindextype local_length, sunindextype global_length, SUNContext sunctx)

This function creates a new PETSc N_Vector with the pointer to the wrapped PETSc vector set to NULL. It is used by the N_VMake_Petsc and N_VClone_Petsc implementations and should be used only with great caution.

N_Vector N_VMake_Petsc(Vec *pvec, SUNContext sunctx)

This function creates and allocates memory for an NVECTOR_PETSC wrapper with a user-provided PETSc vector. It does not allocate memory for the vector pvec itself.

Vec *N_VGetVector_Petsc(N_Vector v)

This function returns a pointer to the underlying PETSc vector.

void N_VPrint_Petsc(N_Vector v)

This function prints the global content of a wrapped PETSc vector to stdout.

void N_VPrintFile_Petsc(N_Vector v, const char fname[])

This function prints the global content of a wrapped PETSc vector to fname.

By default all fused and vector array operations are disabled in the NVECTOR_PETSC module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VMake_Petsc(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled or disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, whereas vectors created with N_VMake_Petsc() will have the default settings for the NVECTOR_PETSC module.

int N_VEnableFusedOps_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Petsc(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the PETSc vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When there is a need to access components of an N_Vector_Petsc v, it is recommended to extract the PETSc vector via x_vec = N_VGetVector_Petsc(v) and then access components using appropriate PETSc functions.

• The functions N_VNewEmpty_Petsc(), N_VMake_Petsc(), and N_VCloneVectorArrayEmpty_Petsc() set the field own_data to SUNFALSE. The routines N_VDestroy_Petsc() and N_VDestroyVectorArray_Petsc() will not attempt to free the pointer pvec for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the pvec pointer.

• To maximize efficiency, vector operations in the NVECTOR_PETSC implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

# 9.15. The NVECTOR_CUDA Module

The NVECTOR_CUDA module is an NVECTOR implementation in the CUDA language. The module allows for SUNDIALS vector kernels to run on NVIDIA GPU devices. It is intended for users who are already familiar with CUDA and GPU programming. Building this vector module requires a CUDA compiler and, by extension, a C++ compiler. The vector content layout is as follows:

struct _N_VectorContent_Cuda
{
sunindextype       length;
booleantype        own_helper;
SUNMemory          host_data;
SUNMemory          device_data;
SUNCudaExecPolicy* stream_exec_policy;
SUNCudaExecPolicy* reduce_exec_policy;
SUNMemoryHelper    mem_helper;
void*              priv; /* 'private' data */
};

typedef struct _N_VectorContent_Cuda *N_VectorContent_Cuda;


The content members are the vector length (size), a boolean flag that indicates whether the vector owns the execution policies and memory helper objects (i.e., is in charge of freeing them), SUNMemory objects for the vector data on the host and device, pointers to execution policies that control how streaming and reduction kernels are launched, a SUNMemoryHelper for performing memory operations, and a private data structure which holds additional members that should not be accessed directly.

When instantiated with N_VNew_Cuda(), the underlying data will be allocated on both the host and the device. Alternatively, a user can provide host and device data arrays by using the N_VMake_Cuda() constructor. To use CUDA managed memory, the constructors N_VNewManaged_Cuda() and N_VMakeManaged_Cuda() are provided. Additionally, a user-defined SUNMemoryHelper for allocating/freeing data can be provided with the constructor N_VNewWithMemHelp_Cuda(). Details on each of these constructors are provided below.

To use the NVECTOR_CUDA module, include nvector_cuda.h and link to the library libsundials_nveccuda.lib. The extension, .lib, is typically .so for shared libraries and .a for static libraries.

## 9.15.1. NVECTOR_CUDA functions

Unlike other native SUNDIALS vector types, the NVECTOR_CUDA module does not provide macros to access its member variables. Instead, users should use the accessor functions:

realtype *N_VGetHostArrayPointer_Cuda(N_Vector v)

This function returns a pointer to the vector data on the host.

realtype *N_VGetDeviceArrayPointer_Cuda(N_Vector v)

This function returns a pointer to the vector data on the device.

booleantype N_VIsManagedMemory_Cuda(N_Vector v)

This function returns a boolean flag indicating whether the vector data array is in managed memory.

The NVECTOR_CUDA module defines implementations of all standard vector operations defined in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VSetArrayPointer(), and, if using unmanaged memory, N_VGetArrayPointer(). As such, this vector can only be used with SUNDIALS direct solvers and preconditioners when using managed memory. The NVECTOR_CUDA module provides separate functions to access data on the host and on the device for the unmanaged memory use case. It also provides methods for copying from the host to the device and vice versa. Usage examples of NVECTOR_CUDA are provided in example programs for CVODE [64].

The names of vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _Cuda (e.g. N_VDestroy_Cuda). The module NVECTOR_CUDA provides the following additional user-callable routines:

N_Vector N_VNew_Cuda(sunindextype length, SUNContext sunctx)

This function creates and allocates memory for a CUDA N_Vector. The vector data array is allocated on both the host and device.

N_Vector N_VNewManaged_Cuda(sunindextype vec_length, SUNContext sunctx)

This function creates and allocates memory for a CUDA N_Vector. The vector data array is allocated in managed memory.

N_Vector N_VNewWithMemHelp_Cuda(sunindextype length, booleantype use_managed_mem, SUNMemoryHelper helper, SUNContext sunctx)

This function creates a new CUDA N_Vector with a user-supplied SUNMemoryHelper for allocating/freeing memory.

N_Vector N_VNewEmpty_Cuda(sunindextype vec_length, SUNContext sunctx)

This function creates a new CUDA N_Vector where the members of the content structure have not been allocated. This utility function is used by the other constructors to create a new vector.

N_Vector N_VMake_Cuda(sunindextype vec_length, realtype *h_vdata, realtype *d_vdata, SUNContext sunctx)

This function creates a CUDA N_Vector with user-supplied vector data arrays for the host and the device.

N_Vector N_VMakeManaged_Cuda(sunindextype vec_length, realtype *vdata, SUNContext sunctx)

This function creates a CUDA N_Vector with a user-supplied managed memory data array.

N_Vector N_VMakeWithManagedAllocator_Cuda(sunindextype length, void *(*allocfn)(size_t size), void (*freefn)(void *ptr))

This function creates a CUDA N_Vector with a user-supplied memory allocator. It requires the user to provide a corresponding free function as well. The memory allocated by the allocator function must behave like CUDA managed memory.

The module NVECTOR_CUDA also provides the following user-callable routines:

void N_VSetKernelExecPolicy_Cuda(N_Vector v, SUNCudaExecPolicy *stream_exec_policy, SUNCudaExecPolicy *reduce_exec_policy)

This function sets the execution policies which control the kernel parameters utilized when launching the streaming and reduction CUDA kernels. By default the vector is set up to use the SUNCudaThreadDirectExecPolicy() and SUNCudaBlockReduceAtomicExecPolicy(). Any custom execution policy for reductions must ensure that the grid dimension (number of thread blocks) is a multiple of the CUDA warp size (32). See §9.15.2 below for more information about the SUNCudaExecPolicy class.

The input execution policies are cloned and, as such, may be freed after being attached to the desired vectors. Providing NULL for either argument resets that execution policy to its default setting.

Note

All vectors used in a single instance of a SUNDIALS package must use the same execution policy. It is strongly recommended that this function is called immediately after constructing the vector, and that any subsequent vectors be created by cloning, to ensure consistent execution policies across vectors.

void N_VCopyToDevice_Cuda(N_Vector v)

This function copies host vector data to the device.

void N_VCopyFromDevice_Cuda(N_Vector v)

This function copies vector data from the device to the host.

void N_VPrint_Cuda(N_Vector v)

This function prints the content of a CUDA vector to stdout.

void N_VPrintFile_Cuda(N_Vector v, FILE *outfile)

This function prints the content of a CUDA vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_CUDA module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Cuda(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled or disabled, since cloned vectors inherit the enable/disable options of the vector they are cloned from, whereas vectors created with N_VNew_Cuda() will have the default settings for the NVECTOR_CUDA module.

int N_VEnableFusedOps_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Cuda(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the CUDA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When there is a need to access components of an N_Vector_Cuda, v, it is recommended to use the functions N_VGetDeviceArrayPointer_Cuda() or N_VGetHostArrayPointer_Cuda(). However, when using managed memory, the function N_VGetArrayPointer() may also be used.

• To maximize efficiency, vector operations in the NVECTOR_CUDA implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

## 9.15.2. The SUNCudaExecPolicy Class

In order to provide maximum flexibility to users, the CUDA kernel execution parameters used by kernels within SUNDIALS are defined by objects of the sundials::cuda::ExecPolicy abstract class type (this class can be accessed in the global namespace as SUNCudaExecPolicy). Thus, users may provide custom execution policies that fit the needs of their problem. The SUNCudaExecPolicy class is defined as

typedef sundials::cuda::ExecPolicy SUNCudaExecPolicy

where the sundials::cuda::ExecPolicy class is defined in the header file sundials_cuda_policies.hpp, as follows:

class ExecPolicy
{
public:
ExecPolicy(cudaStream_t stream = 0) : stream_(stream) { }
virtual size_t gridSize(size_t numWorkUnits = 0, size_t blockDim = 0) const = 0;
virtual size_t blockSize(size_t numWorkUnits = 0, size_t gridDim = 0) const = 0;
virtual const cudaStream_t* stream() const { return (&stream_); }
virtual ExecPolicy* clone() const = 0;
ExecPolicy* clone_new_stream(cudaStream_t stream) const {
ExecPolicy* ex = clone();
ex->stream_ = stream;
return ex;
}
virtual bool atomic() const { return false; }
virtual ~ExecPolicy() {}
protected:
cudaStream_t stream_;
};


To define a custom execution policy, a user simply needs to create a class that inherits from the abstract class and implements the methods. The SUNDIALS provided sundials::cuda::ThreadDirectExecPolicy (aka in the global namespace as SUNCudaThreadDirectExecPolicy) class is a good example of what a custom execution policy may look like:

class ThreadDirectExecPolicy : public ExecPolicy
{
public:
ThreadDirectExecPolicy(const size_t blockDim, cudaStream_t stream = 0)
: blockDim_(blockDim), ExecPolicy(stream)
{}

ThreadDirectExecPolicy(const ThreadDirectExecPolicy& ex)
: blockDim_(ex.blockDim_), ExecPolicy(ex.stream_)
{}

virtual size_t gridSize(size_t numWorkUnits = 0, size_t /*blockDim*/ = 0) const
{
/* ceil(n/m) = floor((n + m - 1) / m) */
return (numWorkUnits + blockSize() - 1) / blockSize();
}

virtual size_t blockSize(size_t /*numWorkUnits*/ = 0, size_t /*gridDim*/ = 0) const
{
return blockDim_;
}

virtual ExecPolicy* clone() const
{
return new ThreadDirectExecPolicy(*this);
}

private:
const size_t blockDim_;
};


In total, SUNDIALS provides 4 execution policies:

SUNCudaThreadDirectExecPolicy(const size_t blockDim, const cudaStream_t stream = 0)

Maps each CUDA thread to a work unit. The number of threads per block (blockDim) can be set to anything. The grid size will be calculated so that there are enough threads for one thread per element. If a CUDA stream is provided, it will be used to execute the kernel.

SUNCudaGridStrideExecPolicy(const size_t blockDim, const size_t gridDim, const cudaStream_t stream = 0)

Is for kernels that use grid stride loops. The number of threads per block (blockDim) can be set to anything. The number of blocks (gridDim) can be set to anything. If a CUDA stream is provided, it will be used to execute the kernel.

SUNCudaBlockReduceExecPolicy(const size_t blockDim, const size_t gridDim = 0, const cudaStream_t stream = 0)

Is for kernels performing a reduction across individual thread blocks. The number of threads per block (blockDim) can be set to any valid multiple of the CUDA warp size. The grid size (gridDim) can be set to any value greater than or equal to 0. If it is set to 0, then the grid size will be chosen so that there are enough threads for one thread per work unit. If a CUDA stream is provided, it will be used to execute the kernel.

SUNCudaBlockReduceAtomicExecPolicy(const size_t blockDim, const size_t gridDim = 0, const cudaStream_t stream = 0)

Is for kernels performing a reduction across individual thread blocks using atomic operations. The number of threads per block (blockDim) can be set to any valid multiple of the CUDA warp size. The grid size (gridDim) can be set to any value greater than or equal to 0. If it is set to 0, then the grid size will be chosen so that there are enough threads for one thread per work unit. If a CUDA stream is provided, it will be used to execute the kernel.

For example, a policy that uses 128 threads per block and a user provided stream can be created like so:

cudaStream_t stream;
cudaStreamCreate(&stream);
SUNCudaThreadDirectExecPolicy stream_exec_policy(128, stream);


These default policy objects can be reused for multiple SUNDIALS data structures (e.g. a SUNMatrix and an N_Vector) since they do not hold any modifiable state information.

# 9.16. The NVECTOR_HIP Module

The NVECTOR_HIP module is an NVECTOR implementation using the AMD ROCm HIP library [2]. The module allows for SUNDIALS vector kernels to run on AMD or NVIDIA GPU devices. It is intended for users who are already familiar with HIP and GPU programming. Building this vector module requires the HIP-clang compiler. The vector content layout is as follows:

struct _N_VectorContent_Hip
{
sunindextype       length;
booleantype        own_helper;
SUNMemory          host_data;
SUNMemory          device_data;
SUNHipExecPolicy*  stream_exec_policy;
SUNHipExecPolicy*  reduce_exec_policy;
SUNMemoryHelper    mem_helper;
void*              priv; /* 'private' data */
};

typedef struct _N_VectorContent_Hip *N_VectorContent_Hip;


The content members are the vector length (size), a boolean flag that signals if the vector owns the data (i.e. it is in charge of freeing the data), pointers to vector data on the host and the device, pointers to SUNHipExecPolicy implementations that control how the HIP kernels are launched for streaming and reduction vector kernels, and a private data structure which holds additional members that should not be accessed directly.

When instantiated with N_VNew_Hip(), the underlying data will be allocated on both the host and the device. Alternatively, a user can provide host and device data arrays by using the N_VMake_Hip() constructor. To use managed memory, the constructors N_VNewManaged_Hip() and N_VMakeManaged_Hip() are provided. Additionally, a user-defined SUNMemoryHelper for allocating/freeing data can be provided with the constructor N_VNewWithMemHelp_Hip(). Details on each of these constructors are provided below.

To use the NVECTOR_HIP module, include nvector_hip.h and link to the library libsundials_nvechip.lib. The extension, .lib, is typically .so for shared libraries and .a for static libraries.

## 9.16.1. NVECTOR_HIP functions

Unlike other native SUNDIALS vector types, the NVECTOR_HIP module does not provide macros to access its member variables. Instead, users should use the following accessor functions:

realtype *N_VGetHostArrayPointer_Hip(N_Vector v)

This function returns a pointer to the vector data on the host.

realtype *N_VGetDeviceArrayPointer_Hip(N_Vector v)

This function returns a pointer to the vector data on the device.

booleantype N_VIsManagedMemory_Hip(N_Vector v)

This function returns a boolean flag indicating if the vector data array is in managed memory or not.

The NVECTOR_HIP module defines implementations of all standard vector operations defined in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VSetArrayPointer(). The names of vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _Hip (e.g. N_VDestroy_Hip()). The module NVECTOR_HIP provides the following additional user-callable routines:

N_Vector N_VNew_Hip(sunindextype length, SUNContext sunctx)

This function creates and allocates memory for a HIP N_Vector. The vector data array is allocated on both the host and device.

N_Vector N_VNewManaged_Hip(sunindextype vec_length, SUNContext sunctx)

This function creates and allocates memory for a HIP N_Vector. The vector data array is allocated in managed memory.

N_Vector N_VNewWithMemHelp_Hip(sunindextype length, booleantype use_managed_mem, SUNMemoryHelper helper, SUNContext sunctx)

This function creates a new HIP N_Vector with a user-supplied SUNMemoryHelper for allocating/freeing memory.

N_Vector N_VNewEmpty_Hip(sunindextype vec_length, SUNContext sunctx)

This function creates a new HIP N_Vector where the members of the content structure have not been allocated. This utility function is used by the other constructors to create a new vector.

N_Vector N_VMake_Hip(sunindextype vec_length, realtype *h_vdata, realtype *d_vdata, SUNContext sunctx)

This function creates a HIP N_Vector with user-supplied vector data arrays for the host and the device.

N_Vector N_VMakeManaged_Hip(sunindextype vec_length, realtype *vdata, SUNContext sunctx)

This function creates a HIP N_Vector with a user-supplied managed memory data array.

The module NVECTOR_HIP also provides the following user-callable routines:

void N_VSetKernelExecPolicy_Hip(N_Vector v, SUNHipExecPolicy *stream_exec_policy, SUNHipExecPolicy *reduce_exec_policy)

This function sets the execution policies which control the kernel parameters utilized when launching the streaming and reduction HIP kernels. By default the vector is set up to use the SUNHipThreadDirectExecPolicy() and SUNHipBlockReduceExecPolicy(). Any custom execution policy for reductions must ensure that the grid dimension (number of thread blocks) is a multiple of the HIP warp size (32 for NVIDIA GPUs, 64 for AMD GPUs). See §9.16.2 below for more information about the SUNHipExecPolicy class. Providing NULL for an argument will result in the default policy being restored.

The input execution policies are cloned and, as such, may be freed after being attached to the desired vectors. A NULL input policy will reset the execution policy to the default setting.

Note

All vectors used in a single instance of a SUNDIALS package must use the same execution policy. It is strongly recommended that this function is called immediately after constructing the vector, and that any subsequent vectors be created by cloning, to ensure consistent execution policies across vectors.

void N_VCopyToDevice_Hip(N_Vector v)

This function copies host vector data to the device.

void N_VCopyFromDevice_Hip(N_Vector v)

This function copies vector data from the device to the host.

void N_VPrint_Hip(N_Vector v)

This function prints the content of a HIP vector to stdout.

void N_VPrintFile_Hip(N_Vector v, FILE *outfile)

This function prints the content of a HIP vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_HIP module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Hip(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled or disabled, since cloned vectors inherit the enable/disable settings of the vector they are cloned from, while vectors created with N_VNew_Hip() have the default settings for the NVECTOR_HIP module.

int N_VEnableFusedOps_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Hip(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the HIP vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When there is a need to access components of an N_Vector_Hip, v, it is recommended to use the functions N_VGetDeviceArrayPointer_Hip() or N_VGetHostArrayPointer_Hip(). However, when using managed memory, the function N_VGetArrayPointer() may also be used.

• To maximize efficiency, vector operations in the NVECTOR_HIP implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

## 9.16.2. The SUNHipExecPolicy Class

In order to provide maximum flexibility to users, the HIP kernel execution parameters used by kernels within SUNDIALS are defined by objects of the sundials::hip::ExecPolicy abstract class type (this class can be accessed in the global namespace as SUNHipExecPolicy). Thus, users may provide custom execution policies that fit the needs of their problem. The SUNHipExecPolicy class is defined as

typedef sundials::hip::ExecPolicy SUNHipExecPolicy

where the sundials::hip::ExecPolicy class is defined in the header file sundials_hip_policies.hpp, as follows:

class ExecPolicy
{
public:
ExecPolicy(hipStream_t stream = 0) : stream_(stream) { }
virtual size_t gridSize(size_t numWorkUnits = 0, size_t blockDim = 0) const = 0;
virtual size_t blockSize(size_t numWorkUnits = 0, size_t gridDim = 0) const = 0;
virtual const hipStream_t* stream() const { return (&stream_); }
virtual ExecPolicy* clone() const = 0;
ExecPolicy* clone_new_stream(hipStream_t stream) const {
ExecPolicy* ex = clone();
ex->stream_ = stream;
return ex;
}
virtual bool atomic() const { return false; }
virtual ~ExecPolicy() {}
protected:
hipStream_t stream_;
};


To define a custom execution policy, a user simply needs to create a class that inherits from the abstract class and implements the methods. The SUNDIALS provided sundials::hip::ThreadDirectExecPolicy (aka in the global namespace as SUNHipThreadDirectExecPolicy) class is a good example of what a custom execution policy may look like:

class ThreadDirectExecPolicy : public ExecPolicy
{
public:
ThreadDirectExecPolicy(const size_t blockDim, hipStream_t stream = 0)
: blockDim_(blockDim), ExecPolicy(stream)
{}

ThreadDirectExecPolicy(const ThreadDirectExecPolicy& ex)
: blockDim_(ex.blockDim_), ExecPolicy(ex.stream_)
{}

virtual size_t gridSize(size_t numWorkUnits = 0, size_t /*blockDim*/ = 0) const
{
/* ceil(n/m) = floor((n + m - 1) / m) */
return (numWorkUnits + blockSize() - 1) / blockSize();
}

virtual size_t blockSize(size_t /*numWorkUnits*/ = 0, size_t /*gridDim*/ = 0) const
{
return blockDim_;
}

virtual ExecPolicy* clone() const
{
return new ThreadDirectExecPolicy(*this);
}

private:
const size_t blockDim_;
};


In total, SUNDIALS provides 4 execution policies:

SUNHipThreadDirectExecPolicy(const size_t blockDim, const hipStream_t stream = 0)

Maps each HIP thread to a work unit. The number of threads per block (blockDim) can be set to anything. The grid size will be calculated so that there are enough threads for one thread per element. If a HIP stream is provided, it will be used to execute the kernel.

SUNHipGridStrideExecPolicy(const size_t blockDim, const size_t gridDim, const hipStream_t stream = 0)

Is for kernels that use grid stride loops. The number of threads per block (blockDim) can be set to anything. The number of blocks (gridDim) can be set to anything. If a HIP stream is provided, it will be used to execute the kernel.

SUNHipBlockReduceExecPolicy(const size_t blockDim, const size_t gridDim = 0, const hipStream_t stream = 0)

Is for kernels performing a reduction across individual thread blocks. The number of threads per block (blockDim) can be set to any valid multiple of the HIP warp size. The grid size (gridDim) can be set to any value greater than or equal to 0. If it is set to 0, then the grid size will be chosen so that there are enough threads for one thread per work unit. If a HIP stream is provided, it will be used to execute the kernel.

SUNHipBlockReduceAtomicExecPolicy(const size_t blockDim, const size_t gridDim = 0, const hipStream_t stream = 0)

Is for kernels performing a reduction across individual thread blocks using atomic operations. The number of threads per block (blockDim) can be set to any valid multiple of the HIP warp size. The grid size (gridDim) can be set to any value greater than or equal to 0. If it is set to 0, then the grid size will be chosen so that there are enough threads for one thread per work unit. If a HIP stream is provided, it will be used to execute the kernel.

For example, a policy that uses 128 threads per block and a user provided stream can be created like so:

hipStream_t stream;
hipStreamCreate(&stream);
SUNHipThreadDirectExecPolicy stream_exec_policy(128, stream);


These default policy objects can be reused for multiple SUNDIALS data structures (e.g. a SUNMatrix and an N_Vector) since they do not hold any modifiable state information.

# 9.17. The NVECTOR_RAJA Module

The NVECTOR_RAJA module is an experimental NVECTOR implementation using the RAJA hardware abstraction layer. In this implementation, RAJA allows for SUNDIALS vector kernels to run on AMD, NVIDIA, or Intel GPU devices. The module is intended for users who are already familiar with RAJA and GPU programming. Building this vector module requires a C++11 compliant compiler and either the NVIDIA CUDA programming environment, the AMD ROCm HIP programming environment, or a compiler that supports the SYCL abstraction layer. When using the AMD ROCm HIP environment, the HIP-clang compiler must be utilized. Users can select which backend to compile with by setting the SUNDIALS_RAJA_BACKENDS CMake variable to either CUDA, HIP, or SYCL. Besides the CUDA, HIP, and SYCL backends, RAJA has other backends such as serial, OpenMP, and OpenACC. These backends are not used in this SUNDIALS release.

The vector content layout is as follows:

struct _N_VectorContent_Raja
{
sunindextype length;
booleantype  own_data;
realtype*    host_data;
realtype*    device_data;
void*        priv; /* 'private' data */
};


The content members are the vector length (size), a boolean flag that signals if the vector owns the data (i.e., it is in charge of freeing the data), pointers to vector data on the host and the device, and a private data structure which holds the memory management type, which should not be accessed directly.

When instantiated with N_VNew_Raja(), the underlying data will be allocated on both the host and the device. Alternatively, a user can provide host and device data arrays by using the N_VMake_Raja() constructor. To use managed memory, the constructors N_VNewManaged_Raja() and N_VMakeManaged_Raja() are provided. Details on each of these constructors are provided below.

The header file to include when using this module is nvector_raja.h. The installed module library to link to is libsundials_nveccudaraja.lib when using the CUDA backend, libsundials_nvechipraja.lib when using the HIP backend, and libsundials_nvecsyclraja.lib when using the SYCL backend. The extension .lib is typically .so for shared libraries and .a for static libraries.

## 9.17.1. NVECTOR_RAJA functions

Unlike other native SUNDIALS vector types, the NVECTOR_RAJA module does not provide macros to access its member variables. Instead, users should use the following accessor functions:

realtype *N_VGetHostArrayPointer_Raja(N_Vector v)

This function returns a pointer to the vector data on the host.

realtype *N_VGetDeviceArrayPointer_Raja(N_Vector v)

This function returns a pointer to the vector data on the device.

booleantype N_VIsManagedMemory_Raja(N_Vector v)

This function returns a boolean flag indicating if the vector data is allocated in managed memory or not.

The NVECTOR_RAJA module defines the implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VDotProdMulti(), N_VWrmsNormVectorArray(), and N_VWrmsNormMaskVectorArray(), as arrays of reduction vectors are not yet supported in RAJA. These functions will be added to the NVECTOR_RAJA implementation in the future. Additionally, the operations N_VGetArrayPointer() and N_VSetArrayPointer() are not implemented by the RAJA vector. As such, this vector cannot be used with SUNDIALS direct solvers and preconditioners. The NVECTOR_RAJA module provides separate functions to access data on the host and on the device. It also provides methods for copying from the host to the device and vice versa. Usage examples of NVECTOR_RAJA are provided in some example programs for CVODE [64].

The names of vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _Raja (e.g. N_VDestroy_Raja). The module NVECTOR_RAJA provides the following additional user-callable routines:

N_Vector N_VNew_Raja(sunindextype vec_length, SUNContext sunctx)

This function creates and allocates memory for a RAJA N_Vector. The memory is allocated on both the host and the device.

N_Vector N_VNewManaged_Raja(sunindextype vec_length, SUNContext sunctx)

This function creates and allocates memory for a RAJA N_Vector. The vector data array is allocated in managed memory.

N_Vector N_VMake_Raja(sunindextype length, realtype *h_vdata, realtype *d_vdata, SUNContext sunctx)

This function creates an NVECTOR_RAJA with user-supplied host and device data arrays. This function does not allocate memory for data itself.

N_Vector N_VMakeManaged_Raja(sunindextype length, realtype *vdata, SUNContext sunctx)

This function creates an NVECTOR_RAJA with a user-supplied managed memory data array. This function does not allocate memory for data itself.

N_Vector N_VNewWithMemHelp_Raja(sunindextype length, booleantype use_managed_mem, SUNMemoryHelper helper, SUNContext sunctx)

This function creates an NVECTOR_RAJA with a user-supplied SUNMemoryHelper for allocating/freeing memory.

N_Vector N_VNewEmpty_Raja()

This function creates a new N_Vector where the members of the content structure have not been allocated. This utility function is used by the other constructors to create a new vector.

void N_VCopyToDevice_Raja(N_Vector v)

This function copies host vector data to the device.

void N_VCopyFromDevice_Raja(N_Vector v)

This function copies vector data from the device to the host.

void N_VPrint_Raja(N_Vector v)

This function prints the content of a RAJA vector to stdout.

void N_VPrintFile_Raja(N_Vector v, FILE *outfile)

This function prints the content of a RAJA vector to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_RAJA module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_Raja(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled or disabled, since cloned vectors inherit the enable/disable settings of the vector they are cloned from, while vectors created with N_VNew_Raja() have the default settings for the NVECTOR_RAJA module.

int N_VEnableFusedOps_Raja(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Raja(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Raja(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Raja(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Raja(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Raja(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Raja(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Raja(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the RAJA vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When there is a need to access components of an NVECTOR_RAJA vector, it is recommended to use the functions N_VGetDeviceArrayPointer_Raja() or N_VGetHostArrayPointer_Raja(). However, when using managed memory, the function N_VGetArrayPointer() may also be used.

• To maximize efficiency, vector operations in the NVECTOR_RAJA implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

# 9.18. The NVECTOR_SYCL Module

The NVECTOR_SYCL module is an experimental NVECTOR implementation using the SYCL abstraction layer. At present the only supported SYCL compiler is the DPC++ (Intel oneAPI) compiler. This module allows for SUNDIALS vector kernels to run on Intel GPU devices. The module is intended for users who are already familiar with SYCL and GPU programming.

The vector content layout is as follows:

struct _N_VectorContent_Sycl
{
sunindextype       length;
booleantype        own_helper;
SUNMemory          host_data;
SUNMemory          device_data;
SUNSyclExecPolicy* stream_exec_policy;
SUNSyclExecPolicy* reduce_exec_policy;
SUNMemoryHelper    mem_helper;
sycl::queue*       queue;
void*              priv; /* 'private' data */
};

typedef struct _N_VectorContent_Sycl *N_VectorContent_Sycl;


The content members are the vector length (size), boolean flags that indicate if the vector owns the execution policies and memory helper objects (i.e., it is in charge of freeing the objects), SUNMemory objects for the vector data on the host and device, pointers to execution policies that control how streaming and reduction kernels are launched, a SUNMemoryHelper for performing memory operations, the SYCL queue, and a private data structure which holds additional members that should not be accessed directly.

When instantiated with N_VNew_Sycl(), the underlying data will be allocated on both the host and the device. Alternatively, a user can provide host and device data arrays by using the N_VMake_Sycl() constructor. To use managed (shared) memory, the constructors N_VNewManaged_Sycl() and N_VMakeManaged_Sycl() are provided. Additionally, a user-defined SUNMemoryHelper for allocating/freeing data can be provided with the constructor N_VNewWithMemHelp_Sycl(). Details on each of these constructors are provided below.

The header file to include when using this module is nvector_sycl.h. The installed module library to link to is libsundials_nvecsycl.lib. The extension .lib is typically .so for shared libraries and .a for static libraries.

## 9.18.1. NVECTOR_SYCL functions

The NVECTOR_SYCL module provides implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VDotProdMulti(), N_VWrmsNormVectorArray(), and N_VWrmsNormMaskVectorArray(), as arrays of reduction vectors are not yet supported. These functions will be added to the NVECTOR_SYCL implementation in the future. The names of vector operations are obtained from those in the aforementioned sections by appending the suffix _Sycl (e.g., N_VDestroy_Sycl).

Additionally, the NVECTOR_SYCL module provides the following user-callable constructors for creating a new NVECTOR_SYCL:

N_Vector N_VNew_Sycl(sunindextype vec_length, sycl::queue *Q, SUNContext sunctx)

This function creates and allocates memory for an NVECTOR_SYCL. Vector data arrays are allocated on both the host and the device associated with the input queue. All operations are launched in the provided queue.

N_Vector N_VNewManaged_Sycl(sunindextype vec_length, sycl::queue *Q, SUNContext sunctx)

This function creates and allocates memory for an NVECTOR_SYCL. The vector data array is allocated in managed (shared) memory using the input queue. All operations are launched in the provided queue.

N_Vector N_VMake_Sycl(sunindextype length, realtype *h_vdata, realtype *d_vdata, sycl::queue *Q, SUNContext sunctx)

This function creates an NVECTOR_SYCL with user-supplied host and device data arrays. This function does not allocate memory for data itself. All operations are launched in the provided queue.

N_Vector N_VMakeManaged_Sycl(sunindextype length, realtype *vdata, sycl::queue *Q, SUNContext sunctx)

This function creates an NVECTOR_SYCL with a user-supplied managed (shared) data array. This function does not allocate memory for data itself. All operations are launched in the provided queue.

N_Vector N_VNewWithMemHelp_Sycl(sunindextype length, booleantype use_managed_mem, SUNMemoryHelper helper, sycl::queue *Q, SUNContext sunctx)

This function creates an NVECTOR_SYCL with a user-supplied SUNMemoryHelper for allocating/freeing memory. All operations are launched in the provided queue.

N_Vector N_VNewEmpty_Sycl(SUNContext sunctx)

This function creates a new N_Vector where the members of the content structure have not been allocated. This utility function is used by the other constructors to create a new vector.

The following user-callable functions are provided for accessing the vector data arrays on the host and device and copying data between the two memory spaces. Note that the generic NVECTOR operations N_VGetArrayPointer() and N_VSetArrayPointer() are mapped to the corresponding HostArray functions given below. To ensure memory coherency, a user will need to call the CopyTo or CopyFrom functions as necessary to transfer data between the host and device, unless managed (shared) memory is used.
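The coherence model described above can be illustrated with a self-contained sketch. The types and function names here are illustrative stand-ins, not the SUNDIALS API: the "device" array is a plain host allocation so the sketch runs anywhere, but the copy discipline mirrors the N_VCopyToDevice_Sycl()/N_VCopyFromDevice_Sycl() pattern.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for a vector with distinct host and device arrays.
// In NVECTOR_SYCL the "device" side lives in SYCL device memory; here both
// sides are host allocations so the sketch is runnable without SYCL.
struct TwoSpaceVector {
  std::vector<double> host;
  std::vector<double> device;  // stand-in for device memory
  explicit TwoSpaceVector(std::size_t n) : host(n), device(n) {}

  // Analogous to N_VCopyToDevice_Sycl: push host data to the device copy.
  void copy_to_device() { device = host; }
  // Analogous to N_VCopyFromDevice_Sycl: pull device data back to the host.
  void copy_from_device() { host = device; }
};

double sum_on_host(const TwoSpaceVector& v) {
  double s = 0.0;
  for (double x : v.host) s += x;
  return s;
}
```

After a simulated device-side update, the host array is stale until copy_from_device() is called; with managed (shared) memory neither copy would be needed.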

realtype *N_VGetHostArrayPointer_Sycl(N_Vector v)

This function returns a pointer to the vector host data array.

realtype *N_VGetDeviceArrayPointer_Sycl(N_Vector v)

This function returns a pointer to the vector device data array.

void N_VSetHostArrayPointer_Sycl(realtype *h_vdata, N_Vector v)

This function sets the host array pointer in the vector v.

void N_VSetDeviceArrayPointer_Sycl(realtype *d_vdata, N_Vector v)

This function sets the device array pointer in the vector v.

void N_VCopyToDevice_Sycl(N_Vector v)

This function copies host vector data to the device.

void N_VCopyFromDevice_Sycl(N_Vector v)

This function copies vector data from the device to the host.

booleantype N_VIsManagedMemory_Sycl(N_Vector v)

This function returns SUNTRUE if the vector data is allocated as managed (shared) memory; otherwise it returns SUNFALSE.

The following user-callable function is provided to set the execution policies for how SYCL kernels are launched on a device.

int N_VSetKernelExecPolicy_Sycl(N_Vector v, SUNSyclExecPolicy *stream_exec_policy, SUNSyclExecPolicy *reduce_exec_policy)

This function sets the execution policies which control the kernel parameters utilized when launching the streaming and reduction kernels. By default the vector is set up to use the SUNSyclThreadDirectExecPolicy() and SUNSyclBlockReduceExecPolicy(). See §9.18.2 below for more information about the SUNSyclExecPolicy class.

The input execution policies are cloned and, as such, may be freed after being attached to the desired vectors. A NULL input policy will reset the execution policy to the default setting.

Note

All vectors used in a single instance of a SUNDIALS package must use the same execution policy. It is strongly recommended that this function be called immediately after constructing the vector, and that any subsequent vectors be created by cloning, to ensure consistent execution policies across vectors.

The following user-callable functions are provided to print the host vector data array. Unless managed memory is used, a user may need to call N_VCopyFromDevice_Sycl() to ensure consistency between the host and device array.

void N_VPrint_Sycl(N_Vector v)

This function prints the host data array to stdout.

void N_VPrintFile_Sycl(N_Vector v, FILE *outfile)

This function prints the host data array to outfile.

By default all fused and vector array operations are disabled in the NVECTOR_SYCL module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with one of the above constructors, enable/disable the desired operations on that vector with the functions below, and then use this vector in conjunction with N_VClone() to create any additional vectors. This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the same enable/disable options as the vector they are cloned from, while vectors created by any of the constructors above will have the default settings for the NVECTOR_SYCL module.
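The clone-inherits-options pattern can be sketched with a small mock. The types below are illustrative only (not SUNDIALS types): a vector carries a table of operation flags, clones copy that table, and freshly constructed vectors always start from the defaults.

```cpp
// Illustrative mock of the enable/disable + clone pattern; these are not
// SUNDIALS types. Fused operations default to disabled, mirroring the text.
struct MockOps {
  bool fused_ops_enabled = false;
  bool linear_combination_enabled = false;
};

struct MockVector {
  MockOps ops;
  // Constructor path: a new vector gets the module's default settings.
  static MockVector make() { return MockVector{}; }
  // Clone path: the new vector inherits the ops table of its template.
  MockVector clone() const { return MockVector{ops}; }
};
```

This is why the recommended workflow is: construct one vector, set the desired flags on it, then clone it for all further vectors.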

int N_VEnableFusedOps_Sycl(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_Sycl(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_Sycl(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_Sycl(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_Sycl(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_Sycl(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_Sycl(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_Sycl(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the SYCL vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When there is a need to access components of an NVECTOR_SYCL, v, it is recommended to use N_VGetDeviceArrayPointer() to access the device array or N_VGetArrayPointer() for the host array. When using managed (shared) memory, either function may be used. To ensure memory coherency, a user may need to call the CopyTo or CopyFrom functions as necessary to transfer data between the host and device, unless managed (shared) memory is used.

• To maximize efficiency, vector operations in the NVECTOR_SYCL implementation that have more than one N_Vector argument do not check for consistent internal representations of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same internal representations.

## 9.18.2. The SUNSyclExecPolicy Class

In order to provide maximum flexibility to users, the SYCL kernel execution parameters used by kernels within SUNDIALS are defined by objects of the sundials::sycl::ExecPolicy abstract class type (this class can be accessed in the global namespace as SUNSyclExecPolicy). Thus, users may provide custom execution policies that fit the needs of their problem. The SUNSyclExecPolicy class is defined as

typedef sundials::sycl::ExecPolicy SUNSyclExecPolicy;

where the sundials::sycl::ExecPolicy class is defined in the header file sundials_sycl_policies.hpp, as follows:

class ExecPolicy
{
public:
virtual size_t gridSize(size_t numWorkUnits = 0, size_t blockDim = 0) const = 0;
virtual size_t blockSize(size_t numWorkUnits = 0, size_t gridDim = 0) const = 0;
virtual ExecPolicy* clone() const = 0;
virtual ~ExecPolicy() {}
};


For consistency the function names and behavior mirror the execution policies for the CUDA and HIP vectors. In the SYCL case the blockSize is the local work-group range in a one-dimensional nd_range (threads per group). The gridSize is the number of local work groups so the global work-group range in a one-dimensional nd_range is blockSize * gridSize (total number of threads). All vector kernels are written with a many-to-one mapping where work units (vector elements) are mapped in a round-robin manner across the global range. As such, the blockSize and gridSize can be set to any positive value.
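The many-to-one, round-robin mapping can be sketched in plain C++ (no SYCL required). The names below are illustrative: grid_size is the ceiling division used to get enough work-items for one work unit each, and scale_round_robin shows how a fixed global range still visits every element when it is smaller than the vector length.

```cpp
#include <cstddef>
#include <vector>

// Ceiling division: smallest gridSize such that
// blockSize * gridSize >= numWorkUnits (one work unit per work-item).
std::size_t grid_size(std::size_t numWorkUnits, std::size_t blockSize) {
  return (numWorkUnits + blockSize - 1) / blockSize;
}

// Round-robin (grid-stride) traversal: each of globalRange "work-items"
// handles elements item, item + globalRange, item + 2*globalRange, ...
void scale_round_robin(std::vector<double>& x, double c, std::size_t globalRange) {
  for (std::size_t item = 0; item < globalRange; ++item)
    for (std::size_t i = item; i < x.size(); i += globalRange)
      x[i] *= c;
}
```

Because the kernels stride like this, any positive blockSize and gridSize give a correct (if not necessarily fast) launch configuration.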

To define a custom execution policy, a user simply needs to create a class that inherits from the abstract class and implements the methods. The SUNDIALS provided sundials::sycl::ThreadDirectExecPolicy (aka in the global namespace as SUNSyclThreadDirectExecPolicy) class is a good example of what a custom execution policy may look like:

class ThreadDirectExecPolicy : public ExecPolicy
{
public:
ThreadDirectExecPolicy(const size_t blockDim)
: blockDim_(blockDim)
{}

ThreadDirectExecPolicy(const ThreadDirectExecPolicy& ex)
: blockDim_(ex.blockDim_)
{}

virtual size_t gridSize(size_t numWorkUnits = 0, size_t blockDim = 0) const
{
return (numWorkUnits + blockSize() - 1) / blockSize();
}

virtual size_t blockSize(size_t numWorkUnits = 0, size_t gridDim = 0) const
{
return blockDim_;
}

virtual ExecPolicy* clone() const
{
return new ThreadDirectExecPolicy(*this);
}

private:
const size_t blockDim_;
};
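As a further illustration, the following standalone sketch re-declares a minimal ExecPolicy base (mirroring the abstract class shown earlier, so it compiles without the SUNDIALS headers; real code should include sundials_sycl_policies.hpp instead) and implements a hypothetical fixed-grid policy, i.e. a grid-stride configuration whose global range never changes with the work count.

```cpp
#include <cstddef>

// Minimal re-declaration of the abstract base so this sketch is
// self-contained; real code uses sundials::sycl::ExecPolicy from
// sundials_sycl_policies.hpp.
class ExecPolicy {
public:
  virtual std::size_t gridSize(std::size_t numWorkUnits = 0, std::size_t blockDim = 0) const = 0;
  virtual std::size_t blockSize(std::size_t numWorkUnits = 0, std::size_t gridDim = 0) const = 0;
  virtual ExecPolicy* clone() const = 0;
  virtual ~ExecPolicy() {}
};

// Hypothetical custom policy (not provided by SUNDIALS): fixed group size
// and fixed group count, so kernels always launch with the same geometry
// and rely on the round-robin mapping to cover all work units.
class FixedGridExecPolicy : public ExecPolicy {
public:
  FixedGridExecPolicy(std::size_t blockDim, std::size_t gridDim)
    : blockDim_(blockDim), gridDim_(gridDim) {}
  std::size_t gridSize(std::size_t = 0, std::size_t = 0) const override { return gridDim_; }
  std::size_t blockSize(std::size_t = 0, std::size_t = 0) const override { return blockDim_; }
  ExecPolicy* clone() const override { return new FixedGridExecPolicy(*this); }
private:
  const std::size_t blockDim_;
  const std::size_t gridDim_;
};
```

Note that clone() must return a deep copy, since attached policies are cloned by the vector and may be freed by the caller afterwards.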


SUNDIALS provides the following execution policies:

SUNSyclThreadDirectExecPolicy(const size_t blockDim)

Is for kernels performing streaming operations and maps each work unit (vector element) to a work-item (thread). Based on the local work-group range (number of threads per group, blockSize), the number of local work-groups (gridSize) is computed so there are enough work-items in the global work-group range (total number of threads, blockSize * gridSize) for one work unit per work-item (thread).

SUNSyclGridStrideExecPolicy(const size_t blockDim, const size_t gridDim)

Is for kernels performing streaming operations and maps each work unit (vector element) to a work-item (thread) in a round-robin manner so the local work-group range (number of threads per group, blockSize) and the number of local work-groups (gridSize) can be set to any positive value. In this case the global work-group range (total number of threads, blockSize * gridSize) may be less than the number of work units (vector elements).

SUNSyclBlockReduceExecPolicy(const size_t blockDim)

Is for kernels performing a reduction. The local work-group range (number of threads per group, blockSize) and the number of local work-groups (gridSize) can be set to any positive value, or the gridSize may be set to 0, in which case the global range is chosen so that there are enough work-items for at most two work units per work-item.

By default the NVECTOR_SYCL module uses the SUNSyclThreadDirectExecPolicy and SUNSyclBlockReduceExecPolicy where the default blockDim is determined by querying the device for the max_work_group_size. Users may specify different policies by constructing a new SUNSyclExecPolicy object and attaching it with N_VSetKernelExecPolicy_Sycl(). For example, a policy that uses 128 work-items (threads) per group can be created and attached like so:

N_Vector v = N_VNew_Sycl(length, &myQueue, sunctx);
SUNSyclThreadDirectExecPolicy thread_direct(128);
SUNSyclBlockReduceExecPolicy  block_reduce(128);
flag = N_VSetKernelExecPolicy_Sycl(v, &thread_direct, &block_reduce);


These default policy objects can be reused for multiple SUNDIALS data structures (e.g. a SUNMatrix and an N_Vector) since they do not hold any modifiable state information.

# 9.19. The NVECTOR_OPENMPDEV Module

The NVECTOR_OPENMPDEV implementation defines the content field of the N_Vector to be a structure containing the length of the vector, a pointer to the beginning of a contiguous data array on the host, a pointer to the beginning of a contiguous data array on the device, and a boolean flag own_data which specifies the ownership of host and device data arrays.

struct _N_VectorContent_OpenMPDEV
{
sunindextype length;
booleantype  own_data;
realtype     *host_data;
realtype     *dev_data;
};


The header file to include when using this module is nvector_openmpdev.h. The installed module library to link to is libsundials_nvecopenmpdev.lib where .lib is typically .so for shared libraries and .a for static libraries.

## 9.19.1. NVECTOR_OPENMPDEV accessor macros

The following macros are provided to access the content of an NVECTOR_OPENMPDEV vector.

NV_CONTENT_OMPDEV(v)

This macro gives access to the contents of the NVECTOR_OPENMPDEV N_Vector v.

The assignment v_cont = NV_CONTENT_OMPDEV(v) sets v_cont to be a pointer to the NVECTOR_OPENMPDEV content structure.

Implementation:

#define NV_CONTENT_OMPDEV(v) ( (N_VectorContent_OpenMPDEV)(v->content) )

NV_OWN_DATA_OMPDEV(v)

Access the own_data component of the OpenMPDEV N_Vector v.

Implementation:

#define NV_OWN_DATA_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->own_data )

NV_DATA_HOST_OMPDEV(v)

The assignment v_data = NV_DATA_HOST_OMPDEV(v) sets v_data to be a pointer to the first component of the data on the host for the N_Vector v. The assignment NV_DATA_HOST_OMPDEV(v) = v_data sets the host component array of v to be v_data by storing the pointer v_data.

Implementation:

#define NV_DATA_HOST_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->host_data )

NV_DATA_DEV_OMPDEV(v)

The assignment v_dev_data = NV_DATA_DEV_OMPDEV(v) sets v_dev_data to be a pointer to the first component of the data on the device for the N_Vector v. The assignment NV_DATA_DEV_OMPDEV(v) = v_dev_data sets the device component array of v to be v_dev_data by storing the pointer v_dev_data.

Implementation:

#define NV_DATA_DEV_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->dev_data )

NV_LENGTH_OMPDEV(v)

Access the length component of the OpenMPDEV N_Vector v.

The assignment v_len = NV_LENGTH_OMPDEV(v) sets v_len to be the length of v. On the other hand, the call NV_LENGTH_OMPDEV(v) = len_v sets the length of v to be len_v.

Implementation:

#define NV_LENGTH_OMPDEV(v) ( NV_CONTENT_OMPDEV(v)->length )
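These macros expand to plain struct-member accesses, which is why they can appear on either side of an assignment. The following standalone sketch mimics the pattern with mock types (the real definitions come from sundials_nvector.h and nvector_openmpdev.h; all names here are illustrative).

```cpp
// Mock of the generic vector and the OpenMPDEV-style content struct;
// not the SUNDIALS definitions, but the same accessor-macro pattern.
typedef long sunindextype;
struct MockContent { sunindextype length; double* host_data; double* dev_data; };
struct MockNVector { void* content; };

// Accessor macros in the same style as NV_CONTENT_OMPDEV and friends:
// cast the opaque content pointer, then select a member.
#define MOCK_CONTENT(v)   ( (MockContent*)((v)->content) )
#define MOCK_DATA_HOST(v) ( MOCK_CONTENT(v)->host_data )
#define MOCK_LENGTH(v)    ( MOCK_CONTENT(v)->length )
```

Because each macro yields an lvalue, both `v_len = MOCK_LENGTH(&v)` and `MOCK_LENGTH(&v) = len_v` are valid, exactly as described for the real macros.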


## 9.19.2. NVECTOR_OPENMPDEV functions

The NVECTOR_OPENMPDEV module defines OpenMP device offloading implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VSetArrayPointer(). As such, this vector cannot be used with the SUNDIALS direct solvers and preconditioners. It also provides methods for copying from the host to the device and vice versa.

The names of the vector operations are obtained from those in §9.2, §9.2.2, §9.2.3, and §9.2.4 by appending the suffix _OpenMPDEV (e.g. N_VDestroy_OpenMPDEV). The module NVECTOR_OPENMPDEV provides the following additional user-callable routines:

N_Vector N_VNew_OpenMPDEV(sunindextype vec_length, SUNContext sunctx)

This function creates and allocates memory for an NVECTOR_OPENMPDEV N_Vector.

N_Vector N_VNewEmpty_OpenMPDEV(sunindextype vec_length, SUNContext sunctx)

This function creates a new NVECTOR_OPENMPDEV N_Vector with an empty (NULL) data array.

N_Vector N_VMake_OpenMPDEV(sunindextype vec_length, realtype *h_vdata, realtype *d_vdata, SUNContext sunctx)

This function creates an NVECTOR_OPENMPDEV vector with user-supplied vector data arrays h_vdata and d_vdata. This function does not allocate memory for data itself.

realtype *N_VGetHostArrayPointer_OpenMPDEV(N_Vector v)

This function returns a pointer to the host data array.

realtype *N_VGetDeviceArrayPointer_OpenMPDEV(N_Vector v)

This function returns a pointer to the device data array.

void N_VPrint_OpenMPDEV(N_Vector v)

This function prints the content of an NVECTOR_OPENMPDEV vector to stdout.

void N_VPrintFile_OpenMPDEV(N_Vector v, FILE *outfile)

This function prints the content of an NVECTOR_OPENMPDEV vector to outfile.

void N_VCopyToDevice_OpenMPDEV(N_Vector v)

This function copies the content of an NVECTOR_OPENMPDEV vector’s host data array to the device data array.

void N_VCopyFromDevice_OpenMPDEV(N_Vector v)

This function copies the content of an NVECTOR_OPENMPDEV vector’s device data array to the host data array.

By default all fused and vector array operations are disabled in the NVECTOR_OPENMPDEV module. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_OpenMPDEV, enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone. This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit the same enable/disable options as the vector they are cloned from, while vectors created with N_VNew_OpenMPDEV will have the default settings for the NVECTOR_OPENMPDEV module.

int N_VEnableFusedOps_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMultiVectorArray_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector array to multiple vector arrays operation in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombinationVectorArray_OpenMPDEV(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination operation for vector arrays in the NVECTOR_OPENMPDEV vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• When looping over the components of an N_Vector v, it is most efficient to first obtain the component array via h_data = N_VGetArrayPointer(v) for the host array or v_data = N_VGetDeviceArrayPointer(v) for the device array, or equivalently to use the macros h_data = NV_DATA_HOST_OMPDEV(v) for the host array or v_data = NV_DATA_DEV_OMPDEV(v) for the device array, and then access h_data[i] or v_data[i] within the loop.

• When accessing individual components of an N_Vector v on the host remember to first copy the array back from the device with N_VCopyFromDevice_OpenMPDEV(v) to ensure the array is up to date.

• N_VNewEmpty_OpenMPDEV(), N_VMake_OpenMPDEV(), and N_VCloneVectorArrayEmpty_OpenMPDEV() set the field own_data to SUNFALSE. The functions N_VDestroy_OpenMPDEV() and N_VDestroyVectorArray_OpenMPDEV() will not attempt to free the pointer data for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the data pointers.

• To maximize efficiency, vector operations in the NVECTOR_OPENMPDEV implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same length.

# 9.20. The NVECTOR_TRILINOS Module

The NVECTOR_TRILINOS module is an NVECTOR wrapper around the Trilinos Tpetra vector. The interface to Tpetra is implemented in the sundials::trilinos::nvector_tpetra::TpetraVectorInterface class. This class simply stores a reference counting pointer to a Tpetra vector and inherits from an empty structure

struct _N_VectorContent_Trilinos {};


to interface the C++ class with the NVECTOR C code. A pointer to an instance of this class is kept in the content field of the N_Vector object, to ensure that the Tpetra vector is not deleted for as long as the N_Vector object exists.

The Tpetra vector type in the sundials::trilinos::nvector_tpetra::TpetraVectorInterface class is defined as:

typedef Tpetra::Vector<realtype, int, sunindextype> vector_type;

The Tpetra vector will use the SUNDIALS-specified realtype as its scalar type, int as the local ordinal type, and sunindextype as the global ordinal type. This type definition will use Tpetra’s default node type. Available Kokkos node types as of the Trilinos 12.14 release are serial (single thread), OpenMP, Pthread, and CUDA. The default node type is selected when building the Kokkos package. For example, the Tpetra vector will use a CUDA node if Tpetra was built with CUDA support and the CUDA node was selected as the default when Tpetra was built.

The header file to include when using this module is nvector_trilinos.h. The installed module library to link to is libsundials_nvectrilinos.lib where .lib is typically .so for shared libraries and .a for static libraries.

## 9.20.1. NVECTOR_TRILINOS functions

The NVECTOR_TRILINOS module defines implementations of all vector operations listed in §9.2, §9.2.2, §9.2.3, and §9.2.4, except for N_VGetArrayPointer() and N_VSetArrayPointer(). As such, this vector cannot be used with the SUNDIALS direct solvers and preconditioners. When access to raw vector data is needed, it is recommended to extract the Trilinos Tpetra vector first, and then use Tpetra vector methods to access the data. Usage examples of NVECTOR_TRILINOS are provided in example programs for IDA.

The names of vector operations are obtained from those in §9.2 by appending the suffix _Trilinos (e.g. N_VDestroy_Trilinos). Vector operations call existing Tpetra::Vector methods when available. Vector operations specific to SUNDIALS are implemented as standalone functions in the namespace sundials::trilinos::nvector_tpetra::TpetraVector, located in the file SundialsTpetraVectorKernels.hpp. The module NVECTOR_TRILINOS provides the following additional user-callable routines:

Teuchos::RCP<vector_type> N_VGetVector_Trilinos(N_Vector v)

This C++ function takes an N_Vector as the argument and returns a reference counting pointer to the underlying Tpetra vector. This is a standalone function defined in the global namespace.

N_Vector N_VMake_Trilinos(Teuchos::RCP<vector_type> v)

This C++ function creates and allocates memory for an NVECTOR_TRILINOS wrapper around a user-provided Tpetra vector. This is a standalone function defined in the global namespace.

Notes

• The template parameter vector_type should be set as:

typedef sundials::trilinos::nvector_tpetra::TpetraVectorInterface::vector_type vector_type;


This will ensure that the data types used in the Tpetra vector match those in SUNDIALS.

• When there is a need to access components of an N_Vector_Trilinos v, it is recommended to extract the Trilinos vector object via x_vec = N_VGetVector_Trilinos(v) and then access components using the appropriate Trilinos functions.

• The functions N_VDestroy_Trilinos and N_VDestroyVectorArray_Trilinos only delete the N_Vector wrapper. The underlying Tpetra vector object will exist for as long as there is at least one reference to it.

# 9.21. The NVECTOR_MANYVECTOR Module

The NVECTOR_MANYVECTOR module is designed to facilitate problems with an inherent data partitioning within a computational node for the solution vector. These data partitions are entirely user-defined, through construction of distinct NVECTOR modules for each component, which are then combined to form the NVECTOR_MANYVECTOR. Two potential use cases for this flexibility include:

1. Heterogeneous computational architectures: for data partitioning between different computing resources on a node, architecture-specific subvectors may be created for each partition. For example, a user could create one GPU-accelerated component based on NVECTOR_CUDA, and another CPU threaded component based on NVECTOR_OPENMP.

2. Structure of arrays (SOA) data layouts: for problems that require separate subvectors for each solution component. For example, in an incompressible Navier-Stokes simulation, separate subvectors may be used for velocities and pressure, which are combined together into a single NVECTOR_MANYVECTOR for the overall “solution”.

The above use cases are neither exhaustive nor mutually exclusive, and the NVECTOR_MANYVECTOR implementation should support arbitrary combinations of these cases.

The NVECTOR_MANYVECTOR implementation is designed to work with any NVECTOR subvectors that implement the minimum “standard” set of operations in §9.2.1. Additionally, NVECTOR_MANYVECTOR sets no limit on the number of subvectors that may be attached (aside from the limitations of using sunindextype for indexing, and standard per-node memory limitations). However, while this ostensibly supports subvectors with one entry each (i.e., one subvector for each solution entry), we anticipate that this extreme situation will hinder performance due to non-stride-one memory accesses and increased function call overhead. We therefore recommend a relatively coarse partitioning of the problem, although actual performance will likely be problem-dependent.

As a final note, in the coming years we plan to introduce additional algebraic solvers and time integration modules that will leverage the problem partitioning enabled by NVECTOR_MANYVECTOR. However, even at present we anticipate that users will be able to leverage such data partitioning in their problem-defining ODE right-hand side function, DAE or nonlinear solver residual function, preconditioners, or custom SUNLinearSolver or SUNNonlinearSolver modules.

## 9.21.1. NVECTOR_MANYVECTOR structure

The NVECTOR_MANYVECTOR implementation defines the content field of N_Vector to be a structure containing the number of subvectors comprising the ManyVector, the global length of the ManyVector (including all subvectors), a pointer to the beginning of the array of subvectors, and a boolean flag own_data indicating ownership of the subvectors that populate subvec_array.

struct _N_VectorContent_ManyVector {
sunindextype  num_subvectors;  /* number of vectors attached      */
sunindextype  global_length;   /* overall manyvector length       */
N_Vector*     subvec_array;    /* pointer to N_Vector array       */
booleantype   own_data;        /* flag indicating data ownership  */
};
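To make the structure concrete, the following standalone sketch uses mock types (not the SUNDIALS ones; all names are illustrative) to show the two invariants that matter: the global length is the sum of the subvector lengths, and reductions such as a dot product simply loop over the subvectors in turn.

```cpp
#include <cstddef>
#include <vector>

// Illustrative mock of the ManyVector idea: a container of subvector
// pointers, analogous to subvec_array in _N_VectorContent_ManyVector.
struct MockSubvector { std::vector<double> data; };

struct MockManyVector {
  std::vector<MockSubvector*> subvec_array;

  // global_length is the sum of the subvector lengths.
  std::size_t global_length() const {
    std::size_t n = 0;
    for (const MockSubvector* s : subvec_array) n += s->data.size();
    return n;
  }

  // A "ManyVector" dot product reduces over every subvector in turn.
  double dot(const MockManyVector& other) const {
    double d = 0.0;
    for (std::size_t k = 0; k < subvec_array.size(); ++k)
      for (std::size_t i = 0; i < subvec_array[k]->data.size(); ++i)
        d += subvec_array[k]->data[i] * other.subvec_array[k]->data[i];
    return d;
  }
};
```

As in the real module, the container holds pointers to the subvectors rather than copies, so the subvectors must outlive the ManyVector that references them.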


The header file to include when using this module is nvector_manyvector.h. The installed module library to link against is libsundials_nvecmanyvector.lib where .lib is typically .so for shared libraries and .a for static libraries.

## 9.21.2. NVECTOR_MANYVECTOR functions

The NVECTOR_MANYVECTOR module implements all vector operations listed in §9.2 except for N_VGetArrayPointer(), N_VSetArrayPointer(), N_VScaleAddMultiVectorArray(), and N_VLinearCombinationVectorArray(). As such, this vector cannot be used with the SUNDIALS direct solvers and preconditioners. Instead, the NVECTOR_MANYVECTOR module provides functions to access subvectors, whose data may in turn be accessed according to their NVECTOR implementations.

The names of vector operations are obtained from those in §9.2 by appending the suffix _ManyVector (e.g. N_VDestroy_ManyVector). The module NVECTOR_MANYVECTOR provides the following additional user-callable routines:

N_Vector N_VNew_ManyVector(sunindextype num_subvectors, N_Vector *vec_array, SUNContext sunctx)

This function creates a ManyVector from a set of existing NVECTOR objects.

This routine will copy all N_Vector pointers from the input vec_array, so the user may modify/free that pointer array after calling this function. However, this routine does not allocate any new subvectors, so the underlying NVECTOR objects themselves should not be destroyed before the ManyVector that contains them.

Upon successful completion, the new ManyVector is returned; otherwise this routine returns NULL (e.g., a memory allocation failure occurred).

Users of the Fortran 2003 interface to this function will first need to use the generic N_Vector utility functions N_VNewVectorArray(), and N_VSetVecAtIndexVectorArray() to create the N_Vector* argument. This is further explained in §2.5.2.5, and the functions are documented in §9.1.1.

N_Vector N_VGetSubvector_ManyVector(N_Vector v, sunindextype vec_num)

This function returns the vec_num subvector from the NVECTOR array.

realtype *N_VGetSubvectorArrayPointer_ManyVector(N_Vector v, sunindextype vec_num)

This function returns the data array pointer for the vec_num subvector from the NVECTOR array.

If the input vec_num is invalid, or if the subvector does not support the N_VGetArrayPointer operation, then NULL is returned.

int N_VSetSubvectorArrayPointer_ManyVector(realtype *v_data, N_Vector v, sunindextype vec_num)

This function sets the data array pointer for the vec_num subvector from the NVECTOR array.

If the input vec_num is invalid, or if the subvector does not support the N_VSetArrayPointer operation, then -1 is returned; otherwise it returns 0.

sunindextype N_VGetNumSubvectors_ManyVector(N_Vector v)

This function returns the overall number of subvectors in the ManyVector object.

By default all fused and vector array operations are disabled in the NVECTOR_MANYVECTOR module, except for N_VWrmsNormVectorArray() and N_VWrmsNormMaskVectorArray(), which are enabled by default. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors it is recommended to first create a vector with N_VNew_ManyVector(), enable/disable the desired operations for that vector with the functions below, and create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit those configuration options from the vector they are cloned from, while vectors created with N_VNew_ManyVector() will have the default settings for the NVECTOR_MANYVECTOR module. We note that these routines do not call the corresponding routines on subvectors, so those should be set up as desired before attaching them to the ManyVector in N_VNew_ManyVector().

int N_VEnableFusedOps_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_ManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the manyvector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• N_VNew_ManyVector() sets the field own_data = SUNFALSE. N_VDestroy_ManyVector() will not attempt to call N_VDestroy() on any subvectors contained in the subvector array for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the subvectors.

• To maximize efficiency, arithmetic vector operations in the NVECTOR_MANYVECTOR implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same subvector representations.

# 9.22. The NVECTOR_MPIMANYVECTOR Module

The NVECTOR_MPIMANYVECTOR module is designed to facilitate problems with an inherent data partitioning for the solution vector, and when using distributed-memory parallel architectures. As such, this implementation supports all use cases allowed by the MPI-unaware NVECTOR_MANYVECTOR implementation, as well as partitioning data between nodes in a parallel environment. These data partitions are entirely user-defined, through construction of distinct NVECTOR modules for each component, that are then combined together to form the NVECTOR_MPIMANYVECTOR. Three potential use cases for this module include:

1. Heterogeneous computational architectures (single-node or multi-node): for data partitioning between different computing resources on a node, architecture-specific subvectors may be created for each partition. For example, a user could create one MPI-parallel component based on NVECTOR_PARALLEL, and another GPU-accelerated component based on NVECTOR_CUDA.

2. Process-based multiphysics decompositions (multi-node): for computations that combine separate MPI-based simulations together, each subvector may reside on a different MPI communicator, and the MPIManyVector combines these via an MPI intercommunicator that connects these distinct simulations together.

3. Structure of arrays (SOA) data layouts (single-node or multi-node): for problems that require separate subvectors for each solution component. For example, in an incompressible Navier-Stokes simulation, separate subvectors may be used for velocities and pressure, which are combined together into a single MPIManyVector for the overall “solution”.

The above use cases are neither exhaustive nor mutually exclusive, and the NVECTOR_MPIMANYVECTOR implementation should support arbitrary combinations of these cases.

The NVECTOR_MPIMANYVECTOR implementation is designed to work with any NVECTOR subvectors that implement the minimum “standard” set of operations in §9.2.1; however, significant performance benefits may be obtained when subvectors additionally implement the optional local reduction operations listed in §9.2.4.

Additionally, NVECTOR_MPIMANYVECTOR sets no limit on the number of subvectors that may be attached (aside from the limitations of using sunindextype for indexing, and standard per-node memory limitations). However, while this ostensibly supports subvectors with one entry each (i.e., one subvector for each solution entry), we anticipate that this extreme situation will hinder performance due to non-stride-one memory accesses and increased function call overhead. We therefore recommend a relatively coarse partitioning of the problem, although actual performance will likely be problem-dependent.

As a final note, in the coming years we plan to introduce additional algebraic solvers and time integration modules that will leverage the problem partitioning enabled by NVECTOR_MPIMANYVECTOR. However, even at present we anticipate that users will be able to leverage such data partitioning in their problem-defining ODE right-hand side function, DAE or nonlinear solver residual function, preconditioners, or custom SUNLinearSolver or SUNNonlinearSolver modules.

## 9.22.1. NVECTOR_MPIMANYVECTOR structure

The NVECTOR_MPIMANYVECTOR implementation defines the content field of N_Vector to be a structure containing the MPI communicator (or MPI_COMM_NULL if running on a single node), the number of subvectors comprising the MPIManyVector, the global length of the MPIManyVector (including all subvectors on all MPI ranks), a pointer to the beginning of the array of subvectors, and a boolean flag own_data indicating ownership of the subvectors that populate subvec_array.

struct _N_VectorContent_MPIManyVector {
MPI_Comm      comm;            /* overall MPI communicator        */
sunindextype  num_subvectors;  /* number of vectors attached      */
sunindextype  global_length;   /* overall mpimanyvector length    */
N_Vector*     subvec_array;    /* pointer to N_Vector array       */
booleantype   own_data;        /* flag indicating data ownership  */
};


The header file to include when using this module is nvector_mpimanyvector.h. The installed module library to link against is libsundials_nvecmpimanyvector.lib where .lib is typically .so for shared libraries and .a for static libraries.

Note

If SUNDIALS is configured with MPI disabled, then the MPIManyVector library will not be built. Furthermore, any user codes that include nvector_mpimanyvector.h must be compiled using an MPI-aware compiler (whether the specific user code utilizes MPI or not). We note that the NVECTOR_MANYVECTOR implementation is designed for ManyVector use cases in an MPI-unaware environment.

## 9.22.2. NVECTOR_MPIMANYVECTOR functions

The NVECTOR_MPIMANYVECTOR module implements all vector operations listed in §9.2, except for N_VGetArrayPointer(), N_VSetArrayPointer(), N_VScaleAddMultiVectorArray(), and N_VLinearCombinationVectorArray(). As such, this vector cannot be used with the SUNDIALS direct solvers and preconditioners. Instead, the NVECTOR_MPIMANYVECTOR module provides functions to access subvectors, whose data may in turn be accessed according to their NVECTOR implementations.

The names of vector operations are obtained from those in §9.2 by appending the suffix _MPIManyVector (e.g. N_VDestroy_MPIManyVector). The module NVECTOR_MPIMANYVECTOR provides the following additional user-callable routines:

N_Vector N_VNew_MPIManyVector(sunindextype num_subvectors, N_Vector *vec_array, SUNContext sunctx)

This function creates an MPIManyVector from a set of existing NVECTOR objects, under the requirement that all MPI-aware subvectors use the same MPI communicator (this is checked internally). If none of the subvectors are MPI-aware, then this may equivalently be used to describe data partitioning within a single node. We note that this routine is designed to support use cases 1 and 3 above.

This routine will copy all N_Vector pointers from the input vec_array, so the user may modify/free that pointer array after calling this function. However, this routine does not allocate any new subvectors, so the underlying NVECTOR objects themselves should not be destroyed before the MPIManyVector that contains them.

Upon successful completion, the new MPIManyVector is returned; otherwise this routine returns NULL (e.g., if two MPI-aware subvectors use different MPI communicators).

Users of the Fortran 2003 interface to this function will first need to use the generic N_Vector utility functions N_VNewVectorArray() and N_VSetVecAtIndexVectorArray() to create the N_Vector* argument. This is further explained in §2.5.2.5, and the functions are documented in §9.1.1.

N_Vector N_VMake_MPIManyVector(MPI_Comm comm, sunindextype num_subvectors, N_Vector *vec_array, SUNContext sunctx)

This function creates an MPIManyVector from a set of existing NVECTOR objects and a user-created MPI communicator that “connects” these subvectors. Any MPI-aware subvectors may use different MPI communicators than the input comm. We note that this routine is designed to support any combination of the use cases above.

The input comm should be this user-created MPI communicator. This routine will internally call MPI_Comm_dup to create a copy of the input comm, so the user-supplied comm argument need not be retained after the call to N_VMake_MPIManyVector().

If all subvectors are MPI-unaware, then the input comm argument should be MPI_COMM_NULL, although in this case, it would be simpler to call N_VNew_MPIManyVector() instead, or to just use the NVECTOR_MANYVECTOR module.

This routine will copy all N_Vector pointers from the input vec_array, so the user may modify/free that pointer array after calling this function. However, this routine does not allocate any new subvectors, so the underlying NVECTOR objects themselves should not be destroyed before the MPIManyVector that contains them.

Upon successful completion, the new MPIManyVector is returned; otherwise this routine returns NULL (e.g., if the input vec_array is NULL).

N_Vector N_VGetSubvector_MPIManyVector(N_Vector v, sunindextype vec_num)

This function returns the vec_num subvector from the NVECTOR array.

realtype *N_VGetSubvectorArrayPointer_MPIManyVector(N_Vector v, sunindextype vec_num)

This function returns the data array pointer for the vec_num subvector from the NVECTOR array.

If the input vec_num is invalid, or if the subvector does not support the N_VGetArrayPointer operation, then NULL is returned.

int N_VSetSubvectorArrayPointer_MPIManyVector(realtype *v_data, N_Vector v, sunindextype vec_num)

This function sets the data array pointer for the vec_num subvector from the NVECTOR array.

If the input vec_num is invalid, or if the subvector does not support the N_VSetArrayPointer operation, then -1 is returned; otherwise it returns 0.

sunindextype N_VGetNumSubvectors_MPIManyVector(N_Vector v)

This function returns the overall number of subvectors in the MPIManyVector object.

By default all fused and vector array operations are disabled in the NVECTOR_MPIMANYVECTOR module, except for N_VWrmsNormVectorArray() and N_VWrmsNormMaskVectorArray(), which are enabled by default. The following additional user-callable routines are provided to enable or disable fused and vector array operations for a specific vector. To ensure consistency across vectors, it is recommended to first create a vector with N_VNew_MPIManyVector() or N_VMake_MPIManyVector(), enable/disable the desired operations for that vector with the functions below, and then create any additional vectors from that vector using N_VClone(). This guarantees that the new vectors will have the same operations enabled/disabled, since cloned vectors inherit those configuration options from the vector they are cloned from, while vectors created with N_VNew_MPIManyVector() and N_VMake_MPIManyVector() will have the default settings for the NVECTOR_MPIMANYVECTOR module. We note that these routines do not call the corresponding routines on subvectors, so those should be set up as desired before attaching them to the MPIManyVector in N_VNew_MPIManyVector() or N_VMake_MPIManyVector().

int N_VEnableFusedOps_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) all fused and vector array operations in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearCombination_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear combination fused operation in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleAddMulti_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale and add a vector to multiple vectors fused operation in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableDotProdMulti_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the multiple dot products fused operation in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableLinearSumVectorArray_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the linear sum operation for vector arrays in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableScaleVectorArray_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the scale operation for vector arrays in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableConstVectorArray_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the const operation for vector arrays in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormVectorArray_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the WRMS norm operation for vector arrays in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

int N_VEnableWrmsNormMaskVectorArray_MPIManyVector(N_Vector v, booleantype tf)

This function enables (SUNTRUE) or disables (SUNFALSE) the masked WRMS norm operation for vector arrays in the MPIManyVector vector. The return value is 0 for success and -1 if the input vector or its ops structure are NULL.

Notes

• N_VNew_MPIManyVector() and N_VMake_MPIManyVector() set the field own_data = SUNFALSE. N_VDestroy_MPIManyVector() will not attempt to call N_VDestroy() on any subvectors contained in the subvector array for any N_Vector with own_data set to SUNFALSE. In such a case, it is the user’s responsibility to deallocate the subvectors.

• To maximize efficiency, arithmetic vector operations in the NVECTOR_MPIMANYVECTOR implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same subvector representations.

# 9.23. The NVECTOR_MPIPLUSX Module

The NVECTOR_MPIPLUSX module is designed to facilitate the MPI+X paradigm, where X is some form of on-node (local) parallelism (e.g. OpenMP, CUDA). This paradigm is becoming increasingly popular with the rise of heterogeneous computing architectures.

The NVECTOR_MPIPLUSX implementation is designed to work with any NVECTOR that implements the minimum “standard” set of operations in §9.2.1. However, it is not recommended to use the NVECTOR_PARALLEL, NVECTOR_PARHYP, NVECTOR_PETSC, or NVECTOR_TRILINOS implementations underneath the NVECTOR_MPIPLUSX module since they already provide MPI capabilities.

## 9.23.1. NVECTOR_MPIPLUSX structure

The NVECTOR_MPIPLUSX implementation is a thin wrapper around the NVECTOR_MPIMANYVECTOR. Accordingly, it adopts the same content structure as defined in §9.22.1.

The header file to include when using this module is nvector_mpiplusx.h. The installed module library to link against is libsundials_nvecmpiplusx.lib where .lib is typically .so for shared libraries and .a for static libraries.

Note

If SUNDIALS is configured with MPI disabled, then the MPIPlusX library will not be built. Furthermore, any user codes that include nvector_mpiplusx.h must be compiled using an MPI-aware compiler.

## 9.23.2. NVECTOR_MPIPLUSX functions

The NVECTOR_MPIPLUSX module adopts all vector operations listed in §9.2 from the NVECTOR_MPIMANYVECTOR (see §9.22), except for N_VGetArrayPointer() and N_VSetArrayPointer(); the module provides its own implementations of these functions, which call the local vector implementations. Therefore, the NVECTOR_MPIPLUSX module implements all of the operations listed in the referenced sections except for N_VScaleAddMultiVectorArray() and N_VLinearCombinationVectorArray(). Accordingly, its compatibility with the SUNDIALS direct solvers and preconditioners depends on the local vector implementation.

The module NVECTOR_MPIPLUSX provides the following additional user-callable routines:

N_Vector N_VMake_MPIPlusX(MPI_Comm comm, N_Vector local_vector, SUNContext sunctx)

This function creates an MPIPlusX vector from an existing local (i.e. on-node) NVECTOR object, and a user-created MPI communicator.

The input comm should be this user-created MPI communicator. This routine will internally call MPI_Comm_dup to create a copy of the input comm, so the user-supplied comm argument need not be retained after the call to N_VMake_MPIPlusX().

This routine will copy the NVECTOR pointer to the input local_vector, so the underlying local NVECTOR object should not be destroyed before the MPIPlusX vector that contains it.

Upon successful completion, the new MPIPlusX is returned; otherwise this routine returns NULL (e.g., if the input local_vector is NULL).

N_Vector N_VGetLocal_MPIPlusX(N_Vector v)

This function returns the local vector underneath the MPIPlusX NVECTOR.

realtype *N_VGetArrayPointer_MPIPlusX(N_Vector v)

This function returns the data array pointer for the local vector.

If the local vector does not support the N_VGetArrayPointer() operation, then NULL is returned.

void N_VSetArrayPointer_MPIPlusX(realtype *v_data, N_Vector v)

This function sets the data array pointer for the local vector if the local vector implements the N_VSetArrayPointer() operation.

The NVECTOR_MPIPLUSX module does not implement any fused or vector array operations. Instead users should enable/disable fused operations on the local vector.

Notes

• N_VMake_MPIPlusX() sets the field own_data = SUNFALSE and N_VDestroy_MPIPlusX() will not call N_VDestroy() on the local vector. In this case, it is the user’s responsibility to deallocate the local vector.

• To maximize efficiency, arithmetic vector operations in the NVECTOR_MPIPLUSX implementation that have more than one N_Vector argument do not check for consistent internal representation of these vectors. It is the user’s responsibility to ensure that such routines are called with N_Vector arguments that were all created with the same subvector representations.

# 9.24. NVECTOR Examples

There are NVECTOR examples that may be installed for each implementation. Each implementation makes use of the functions in test_nvector.c. These example functions show simple usage of the NVECTOR family of functions. The inputs to the examples are the vector length, the number of threads (if using a threaded implementation), and a print timing flag.

The following is a list of the example functions in test_nvector.c:

• Test_N_VClone: Creates clone of vector and checks validity of clone.

• Test_N_VCloneEmpty: Creates clone of empty vector and checks validity of clone.

• Test_N_VCloneVectorArray: Creates clone of vector array and checks validity of cloned array.

• Test_N_VCloneEmptyVectorArray: Creates clone of empty vector array and checks validity of cloned array.

• Test_N_VGetArrayPointer: Get array pointer.

• Test_N_VSetArrayPointer: Allocate new vector, set pointer to new vector array, and check values.

• Test_N_VGetLength: Compares self-reported length to calculated length.

• Test_N_VGetCommunicator: Compares self-reported communicator to the one used in constructor; or for MPI-unaware vectors it ensures that NULL is reported.

• Test_N_VLinearSum Case 1a: Test y = x + y

• Test_N_VLinearSum Case 1b: Test y = -x + y

• Test_N_VLinearSum Case 1c: Test y = ax + y

• Test_N_VLinearSum Case 2a: Test x = x + y

• Test_N_VLinearSum Case 2b: Test x = x - y

• Test_N_VLinearSum Case 2c: Test x = x + by

• Test_N_VLinearSum Case 3: Test z = x + y

• Test_N_VLinearSum Case 4a: Test z = x - y

• Test_N_VLinearSum Case 4b: Test z = -x + y

• Test_N_VLinearSum Case 5a: Test z = x + by

• Test_N_VLinearSum Case 5b: Test z = ax + y

• Test_N_VLinearSum Case 6a: Test z = -x + by

• Test_N_VLinearSum Case 6b: Test z = ax - y

• Test_N_VLinearSum Case 7: Test z = a(x + y)

• Test_N_VLinearSum Case 8: Test z = a(x - y)

• Test_N_VLinearSum Case 9: Test z = ax + by

• Test_N_VConst: Fill vector with constant and check result.

• Test_N_VProd: Test vector multiply: z = x * y

• Test_N_VDiv: Test vector division: z = x / y

• Test_N_VScale: Case 1: scale: x = cx

• Test_N_VScale: Case 2: copy: z = x

• Test_N_VScale: Case 3: negate: z = -x

• Test_N_VScale: Case 4: combination: z = cx

• Test_N_VAbs: Create absolute value of vector.

• Test_N_VInv: Compute z[i] = 1 / x[i]

• Test_N_VAddConst: add constant to vector: z = c + x

• Test_N_VDotProd: Calculate dot product of two vectors.

• Test_N_VMaxNorm: Create vector with known values, find and validate the max norm.

• Test_N_VWrmsNorm: Create vector of known values, find and validate the weighted root mean square.

• Test_N_VWrmsNormMask: Create vector of known values, find and validate the weighted root mean square using all elements except one.

• Test_N_VMin: Create vector, find and validate the min.

• Test_N_VWL2Norm: Create vector, find and validate the weighted Euclidean L2 norm.

• Test_N_VL1Norm: Create vector, find and validate the L1 norm.

• Test_N_VCompare: Compare vector with constant returning and validating comparison vector.

• Test_N_VInvTest: Test z[i] = 1 / x[i]

• Test_N_VConstrMask: Test mask of vector x with vector c.

• Test_N_VMinQuotient: Fill two vectors with known values. Calculate and validate minimum quotient.

• Test_N_VLinearCombination: Case 1a: Test x = a x

• Test_N_VLinearCombination: Case 1b: Test z = a x

• Test_N_VLinearCombination: Case 2a: Test x = a x + b y

• Test_N_VLinearCombination: Case 2b: Test z = a x + b y

• Test_N_VLinearCombination: Case 3a: Test x = x + a y + b z

• Test_N_VLinearCombination: Case 3b: Test x = a x + b y + c z

• Test_N_VLinearCombination: Case 3c: Test w = a x + b y + c z

• Test_N_VScaleAddMulti: Case 1a: y = a x + y

• Test_N_VScaleAddMulti: Case 1b: z = a x + y

• Test_N_VScaleAddMulti: Case 2a: Y[i] = c[i] x + Y[i], i = 1,2,3

• Test_N_VScaleAddMulti: Case 2b: Z[i] = c[i] x + Y[i], i = 1,2,3

• Test_N_VDotProdMulti: Case 1: Calculate the dot product of two vectors

• Test_N_VDotProdMulti: Case 2: Calculate the dot product of one vector with three other vectors in a vector array.

• Test_N_VLinearSumVectorArray: Case 1: z = a x + b y

• Test_N_VLinearSumVectorArray: Case 2a: Z[i] = a X[i] + b Y[i]

• Test_N_VLinearSumVectorArray: Case 2b: X[i] = a X[i] + b Y[i]

• Test_N_VLinearSumVectorArray: Case 2c: Y[i] = a X[i] + b Y[i]

• Test_N_VScaleVectorArray: Case 1a: y = c y

• Test_N_VScaleVectorArray: Case 1b: z = c y

• Test_N_VScaleVectorArray: Case 2a: Y[i] = c[i] Y[i]

• Test_N_VScaleVectorArray: Case 2b: Z[i] = c[i] Y[i]

• Test_N_VConstVectorArray: Case 1a: z = c

• Test_N_VConstVectorArray: Case 1b: Z[i] = c

• Test_N_VWrmsNormVectorArray: Case 1a: Create a vector of known values, find and validate the weighted root mean square norm.

• Test_N_VWrmsNormVectorArray: Case 1b: Create a vector array of three vectors of known values, find and validate the weighted root mean square norm of each.

• Test_N_VWrmsNormMaskVectorArray: Case 1a: Create a vector of known values, find and validate the weighted root mean square norm using all elements except one.

• Test_N_VWrmsNormMaskVectorArray: Case 1b: Create a vector array of three vectors of known values, find and validate the weighted root mean square norm of each using all elements except one.

• Test_N_VScaleAddMultiVectorArray: Case 1a: y = a x + y

• Test_N_VScaleAddMultiVectorArray: Case 1b: z = a x + y

• Test_N_VScaleAddMultiVectorArray: Case 2a: Y[j][0] = a[j] X[0] + Y[j][0]

• Test_N_VScaleAddMultiVectorArray: Case 2b: Z[j][0] = a[j] X[0] + Y[j][0]

• Test_N_VScaleAddMultiVectorArray: Case 3a: Y[0][i] = a[0] X[i] + Y[0][i]

• Test_N_VScaleAddMultiVectorArray: Case 3b: Z[0][i] = a[0] X[i] + Y[0][i]

• Test_N_VScaleAddMultiVectorArray: Case 4a: Y[j][i] = a[j] X[i] + Y[j][i]

• Test_N_VScaleAddMultiVectorArray: Case 4b: Z[j][i] = a[j] X[i] + Y[j][i]

• Test_N_VLinearCombinationVectorArray: Case 1a: x = a x

• Test_N_VLinearCombinationVectorArray: Case 1b: z = a x

• Test_N_VLinearCombinationVectorArray: Case 2a: x = a x + b y

• Test_N_VLinearCombinationVectorArray: Case 2b: z = a x + b y

• Test_N_VLinearCombinationVectorArray: Case 3a: x = a x + b y + c z

• Test_N_VLinearCombinationVectorArray: Case 3b: w = a x + b y + c z

• Test_N_VLinearCombinationVectorArray: Case 4a: X[0][i] = c[0] X[0][i]

• Test_N_VLinearCombinationVectorArray: Case 4b: Z[i] = c[0] X[0][i]

• Test_N_VLinearCombinationVectorArray: Case 5a: X[0][i] = c[0] X[0][i] + c[1] X[1][i]

• Test_N_VLinearCombinationVectorArray: Case 5b: Z[i] = c[0] X[0][i] + c[1] X[1][i]

• Test_N_VLinearCombinationVectorArray: Case 6a: X[0][i] = X[0][i] + c[1] X[1][i] + c[2] X[2][i]

• Test_N_VLinearCombinationVectorArray: Case 6b: X[0][i] = c[0] X[0][i] + c[1] X[1][i] + c[2] X[2][i]

• Test_N_VLinearCombinationVectorArray: Case 6c: Z[i] = c[0] X[0][i] + c[1] X[1][i] + c[2] X[2][i]

• Test_N_VDotProdLocal: Calculate MPI task-local portion of the dot product of two vectors.

• Test_N_VMaxNormLocal: Create vector with known values, find and validate the MPI task-local portion of the max norm.

• Test_N_VMinLocal: Create vector, find and validate the MPI task-local min.

• Test_N_VL1NormLocal: Create vector, find and validate the MPI task-local portion of the L1 norm.

• Test_N_VWSqrSumLocal: Create vector of known values, find and validate the MPI task-local portion of the weighted squared sum of two vectors.

• Test_N_VWSqrSumMaskLocal: Create vector of known values, find and validate the MPI task-local portion of the weighted squared sum of two vectors, using all elements except one.

• Test_N_VInvTestLocal: Test the MPI task-local portion of z[i] = 1 / x[i]

• Test_N_VConstrMaskLocal: Test the MPI task-local portion of the mask of vector x with vector c.

• Test_N_VMinQuotientLocal: Fill two vectors with known values. Calculate and validate the MPI task-local minimum quotient.

• Test_N_VBufSize: Tests for accuracy in the reported buffer size.

• Test_N_VBufPack: Tests for accuracy in the buffer packing routine.

• Test_N_VBufUnpack: Tests for accuracy in the buffer unpacking routine.