starpu_kernels.c

Summary
getrfsp1d_starpu_cpu - Diagonal block factorization and column block update for LU decomposition.
getrfsp1d_gemm_starpu_common - General LU update of left block column facing current block.
getrfsp1d_gemm_starpu_cpu - General LU update of left block column facing current block.
getrfsp1d_gemm_starpu_cuda - General LU update of left block column facing current block.
getrfsp1d_gemm_starpu_common - Sparse LU update of left block column facing current block.
getrfsp1d_gemm_starpu_cpu - Sparse LU update of left block column facing current block.
getrfsp1d_sparse_gemm_starpu_cpu
getrfsp1d_gemm_starpu_cuda - Sparse LU update of left block column facing current block.
getrfsp1d_sparse_gemm_starpu_cuda
potrfsp1d_starpu_cpu - Diagonal block factorization and column block update for LLt decomposition.
potrfsp1d_gemm_starpu_cpu - General LLt update of left block column facing current block.
potrfsp1d_gemm_starpu_common - Sparse LLt update of left block column facing current block.
potrfsp1d_gemm_starpu_cpu - Sparse LLt update of left block column facing current block.
potrfsp1d_sparse_gemm_starpu_cpu
potrfsp1d_gemm_starpu_cuda - Sparse LLt update of left block column facing current block.
potrfsp1d_sparse_gemm_starpu_cuda
hetrfsp1d_starpu_cpu - Diagonal block factorization and column block update for LDLt decomposition.
hetrfsp1d_gemm_starpu_cpu - General LDLt update of left block column facing current block.

getrfsp1d_starpu_cpu

Diagonal block factorization and column block update for LU decomposition.

Parameters

buffers - Data handlers:
  0 - Pivot counter.
  1 - L column block.
  2 - U column block.
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
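
The calling convention above (an array of data handlers plus an opaque argument pointer) can be illustrated with a minimal, self-contained sketch. It is not the PaStiX kernel: `demo_getrf_diag`, `demo_getrf_args_t`, its `n` and `eps` fields, and the single dense column-major block are all assumptions made for illustration. The sketch also assumes that the pivot counter counts pivots that had to be replaced, and it reads the buffers as raw pointers, whereas a real StarPU codelet would go through StarPU's data interfaces.

```c
#include <assert.h>

/* Hypothetical argument bundle. The real codelet receives sopalin_data
 * and cblknum through _args; their layout is not shown on this page. */
typedef struct {
    int    n;    /* order of the dense diagonal block (assumption) */
    double eps;  /* threshold below which a pivot is replaced (assumption) */
} demo_getrf_args_t;

/* Codelet-shaped sketch: buffers[0] = pivot counter, buffers[1] = block.
 * Factorizes the block in place (unit lower L, upper U), no row exchanges. */
void demo_getrf_diag(void *buffers[], void *_args)
{
    int    *npivots = (int *)buffers[0];
    double *a       = (double *)buffers[1];  /* column-major, lda = n */
    const demo_getrf_args_t *args = (const demo_getrf_args_t *)_args;
    int n = args->n;

    assert(n > 0);
    for (int k = 0; k < n; k++) {
        double piv = a[k + k * n];
        double mag = (piv < 0.0) ? -piv : piv;

        if (mag < args->eps) {
            /* Replace a tiny pivot and record it in the counter. */
            piv = (piv < 0.0) ? -args->eps : args->eps;
            a[k + k * n] = piv;
            (*npivots)++;
        }
        /* Scale the column below the pivot: this is column k of L. */
        for (int i = k + 1; i < n; i++) {
            a[i + k * n] /= piv;
        }
        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; j++) {
            for (int i = k + 1; i < n; i++) {
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
            }
        }
    }
}
```

Called on the 2 x 2 block {4, 6, 3, 3} (stored column by column) with a small `eps`, the sketch leaves {4, 1.5, 3, -1.5} in the block and does not touch the counter.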

getrfsp1d_gemm_starpu_common

General LU update of left block column facing current block.

Common function for CPU and GPU.

Update block by block.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - U column block.
  3 - U facing column block.
  4 - Working memory area.
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.
  arch - Indicates whether the codelet runs on a CPU or a CUDA node.
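
A block-by-block update of this kind is, at its core, a matrix product accumulated into the facing block. The sketch below shows one such step for dense column-major operands: the product is first formed in a scratch area (playing the role of the working memory area, buffer 4) and then subtracted from the target. The function name `demo_gemm_update`, its parameter list, and the choice of `C -= A * B^T` are assumptions for illustration, not the signature or exact operation of the PaStiX kernel.

```c
#include <assert.h>

/* C (m x n) -= A (m x k) * B^T, where B is n x k.
 * All matrices are column-major with explicit leading dimensions.
 * work is an m x n scratch area with leading dimension m. */
void demo_gemm_update(int m, int n, int k,
                      const double *A, int lda,
                      const double *B, int ldb,
                      double *work,
                      double *C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    /* Step 1: work = A * B^T */
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            for (int p = 0; p < k; p++) {
                s += A[i + p * lda] * B[j + p * ldb];
            }
            work[i + j * m] = s;
        }
    }

    /* Step 2: subtract the contribution from the facing block. */
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            C[i + j * ldc] -= work[i + j * m];
        }
    }
}
```

In an LU setting the same routine would presumably be applied twice, once for the L side and once for the U side with the operands swapped; a production kernel would call an optimized BLAS `gemm` rather than the triple loop used here.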

getrfsp1d_gemm_starpu_cpu

void getrfsp1d_gemm_starpu_cpu(void *buffers[], void *_args)

General LU update of left block column facing current block.

Update block by block.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - U column block.
  3 - U facing column block.
  4 - Working memory area.
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.

getrfsp1d_gemm_starpu_cuda

void getrfsp1d_gemm_starpu_cuda(void *buffers[], void *_args)

General LU update of left block column facing current block.

Update block by block.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - U column block.
  3 - U facing column block.
  4 - Working memory area.
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.

getrfsp1d_gemm_starpu_common

Sparse LU update of left block column facing current block.

Common function for CPU and GPU.

Update the entire facing column block at once.

TODO: Implement the CPU version

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - U column block.
  3 - U facing column block.
  4 - Working memory area.
  5 - blocktab (depending on -DSTARPU_BLOCKTAB_SELFCOPY).
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.
  nblocs - Number of blocks in current column block.
  d_blocktab - Pointers to blocktabs for CUDA nodes (-DSTARPU_BLOCKTAB_SELFCOPY).
  arch - Indicates whether the codelet runs on a CPU or a CUDA node.
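
Updating a whole facing column block in one pass requires knowing where each contributed row lands in the facing storage, which is presumably what the blocktab buffer provides. The sketch below illustrates the idea under loudly stated assumptions: the blocktab is taken to be a flat array of (first row, last row) pairs, the blocks of a column block are taken to be stacked contiguously in storage, and `demo_local_row` and `demo_scatter_update` are hypothetical helpers. None of this layout is confirmed by this page.

```c
#include <assert.h>

/* Map a global row index r to its storage row inside a column block.
 * blocktab holds nblocs pairs (first row, last row), an assumed layout;
 * blocks are assumed to be stacked contiguously in storage.
 * Returns -1 when no block covers the row. */
int demo_local_row(const int *blocktab, int nblocs, int r)
{
    int offset = 0;

    assert(nblocs >= 0);
    for (int b = 0; b < nblocs; b++) {
        int frow = blocktab[2 * b];
        int lrow = blocktab[2 * b + 1];
        if (r >= frow && r <= lrow) {
            return offset + (r - frow);
        }
        offset += lrow - frow + 1;
    }
    return -1;
}

/* Subtract a dense contribution (nrows x ncols, column-major, leading
 * dimension nrows) from the facing column block C. rows[i] is the global
 * row index of contribution row i. */
void demo_scatter_update(const double *work, int nrows, int ncols,
                         const int *rows,
                         const int *fblocktab, int fnblocs,
                         double *C, int ldc)
{
    for (int i = 0; i < nrows; i++) {
        int li = demo_local_row(fblocktab, fnblocs, rows[i]);
        assert(li >= 0);  /* every contributed row must exist in the target */
        for (int j = 0; j < ncols; j++) {
            C[li + j * ldc] -= work[i + j * nrows];
        }
    }
}
```

With a facing blocktab of {0, 1, 5, 6}, global rows 0 and 1 map to storage rows 0 and 1, and global rows 5 and 6 map to storage rows 2 and 3.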

getrfsp1d_gemm_starpu_cpu

Sparse LU update of left block column facing current block.

Update the entire facing column block at once.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - U column block.
  3 - U facing column block.
  4 - Working memory area.
  5 - blocktab (depending on -DSTARPU_BLOCKTAB_SELFCOPY).
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.
  nblocs - Number of blocks in current column block.
  d_blocktab - Pointers to blocktabs for CUDA nodes (-DSTARPU_BLOCKTAB_SELFCOPY).

getrfsp1d_sparse_gemm_starpu_cpu

void getrfsp1d_sparse_gemm_starpu_cpu(void *buffers[], void *_args)

getrfsp1d_gemm_starpu_cuda

Sparse LU update of left block column facing current block.

Update the entire facing column block at once.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - U column block.
  3 - U facing column block.
  4 - Working memory area.
  5 - blocktab (depending on -DSTARPU_BLOCKTAB_SELFCOPY).
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.
  nblocs - Number of blocks in current column block.
  d_blocktab - Pointers to blocktabs for CUDA nodes (-DSTARPU_BLOCKTAB_SELFCOPY).

getrfsp1d_sparse_gemm_starpu_cuda

void getrfsp1d_sparse_gemm_starpu_cuda(void *buffers[], void *_args)

potrfsp1d_starpu_cpu

void potrfsp1d_starpu_cpu(void *buffers[], void *_args)

Diagonal block factorization and column block update for LLt decomposition.

Parameters

buffers - Data handlers:
  0 - Pivot counter.
  1 - L column block.
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.

potrfsp1d_gemm_starpu_cpu

void potrfsp1d_gemm_starpu_cpu(void *buffers[], void *_args)

General LLt update of left block column facing current block.

Update block by block.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - Working memory area.
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.

potrfsp1d_gemm_starpu_common

Sparse LLt update of left block column facing current block.

Common function for CPU and GPU.

Update the entire facing column block at once.

TODO: Implement the CPU version

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - Working memory area.
  3 - blocktab (depending on -DSTARPU_BLOCKTAB_SELFCOPY).
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.
  nblocs - Number of blocks in current column block.
  d_blocktab - Pointers to blocktabs for CUDA nodes (-DSTARPU_BLOCKTAB_SELFCOPY).
  arch - Indicates whether the codelet runs on a CPU or a CUDA node.

potrfsp1d_gemm_starpu_cpu

Sparse LLt update of left block column facing current block.

Update the entire facing column block at once.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - Working memory area.
  3 - blocktab (depending on -DSTARPU_BLOCKTAB_SELFCOPY).
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.
  nblocs - Number of blocks in current column block.
  d_blocktab - Pointers to blocktabs for CUDA nodes (-DSTARPU_BLOCKTAB_SELFCOPY).

potrfsp1d_sparse_gemm_starpu_cpu

void potrfsp1d_sparse_gemm_starpu_cpu(void *buffers[], void *_args)

potrfsp1d_gemm_starpu_cuda

Sparse LLt update of left block column facing current block.

Update the entire facing column block at once.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - Working memory area.
  3 - blocktab (depending on -DSTARPU_BLOCKTAB_SELFCOPY).
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.
  nblocs - Number of blocks in current column block.
  d_blocktab - Pointers to blocktabs for CUDA nodes (-DSTARPU_BLOCKTAB_SELFCOPY).

potrfsp1d_sparse_gemm_starpu_cuda

void potrfsp1d_sparse_gemm_starpu_cuda(void *buffers[], void *_args)

hetrfsp1d_starpu_cpu

void hetrfsp1d_starpu_cpu(void *buffers[], void *_args)

Diagonal block factorization and column block update for LDLt decomposition.

Parameters

buffers - Data handlers:
  0 - Pivot counter.
  1 - L column block.
  2 - Will receive L*DIAG(BLOCK_DIAG(L)).
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
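
The third buffer, L*DIAG(BLOCK_DIAG(L)), amounts to scaling each column of the column block by the matching diagonal entry of its diagonal block. The sketch below is a literal reading of that expression for a dense, real-valued, column-major column block whose first n rows form the diagonal block. The name `demo_scale_by_diag` and its parameters are hypothetical, and the actual kernel may treat the diagonal block itself differently.

```c
#include <assert.h>

/* LD = L * diag(d), with d[j] = L[j + j*ld], the diagonal of the leading
 * n x n (diagonal) block. L and LD are m x n column-major, m >= n. */
void demo_scale_by_diag(int m, int n,
                        const double *L, int ld,
                        double *LD, int ldd)
{
    assert(n >= 0 && m >= n && ld >= m && ldd >= m);

    for (int j = 0; j < n; j++) {
        double d = L[j + j * ld];  /* j-th diagonal entry */
        for (int i = 0; i < m; i++) {
            LD[i + j * ldd] = L[i + j * ld] * d;
        }
    }
}
```

Keeping L*D in a separate buffer means a later update of the form (L*D) * L^T can be performed as a single matrix product, without rescaling by D for every facing block.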

hetrfsp1d_gemm_starpu_cpu

void hetrfsp1d_gemm_starpu_cpu(void *buffers[], void *_args)

General LDLt update of left block column facing current block.

Common function for CPU and GPU.

Update block by block.

Parameters

buffers - Data handlers:
  0 - L column block.
  1 - L facing column block.
  2 - Working memory area.
_args - Codelet arguments:
  sopalin_data - Global PaStiX internal data.
  cblknum - Current column block index.
  bloknum - Current block index.
  fcblknum - Facing column block index.