CUTLASS: CUDA Templates for Linear Algebra Subroutines and Solvers

| Kind | Name | Description |
|---|---|---|
| directory | kernel | |
| directory | thread | |
| file | batched_reduction.h | Implements an efficient, software-pipelined batched reduction: D = alpha * Reduction(A) + beta * C. |
| file | batched_reduction_traits.h | Defines the structural properties of a complete batched reduction: D = alpha * Reduction(A) + beta * C. |
| file | reduction/threadblock_swizzle.h | Defines functors that map blockIdx to partitions of the batched reduction computation. |
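To make the formula D = alpha * Reduction(A) + beta * C concrete, here is a minimal host-side C++ reference sketch. It is not CUTLASS's kernel or API. It assumes that Reduction(A) is an element-wise sum across `batch` slices of A, where each slice has `n` elements and the slices are stored back to back. It also assumes that each hypothetical "block" owns a contiguous tile of output elements, a serial stand-in for the blockIdx-to-partition mapping that threadblock_swizzle.h provides. The names `reduce_tile`, `batched_reduction`, and `tile_size` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for one "threadblock": computes
// D[i] = alpha * sum_b A[b * n + i] + beta * C[i] for the tile owned by block_idx.
// Assumption: slice b of A occupies A[b * n] .. A[b * n + n - 1].
template <typename T>
void reduce_tile(std::size_t block_idx, std::size_t tile_size,
                 const std::vector<T>& A, const std::vector<T>& C,
                 std::vector<T>& D, std::size_t n, std::size_t batch,
                 T alpha, T beta) {
  // Map the block index to a contiguous range [begin, end) of output elements.
  std::size_t begin = block_idx * tile_size;
  std::size_t end = std::min(begin + tile_size, n);
  for (std::size_t i = begin; i < end; ++i) {
    T acc = T(0);
    for (std::size_t b = 0; b < batch; ++b) {
      acc += A[b * n + i];  // accumulate element i of slice b
    }
    D[i] = alpha * acc + beta * C[i];  // epilogue: scale and add C
  }
}

// Serial stand-in for launching a grid: each "block" reduces one tile.
template <typename T>
std::vector<T> batched_reduction(const std::vector<T>& A, const std::vector<T>& C,
                                 std::size_t n, std::size_t batch,
                                 T alpha, T beta, std::size_t tile_size = 4) {
  assert(tile_size > 0 && "tile_size must be positive");
  assert(A.size() == n * batch && C.size() == n);
  std::vector<T> D(n);
  std::size_t num_blocks = (n + tile_size - 1) / tile_size;  // ceil(n / tile_size)
  for (std::size_t blk = 0; blk < num_blocks; ++blk) {
    reduce_tile(blk, tile_size, A, C, D, n, batch, alpha, beta);
  }
  return D;
}
```

For example, with `n = 3`, `batch = 2`, `A = {1, 2, 3, 10, 20, 30}`, `C = {1, 1, 1}`, `alpha = 2.0`, and `beta = 0.5`, the per-element sums are `{11, 22, 33}`, so D is `{22.5, 44.5, 66.5}`. The result does not depend on `tile_size`. That parameter only changes how output elements are divided among blocks, which is the role a block swizzle plays.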
 1.8.11