60 namespace threadblock {
    67   typename WarpMmaTensorOp_,
    81   using LayoutC = typename WarpMmaTensorOp::LayoutC;
    90     typename WarpMmaTensorOp::Shape,
   102     typename WarpMmaTensorOp::Shape,
   103     typename WarpMmaTensorOp::Policy::Operator::Shape,
   104     typename WarpMmaTensorOp::Policy::Operator::ElementC,
   105     typename WarpMmaTensorOp::Policy::Operator::FragmentC,
   110     typename WarpMmaTensorOp::Shape,
   111     typename WarpMmaTensorOp::Policy::Operator::Shape,
   117     typename OutputTileThreadMap::CompactedThreadMap,
   144 template <typename Shape_, typename WarpMmaTensorOp_, int PartitionsK,
   145           typename OutputOp_, int ElementsPerAccess, int InterleavedK,
   146           bool IsBetaZero = false, bool isSplitK = false>
   155   using LayoutC = typename WarpMmaTensorOp::LayoutC;
   172           typename WarpMmaTensorOp::Shape,
   173           typename WarpMmaTensorOp::Policy::Operator::Shape,
   174           typename WarpMmaTensorOp::Policy::Operator::ElementC,
   175           typename WarpMmaTensorOp::Policy::Operator::FragmentC,
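The fragments above repeatedly pull nested types out of the `WarpMmaTensorOp_` template parameter (`LayoutC`, `ElementC`, `Policy::Operator::Shape`, and so on). The following is a minimal sketch of that traits pattern, not the CUTLASS source. `ShapeSketch`, `RowMajorSketch`, `MockInstruction`, `MockWarpMma`, `MockOutputOp`, `DefaultEpilogueSketch`, and the alias `InstructionShape` are hypothetical stand-ins introduced only for illustration; the remaining alias names mirror those that appear in this listing.

```cpp
#include <type_traits>

// Hypothetical stand-ins: none of these are the real CUTLASS definitions.
// They only provide the nested names that the aliases below look up.
template <int M, int N, int K>
struct ShapeSketch {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

struct RowMajorSketch {};

// Plays the role of WarpMmaTensorOp::Policy::Operator (assumed values).
struct MockInstruction {
  using Shape = ShapeSketch<16, 8, 8>;
  using ElementC = float;
  using FragmentC = float[4];
};

// Plays the role of the warp-level MMA type passed as WarpMmaTensorOp_.
struct MockWarpMma {
  using Shape = ShapeSketch<64, 64, 32>;
  using LayoutC = RowMajorSketch;
  using ElementC = float;
  struct Policy {
    using Operator = MockInstruction;
  };
};

// Plays the role of the output functor passed as OutputOp_.
struct MockOutputOp {
  using ElementOutput = float;
};

// Sketch of a "defaults" traits struct: it re-exports its template
// arguments and derives further types from their nested members.
template <typename Shape_, typename WarpMmaTensorOp_, int PartitionsK,
          typename OutputOp_, int ElementsPerAccess>
struct DefaultEpilogueSketch {
  using Shape = Shape_;
  using WarpMmaTensorOp = WarpMmaTensorOp_;
  static constexpr int kPartitionsK = PartitionsK;
  using OutputOp = OutputOp_;
  static constexpr int kElementsPerAccess = ElementsPerAccess;

  using ElementOutput = typename OutputOp::ElementOutput;
  using LayoutC = typename WarpMmaTensorOp::LayoutC;
  using ElementAccumulator = typename WarpMmaTensorOp::ElementC;

  // Hypothetical alias name; shows the Policy::Operator lookup only.
  using InstructionShape = typename WarpMmaTensorOp::Policy::Operator::Shape;
};

using Sketch = DefaultEpilogueSketch<ShapeSketch<128, 128, 32>, MockWarpMma, 1,
                                     MockOutputOp, 4>;

// Compile-time checks that the derived aliases resolve as expected.
static_assert(std::is_same<Sketch::LayoutC, RowMajorSketch>::value,
              "LayoutC comes from the warp-level MMA type");
static_assert(std::is_same<Sketch::ElementAccumulator, float>::value,
              "ElementAccumulator comes from WarpMmaTensorOp::ElementC");
```

Because every dependent name sits behind a template parameter, each lookup needs the `typename` keyword, which is why it appears on nearly every line of the original fragments.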
 
Describes the size of a matrix tile. 
Definition: matrix_shape.h:42
Templates implementing loading of tiles from pitch-linear rank=2 tensors. 
Definition: aligned_buffer.h:35
typename WarpMmaTensorOp::LayoutC LayoutC
Definition: default_epilogue_tensor_op.h:81
typename OutputOp::ElementOutput ElementOutput
Definition: default_epilogue_tensor_op.h:80
Epilogue for threadblock scoped GEMMs using Tensor Ops. 
Epilogue operator without splitk. 
Definition: interleaved_epilogue.h:79
WarpMmaTensorOp_ WarpMmaTensorOp
Definition: default_epilogue_tensor_op.h:75
Defines common types used for all GEMM-like operators. 
Functor performing conversion operations used by epilogues. 
static int const kPartitionsK
Definition: default_epilogue_tensor_op.h:76
OutputOp_ OutputOp
Definition: default_epilogue_tensor_op.h:151
WarpMmaTensorOp_ WarpMmaTensorOp
Definition: default_epilogue_tensor_op.h:149
This defines a "fragment" iterator for visiting the fragments of an accumulator tile that participate...
cutlass::epilogue::threadblock::PredicatedTileIterator< OutputTileThreadMap, ElementOutput > OutputTileIterator
Definition: default_epilogue_tensor_op.h:99
typename WarpMmaTensorOp::ElementC ElementAccumulator
Definition: default_epilogue_tensor_op.h:82
Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe ...
Defines the optimal thread map for TensorOp accumulator layouts. 
Definition: default_thread_map_tensor_op.h:104
Shape_ Shape
Definition: default_epilogue_tensor_op.h:74
Functor performing linear combination operations used by epilogues. 
Defines the size of an element in bits. 
Definition: numeric_types.h:42
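A trait that reports an element's size in bits is useful when choosing a value such as `ElementsPerAccess`. The sketch below shows one way such a trait could be written and used. `sizeof_bits_sketch`, `Int4Sketch`, and `ElementsPerAccessSketch` are hypothetical names, and the 128-bit access width is an illustrative assumption that this listing does not state; the code is not the definition in `numeric_types.h`.

```cpp
#include <cstdint>

// Hypothetical bit-size trait: defaults to sizeof(T) * 8,
// which assumes 8-bit bytes.
template <typename T>
struct sizeof_bits_sketch {
  static constexpr int value = static_cast<int>(sizeof(T) * 8);
};

// Hypothetical sub-byte element type. A specialization lets the trait
// report a size that sizeof() alone cannot express.
struct Int4Sketch {};

template <>
struct sizeof_bits_sketch<Int4Sketch> {
  static constexpr int value = 4;
};

// Number of elements that fit in one memory access of AccessBits bits.
// The 128-bit default is an assumption made for this example.
template <typename Element, int AccessBits = 128>
struct ElementsPerAccessSketch {
  static constexpr int value =
      AccessBits / sizeof_bits_sketch<Element>::value;
};

// Compile-time checks on common element widths.
static_assert(ElementsPerAccessSketch<std::uint16_t>::value == 8,
              "eight 16-bit elements per 128-bit access");
static_assert(ElementsPerAccessSketch<Int4Sketch>::value == 32,
              "thirty-two 4-bit elements per 128-bit access");
```

Expressing sizes in bits rather than bytes is what allows element types narrower than one byte to take part in the same arithmetic.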
typename WarpMmaTensorOp::LayoutC LayoutC
Definition: default_epilogue_tensor_op.h:155
typename WarpMmaTensorOp::ElementC ElementAccumulator
Definition: default_epilogue_tensor_op.h:156
Defines the optimal thread map for TensorOp accumulator layouts. 
Definition: default_thread_map_tensor_op.h:52
Top-level include for all CUTLASS numeric types. 
Template for reading and writing tiles of accumulators to shared memory. 
Definition: tile_iterator_tensor_op.h:52
cutlass::epilogue::threadblock::SharedLoadIterator< typename OutputTileThreadMap::CompactedThreadMap, ElementAccumulator > SharedLoadIterator
Definition: default_epilogue_tensor_op.h:119
Definition: epilogue/threadblock/predicated_tile_iterator.h:452
Epilogue for threadblock scoped GEMMs using Tensor Ops. 
Definition: fragment_iterator_tensor_op.h:61
typename OutputOp::ElementOutput ElementOutput
Definition: default_epilogue_tensor_op.h:154
Epilogue operator without splitk. 
Definition: epilogue.h:74
Epilogue for threadblock scoped GEMMs using Tensor Ops. 
Definition: epilogue/threadblock/predicated_tile_iterator.h:65
cutlass::epilogue::warp::TileIteratorTensorOp< typename WarpMmaTensorOp::Shape, typename WarpMmaTensorOp::Policy::Operator::Shape, ElementAccumulator, LayoutC > WarpTileIterator
Definition: default_epilogue_tensor_op.h:114
Definition: default_epilogue_tensor_op.h:147
typename cutlass::epilogue::threadblock::DefaultInterleavedThreadMapTensorOp< Shape, typename WarpMmaTensorOp::Shape, kPartitionsK, ElementOutput, kElementsPerAccess, InterleavedK >::Type OutputTileThreadMap
Definition: default_epilogue_tensor_op.h:164
Definition: shared_load_iterator.h:61
typename cutlass::epilogue::threadblock::DefaultThreadMapTensorOp< Shape, typename WarpMmaTensorOp::Shape, kPartitionsK, ElementOutput, kElementsPerAccess >::Type OutputTileThreadMap
Definition: default_epilogue_tensor_op.h:94
Defines sensible defaults for epilogues for TensorOps. 
Definition: default_epilogue_tensor_op.h:72
cutlass::epilogue::warp::FragmentIteratorTensorOp< typename WarpMmaTensorOp::Shape, typename WarpMmaTensorOp::Policy::Operator::Shape, typename WarpMmaTensorOp::Policy::Operator::ElementC, typename WarpMmaTensorOp::Policy::Operator::FragmentC, LayoutC > AccumulatorFragmentIterator
Definition: default_epilogue_tensor_op.h:107
Functor performing reduction operations used by epilogues. 
Shape_ Shape
Definition: default_epilogue_tensor_op.h:148
Basic include for CUTLASS. 
static int const kElementsPerAccess
Definition: default_epilogue_tensor_op.h:78
OutputOp_ OutputOp
Definition: default_epilogue_tensor_op.h:77