Hints for Writing Custom Ops

Andrzej Pronobis edited this page May 5, 2017 · 3 revisions
  • If something does not have to be an input, make it an attribute. This makes the graph smaller.
  • Bounds checking on the GPU hurts performance, because it forces a memory copy (memcpy). Skip bounds checking in GPU kernels.
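
To illustrate the second hint, here is a minimal sketch of where bounds checking can live instead: in the CPU path, where the check is a plain loop. The function name GatherColumnsChecked, its signature, and the convention of returning -1 on success are assumptions made for this example, not code from TensorFlow or from this repository.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only.
// Copies the selected columns of a row-major [rows x cols] matrix into `out`.
// Returns -1 on success, or the position in `indices` of the first
// out-of-range index, so the caller can report the error on the host.
template <typename T, typename Index>
long GatherColumnsChecked(const std::vector<T>& params, std::size_t rows,
                          std::size_t cols, const std::vector<Index>& indices,
                          std::vector<T>* out) {
  // Validate all indices first so that `out` is never partially written.
  for (std::size_t i = 0; i < indices.size(); ++i) {
    if (indices[i] < 0 || static_cast<std::size_t>(indices[i]) >= cols) {
      return static_cast<long>(i);
    }
  }
  out->assign(rows * indices.size(), T());
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t i = 0; i < indices.size(); ++i) {
      const std::size_t c = static_cast<std::size_t>(indices[i]);
      (*out)[r * indices.size() + i] = params[r * cols + c];
    }
  }
  return -1;
}
```

A GPU kernel written along the lines of the hint would perform only the copy loop and leave out the validation pass.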

Structure of a Custom Op Implementation

Following how gather is implemented in TensorFlow, we structure the implementation of our custom ops as follows (using gather_columns as the example):

  • gather_columns.cc - Definition and registration of the op.
  • gather_columns_functor.h - Declaration of a generic functor template, and definition of the CPU functor template.
  • gather_columns_functor.cc - Forward declaration of the GPU functor. Needed only if the CPU functor is compiled independently. We currently compile everything together (with the already compiled GPU functor objects), so this file is not strictly necessary.
  • gather_columns_functor_gpu.cu.h - Definition of a template of the GPU functor.
  • gather_columns_functor_gpu.cu.cc - Specialization of the GPU functor for specific template parameters.
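
The file layout above can be sketched in plain C++ as follows. This is a simplified illustration collapsed into a single block, not the actual gather_columns code: the CPUDevice and GPUDevice tag structs, the std::vector-based signature, and the RunGatherColumns dispatcher are all assumptions made so the example compiles without TensorFlow.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the device tag types.
struct CPUDevice {};
struct GPUDevice {};

// gather_columns_functor.h: declaration of the generic functor template.
template <typename Device, typename T>
struct GatherColumnsFunctor;

// gather_columns_functor.h: definition of the CPU functor template.
// Indices are assumed valid here to keep the sketch short.
template <typename T>
struct GatherColumnsFunctor<CPUDevice, T> {
  void operator()(const CPUDevice&, const std::vector<T>& params,
                  std::size_t rows, std::size_t cols,
                  const std::vector<int>& indices,
                  std::vector<T>* out) const {
    out->assign(rows * indices.size(), T());
    for (std::size_t r = 0; r < rows; ++r) {
      for (std::size_t i = 0; i < indices.size(); ++i) {
        const std::size_t c = static_cast<std::size_t>(indices[i]);
        (*out)[r * indices.size() + i] = params[r * cols + c];
      }
    }
  }
};

// gather_columns_functor_gpu.cu.h would define
// GatherColumnsFunctor<GPUDevice, T> in terms of a CUDA kernel, and
// gather_columns_functor_gpu.cu.cc would instantiate it for concrete types,
// e.g. `template struct GatherColumnsFunctor<GPUDevice, float>;`.

// gather_columns.cc: the op's compute step picks the functor by device type.
template <typename Device, typename T>
std::vector<T> RunGatherColumns(const Device& device,
                                const std::vector<T>& params,
                                std::size_t rows, std::size_t cols,
                                const std::vector<int>& indices) {
  std::vector<T> out;
  GatherColumnsFunctor<Device, T>()(device, params, rows, cols, indices, &out);
  return out;
}
```

Calling RunGatherColumns(CPUDevice(), params, rows, cols, indices) selects the CPU specialization at compile time; the same call with a GPU device type would resolve to the GPU functor once its definition is linked in.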