Hints for Writing Custom Ops
Andrzej Pronobis edited this page May 5, 2017
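- If a value does not have to be an input (i.e., it is fixed when the graph is built), make it an attribute. Attributes are stored in the op's node definition instead of being produced by separate nodes in the graph, which keeps the graph smaller.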
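- Bounds checking on the GPU hurts performance, because it requires a memory copy between device and host (the indices, or the result of the check, must reach the host before an error can be reported). Skip bounds checking in GPU kernels.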
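The bounds-checking hint can be sketched in plain C++ so that it compiles without TensorFlow. The helper names `GatherColumnsChecked` and `GatherColumnsElement` are hypothetical, and a row-major `rows x cols` `std::vector<float>` stands in for the input tensor. The CPU helper validates every index and reports the first bad one, which is cheap because the data already sits in host memory. The GPU-style function is the per-element body that a single CUDA thread would run; it does no checking, so the caller must guarantee that the indices are valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU path (hypothetical helper): gathers the columns listed in `indices`
// from a row-major rows x cols matrix. Every index is validated; the
// position of the first out-of-range index is returned, or -1 on success.
int64_t GatherColumnsChecked(const std::vector<float>& params, int64_t rows,
                             int64_t cols, const std::vector<int64_t>& indices,
                             std::vector<float>& out) {
  assert(static_cast<int64_t>(params.size()) == rows * cols);
  const int64_t n = static_cast<int64_t>(indices.size());
  for (int64_t j = 0; j < n; ++j) {
    if (indices[j] < 0 || indices[j] >= cols) return j;  // report bad index
  }
  out.assign(static_cast<std::size_t>(rows * n), 0.0f);
  for (int64_t r = 0; r < rows; ++r) {
    for (int64_t j = 0; j < n; ++j) {
      out[r * n + j] = params[r * cols + indices[j]];
    }
  }
  return -1;
}

// GPU-style body (hypothetical): what one CUDA thread would compute for the
// flat position `i` of a rows x n output. There is deliberately no bounds
// check, since reporting a failure would need a device-to-host copy, so
// every entry of `indices` must already lie in [0, cols).
float GatherColumnsElement(const float* params, int64_t cols,
                           const int64_t* indices, int64_t n, int64_t i) {
  const int64_t r = i / n;  // output row
  const int64_t j = i % n;  // output column
  return params[r * cols + indices[j]];
}
```

For comparison, the TensorFlow documentation for `tf.gather` notes that on GPU an out-of-bound index stores 0 in the output instead of raising an error, which avoids the same host round trip.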
Following how gather is implemented in TensorFlow, we structure the implementation of our custom ops as follows (using gather_columns as an example):
- gather_columns.cc - Definition and registration of the op.
- gather_columns_functor.h - Declaration of a generic functor template, and definition of the CPU functor template.
- gather_columns_functor.cc - Forward declarations of the GPU functor. This file is needed only if the CPU functor is compiled separately. We currently compile everything together (linking in the already compiled GPU functor objects), so this file is not strictly necessary.
- gather_columns_functor_gpu.cu.h - Definition of the GPU functor template.
- gather_columns_functor_gpu.cu.cc - Explicit instantiation of the GPU functor template for specific template parameters.
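The file layout above can be sketched as a single plain C++ file, with a comment marking which file each part would live in. This is a hypothetical, simplified analogue rather than the actual gather_columns source: the device types are empty stand-ins for TensorFlow's Eigen devices, tensors are `std::vector`s, the GPU specialization is only indicated in comments (it needs nvcc), and `ComputeGatherColumns` stands in for the op kernel's `Compute` method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for TensorFlow's device types (assumption: empty tags here;
// real TensorFlow code uses Eigen::ThreadPoolDevice / Eigen::GpuDevice).
struct CPUDevice {};
struct GPUDevice {};

namespace functor {

// File: gather_columns_functor.h
// Generic functor template: declared only, defined per device.
template <typename Device, typename T, typename IndT>
struct GatherColumnsFunctor;

// File: gather_columns_functor.h
// CPU partial specialization, defined in the same header. Returns the
// position of the first out-of-range index, or -1 on success.
template <typename T, typename IndT>
struct GatherColumnsFunctor<CPUDevice, T, IndT> {
  int64_t operator()(const CPUDevice&, const std::vector<T>& params,
                     int64_t cols, const std::vector<IndT>& indices,
                     std::vector<T>& out) const {
    assert(cols > 0 && static_cast<int64_t>(params.size()) % cols == 0);
    const int64_t rows = static_cast<int64_t>(params.size()) / cols;
    const int64_t n = static_cast<int64_t>(indices.size());
    for (int64_t j = 0; j < n; ++j) {
      if (indices[j] < 0 || indices[j] >= cols) return j;
    }
    out.resize(static_cast<std::size_t>(rows * n));
    for (int64_t r = 0; r < rows; ++r)
      for (int64_t j = 0; j < n; ++j)
        out[r * n + j] = params[r * cols + indices[j]];
    return -1;
  }
};

// File: gather_columns_functor_gpu.cu.h (not compiled in this sketch)
// template <typename T, typename IndT>
// struct GatherColumnsFunctor<GPUDevice, T, IndT> { /* launches kernel */ };

}  // namespace functor

// File: gather_columns_functor_gpu.cu.cc (analogue)
// Explicit instantiation, i.e. generating the specialization for concrete
// template parameters. The real file would do this for GPUDevice and each
// supported type pair; CPUDevice is used here so the sketch compiles.
template struct functor::GatherColumnsFunctor<CPUDevice, float, int64_t>;

// File: gather_columns.cc
// The op kernel is written once against the generic template and selects
// the device-specific specialization through its Device parameter.
template <typename Device, typename T, typename IndT>
int64_t ComputeGatherColumns(const Device& d, const std::vector<T>& params,
                             int64_t cols, const std::vector<IndT>& indices,
                             std::vector<T>& out) {
  return functor::GatherColumnsFunctor<Device, T, IndT>()(d, params, cols,
                                                          indices, out);
}
```

Because the kernel only names `GatherColumnsFunctor<Device, T, IndT>`, adding GPU support in this pattern means providing the GPU specialization and its instantiations in the `.cu.*` files, while the kernel code itself stays unchanged.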