Allow custom BINARY_OP specializations to be registered at runtime.
#162
-
I think the registered functions should consume the references to their arguments. This pushes work into the client code, but when temporary variables are used, it would allow in-place modification. Given the lengths NumPy goes to in order to avoid creating temporaries, it should be popular with third parties.
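To make the point about consumed references concrete, here is a minimal sketch using a toy reference-counted type. `Box`, `box_new`, `box_decref`, and `box_add_consuming` are hypothetical names invented for illustration; they are not CPython or NumPy APIs. The idea is only that a function which owns its argument references can tell from a reference count of 1 that the left operand is a temporary, and reuse it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted value; a hypothetical stand-in for PyObject. */
typedef struct {
    int refcount;
    double value;
} Box;

static Box *box_new(double value)
{
    Box *b = malloc(sizeof(Box));
    assert(b != NULL);
    b->refcount = 1;
    b->value = value;
    return b;
}

static void box_decref(Box *b)
{
    if (--b->refcount == 0) {
        free(b);
    }
}

/* Consumes one reference to each argument and returns a new reference. */
static Box *box_add_consuming(Box *lhs, Box *rhs)
{
    Box *result;
    if (lhs->refcount == 1) {
        /* We hold the only reference, so lhs is a temporary: reuse it. */
        lhs->value += rhs->value;
        result = lhs;  /* our reference to lhs becomes the result reference */
    }
    else {
        /* lhs is still visible elsewhere, so allocate a fresh result. */
        result = box_new(lhs->value + rhs->value);
        box_decref(lhs);
    }
    box_decref(rhs);
    return result;
}
```

A caller that wants to keep using an operand must take its own extra reference before the call, which is the "work pushed into the client code" mentioned above.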
-
Another, somewhat related, possibility is to have the compiler clear the lhs in
we would compile to:
(The …) Doing this would allow us to get rid of the special case for string addition, so it shouldn't add any complexity overall.
-
We probably want to specialize on builtin, immutable classes only, and on both operands (as we want to specialize for …). So the guard code would probably look something like:
    DEOPT_IF(Py_TYPE(lhs)->tp_version_tag != (adaptive->version & 255), BINARY_OP);
    DEOPT_IF(Py_TYPE(rhs)->tp_version_tag != (adaptive->version >> 8), BINARY_OP);
We will need to reserve the first 255 versions for these classes. If we want to guarantee that a registered function is always called once registered, then the non-specialized form will need to perform the lookup efficiently. So we will need some sort of hashtable mapping …
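As a rough illustration of the packed guard, the sketch below stores two small version tags in one 16-bit cache value and checks both operands against it. `pack_versions` and `guard_passes` are hypothetical helpers, and treating tag 0 as "no version" is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack two reserved version tags (1..255) into one 16-bit cache value:
 * low byte = lhs tag, high byte = rhs tag. */
static uint16_t pack_versions(uint8_t lhs_version, uint8_t rhs_version)
{
    /* Assumption: 0 means "no valid version", so it is never packed. */
    assert(lhs_version != 0 && rhs_version != 0);
    return (uint16_t)(lhs_version | ((uint16_t)rhs_version << 8));
}

/* Returns true when both operand tags match, i.e. no deopt is needed. */
static bool guard_passes(uint16_t packed, uint32_t lhs_tag, uint32_t rhs_tag)
{
    /* The mask is parenthesized because comparison operators bind
     * tighter than & in C. */
    return lhs_tag == (uint32_t)(packed & 255)
        && rhs_tag == (uint32_t)(packed >> 8);
}
```

Any type whose version tag is above 255 can never match, which is why the low version numbers would have to be reserved for the specialized classes.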
-
FTR, my original HotPy used table lookup for binary operators, to avoid the overhead of tracing the double-dispatch dance for simple types.
-
Thinking about this further, we want a design that:
To keep the additional space to a minimum, we want to use an index into a table, rather than a version number and a function pointer. If we are willing to pay additional cost when de-optimizing and creating the …
We can avoid multiple tests if we use a 64-bit version number in the table. So we want:
The code for the specialized form would look something like:
    TableEntry *ptr = &TheTable[cache->index];
    DEOPT_IF((((uint64_t)Py_TYPE(a)->tp_version_tag << 32) | Py_TYPE(b)->tp_version_tag) != ptr->version_pair, BINARY_OP);
    PyObject *res = ptr->function(a, b); /* Consumes references */
    ...
    }
We will also want an ancillary table to look up the index from the version number pair when specializing, but that's much less performance critical.
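A minimal, self-contained sketch of the table idea follows. The names `TableEntry`, `TheTable`, and `version_pair` are taken from the snippet above; everything else (`table_register`, `table_find`, `table_call`, the fixed size of 32, and the use of `long` operands instead of `PyObject *`) is an assumption made so the example compiles on its own. The linear scan in `table_find` merely stands in for the ancillary lookup table.

```c
#include <assert.h>
#include <stdint.h>

/* Toy operand type; the real table would hold PyObject * functions. */
typedef long (*binary_func)(long, long);

typedef struct {
    uint64_t version_pair;  /* (lhs version << 32) | rhs version */
    binary_func function;
} TableEntry;

#define TABLE_SIZE 32
static TableEntry TheTable[TABLE_SIZE];
static int table_used = 0;

static uint64_t make_version_pair(uint32_t lhs, uint32_t rhs)
{
    /* Widen before shifting: shifting a 32-bit value by 32 is undefined. */
    return ((uint64_t)lhs << 32) | rhs;
}

/* Registers a function; returns its table index, or -1 if the table is full. */
static int table_register(uint32_t lhs, uint32_t rhs, binary_func f)
{
    if (table_used == TABLE_SIZE) {
        return -1;
    }
    TheTable[table_used].version_pair = make_version_pair(lhs, rhs);
    TheTable[table_used].function = f;
    return table_used++;
}

/* Slow path, used only when specializing: find the index for a pair. */
static int table_find(uint32_t lhs, uint32_t rhs)
{
    uint64_t key = make_version_pair(lhs, rhs);
    for (int i = 0; i < table_used; i++) {
        if (TheTable[i].version_pair == key) {
            return i;
        }
    }
    return -1;
}

/* Fast path: one 64-bit comparison guards the call.
 * Returns 1 on success, 0 when the guard fails (the "deopt" case). */
static int table_call(int index, uint32_t lhs_version, uint32_t rhs_version,
                      long a, long b, long *result)
{
    assert(index >= 0 && index < table_used);
    const TableEntry *ptr = &TheTable[index];
    if (make_version_pair(lhs_version, rhs_version) != ptr->version_pair) {
        return 0;
    }
    *result = ptr->function(a, b);
    return 1;
}

static long add_longs(long a, long b)
{
    return a + b;
}
```

The instruction cache then only needs to hold the small index, and the guard is a single comparison regardless of how many type pairs are registered.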
-
Looking at the stats, we should be able to virtually eliminate failed specializations with 20 or 30 entries.
-
We can also handle …
-
Just found this. I'm so sorry for the extremely delayed 😔 response.
-
(@markshannon, not sure if this accurately captures your vision for this. Please let me know if not!)
During discussions about the recent unification of all of the BINARY_*/INPLACE_* ops into BINARY_OP, the possibility of providing hooks for "custom" specializations (say, for third-party types) briefly came up. I figured we could attempt to flesh out the ideas here a bit more. I think that, in addition to dramatically increasing the flexibility of our specialization machinery, this also has the opportunity to (a) make adding and experimenting with new specializations easier, and (b) clean up our existing operator specializations quite a bit.
The way I see it, the bare minimum information required for pluggable operator specializations would include:
This information could be provided to the interpreter using a simple API that would be called somewhere during interpreter creation (or, in the case of third-party/stdlib modules, during import):
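Purely as an illustration of what such a registration API and the matching lookup could look like, here is a self-contained sketch. Every name in it (`ToyType`, `ToyObject`, `BinaryOpHook`, `register_binary_op_hook`, `find_binary_op_hook`, the `OP_*` constants) is hypothetical and not part of CPython. It assumes a hook is keyed on an operator plus a single type shared by both operands.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for PyTypeObject and PyObject. */
typedef struct { const char *name; } ToyType;
typedef struct { const ToyType *type; long value; } ToyObject;

/* Hypothetical operator codes. */
enum { OP_ADD, OP_SUBTRACT };

typedef ToyObject (*binary_hook_func)(ToyObject, ToyObject);

/* One hook: which operator, which operand type, and what to call. */
typedef struct {
    int op;
    const ToyType *type;
    binary_hook_func func;
} BinaryOpHook;

#define MAX_HOOKS 32
static BinaryOpHook hooks[MAX_HOOKS];
static int hook_count = 0;

/* Registers a hook; returns its index, or -1 if no slots remain. */
static int register_binary_op_hook(int op, const ToyType *type,
                                   binary_hook_func func)
{
    assert(type != NULL && func != NULL);
    if (hook_count == MAX_HOOKS) {
        return -1;
    }
    hooks[hook_count] = (BinaryOpHook){op, type, func};
    return hook_count++;
}

/* Consulted when specializing: returns a matching hook index or -1. */
static int find_binary_op_hook(int op, const ToyObject *lhs,
                               const ToyObject *rhs)
{
    if (lhs->type != rhs->type) {
        return -1;  /* this sketch only handles same-type operands */
    }
    for (int i = 0; i < hook_count; i++) {
        if (hooks[i].op == op && hooks[i].type == lhs->type) {
            return i;
        }
    }
    return -1;
}

/* Example hook implementation for a toy integer type. */
static ToyObject toy_int_add(ToyObject a, ToyObject b)
{
    ToyObject result = {a.type, a.value + b.value};
    return result;
}
```

The index returned by the lookup is what a specialized instruction would keep in its cache, so that executing it needs only a type check followed by a call through `hooks[index].func`.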
Specializing BINARY_OP instructions would just be a matter of consulting these hooks for a match. I'm imagining something like this:
The instruction implementation itself would be pretty simple:
I'm not sure if allowing more sophisticated specialization criteria (like different LHS and RHS types) would be worth it, especially considering that all of our existing specializations (except for BINARY_OP_INPLACE_ADD_UNICODE) would work perfectly fine with the proposed scheme. It would probably also require an additional cache entry per instruction.
It would also be an open question whether it's worth converting our existing BINARY_OP specializations to these hooks. If there's no measurable slowdown, I think it would be quite nice to clean up all of the existing special-case logic we have for these specializations.