# Constant sequential filling operation needs output shape parameter #492
Could the parameters […]

---
So the final proposal is like this: `constant(start, step, operand_descriptor)`, right? (I guess the order of parameters is like this.) @huningxin @fdwr WDYT?

---
I'm fine with it either way. If you as a caller already have shape and data type variables, then requiring a temporary tensor desc is a little more bothersome than just passing them directly as parameters; but if you already have a tensor desc, then passing that directly is slightly more convenient. 🤷 Since the internal implementation complexity doesn't really change either way, maybe asking someone from the WebNN API caller point of view (like Wanming from ORT) would be a nice tie breaker?

Whatever we do, we should use the same pattern in the future with similar operators. So if we introduce, for example, a diagonal matrix operator that fills a new tensor with 1's along the diagonal and 0's everywhere else, then it should likewise take either a tensor desc or two separate parameters. I don't see that we have much existing precedent though. Any ops that currently take a shape (like reshape or expand) take the shape as a separate parameter, and ops that take a data type (like cast) take it as a separate parameter, but we don't have any others that take both shape and data type.

---
Not much impact to the WebNN EP; I am fine with it either way as well.

---
> In Web APIs they usually prefer passing options in dictionaries.

Well, these are essential parameters, not options, but it would be more consistent with the existing […]

Usage:
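The original example here is elided; a minimal sketch of the two calling styles under discussion (the `start`/`step` member names follow the proposal later in this thread; neither `constant()` shape below is shipped API):

```js
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const desc = {dataType: 'float32', shape: [4]};

// Essential parameters passed positionally:
// const seq = builder.constant(0, 2, desc);

// ...versus grouped into a dictionary alongside the descriptor:
// const seq = builder.constant(desc, {start: 0, step: 2});
```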
---

If it causes no ambiguity in the IDL, then I'd like both overload parameters to go in the same order, rather than inconsistently reversed between them. Also if you make […]

---
I think marking fields as `required` […]. The following is also fine IMO:

```webidl
dictionary FillSequenceDescriptor {
  required double start;
  double step = 0;
};

constant(MLOperandDescriptor desc, FillSequenceDescriptor fillSequence);
```

The type we use for […] How to support int64 remains an open question :)

---
One of the three current […]

...and so there's precedent for passing at least that much precision. I agree that float32 is not sufficient to cover the accuracy needed (it cannot even represent odd int32 values >= 16777216 accurately).
One operator (split) already accepts parameters of differing types:
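For reference, split's IDL (quoting the spec from memory, so treat as approximate) takes either a count or a sequence of sizes for its second parameter:

```webidl
partial interface MLGraphBuilder {
  // The second parameter is either a number of equal splits or a
  // sequence of explicit split sizes.
  sequence<MLOperand> split(
      MLOperand input,
      (unsigned long or sequence<unsigned long>) splits,
      optional MLSplitOptions options = {});
};
```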
So (barring any IDL issues I'm not seeing?) we could accept […]

Curiously I see a […]

---
I thought of the same approach as you, and I am trying to add bigint to verify the feasibility of the implementation.

---
Updates to the constant sequential filling operation: […]

On one hand, we can continue to review these two CLs to find potential problems and verify feasibility. On the other hand, can we first create an updated specification PR that matches what is being implemented, and get folks with more WebIDL and JavaScript API design knowledge to take a look at it to make sure it will work the way we intend it to, as Reilly said? WDYT @huningxin @fdwr @wacky6

---
Drive-by comment: use of bigint is extremely limited in Chromium at the moment, so I'm not surprised if the binding code generation can't handle a union with it yet. Note also this in WebIDL: https://webidl.spec.whatwg.org/#limit-bigint-numeric-unions. While WebIDL support for a union of bigint and another numeric type is desired, there is a request to file an issue so that the problem can be discussed. And from that bug tracker, it doesn't look like anyone has actually done it yet.

---
> https://webidl.spec.whatwg.org/#limit-bigint-numeric-unions

Joshua: I see. Well, the point in this case is to preserve values all the way through and avoid precision errors. I see one existing issue here: whatwg/webidl#959 (but it's oddly empty and already closed).

---
Yeah, I think the concern (from the WebIDL editors) is that they want to ensure that it's going to be used correctly (as proposed here), with handling at the C++ (etc.) layer that preserves full precision as either a float64 or a bigint. The WebIDL editors just don't want to do extra work unless it's really needed, and don't want anyone writing such a union unless the precision is actually needed. It absolutely is here... this may just be the first spec where that's actually the case! 🤯

---
Comparing the Blink bindings implementation for unions and the spec for union conversions, it looks like the cases for bigint are simply missing, i.e. not implemented yet.

---
We have some parameter validation discussions happening in the fill-sequence constant overload CL. I'm content to restrict this sequential fill overload to reject infinity as an input (to simplify parameter validation), so long as we have some other means of creating tensors filled with such values; and I propose we repurpose the existing […]

---
We're deciding between (a) vs (b); see the sketch after this list.

(a) 3 separate functions: one consuming buffers, one for constant fills, one for sequence fills:

➕ more direct

(b) 2 functions: one consuming buffers, and one that takes an options object with optional params (credit @a-sully):

➕ pretty immediately clear to the reader, without needing to look up whether parameter 2 or 3 is the step.
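A sketch of the two shapes (all method and dictionary member names here are hypothetical, just to make the trade-off concrete):

```js
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const desc = {dataType: 'float32', shape: [2, 2]};

// (a) Three separate methods:
// builder.constant(desc, new Float32Array([1, 2, 3, 4])); // from a buffer
// builder.constantFill(desc, 0);                          // constant fill
// builder.constantSequence(desc, 0, 1);                   // sequence fill

// (b) Two methods; fill behavior goes in an options dictionary:
// builder.constant(desc, new Float32Array([1, 2, 3, 4])); // from a buffer
// builder.constant(desc, {start: 0});                     // constant fill
// builder.constant(desc, {start: 0, step: 1});            // sequence fill
```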
---

This seems nice from the caller's perspective, but I think we'll need hand-written parsing logic; I don't think the WebIDL binding generator is smart enough to tell different kinds of dictionaries apart.

```webidl
dictionary FillConstant {
  MLNumber fill;  // or `start`
};

dictionary FillWithSequence {
  MLNumber start;
  MLNumber step;
};
```

---
Yup! That recommendation is based on the guidance here: https://w3ctag.github.io/design-principles/#prefer-dictionaries

And my suggestion for combining the "fill with a given value" and "fill with this value + a step" methods into one is based on the guidance here: https://w3ctag.github.io/design-principles/#optional-parameters. If the dictionary argument itself is optional, then just this is sufficient:
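A sketch of that shape (the dictionary and member names are my assumption, not from the guidance):

```webidl
dictionary MLFillSequenceOptions {
  MLNumber start = 0;  // fill value, or first element of the sequence
  MLNumber step = 0;   // a step of 0 degenerates to a constant fill
};

partial interface MLGraphBuilder {
  // With the dictionary optional, constant(desc) alone yields a zero fill.
  MLOperand constant(MLOperandDescriptor descriptor,
                     optional MLFillSequenceOptions options = {});
};
```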
I'm not sure, though as @fdwr mentioned above, I think if we use […]

---
Taking a step back and looking at the problem statement described in the issue description, I'd like to poke at this claim: […]

Let's first look at the Problematic Library differences:

PyTorch seems to have resolved this by deprecating `torch.range` […]. So there's no problematic library difference, right?...

...What about DirectML?! DirectML does not require an explicit `end` […]. Infinite […]
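For context, DirectML's fill-sequence operator desc (from DirectML.h, quoted from memory, so treat as approximate) carries the shape on the output tensor and has no end parameter at all:

```c
// The output shape is explicit in OutputTensor; elements are generated
// as ValueStart + index * ValueDelta, so no `end` is ever needed.
struct DML_FILL_VALUE_SEQUENCE_OPERATOR_DESC {
    const DML_TENSOR_DESC* OutputTensor;
    DML_TENSOR_DATA_TYPE ValueDataType;
    DML_SCALAR_UNION ValueStart;
    DML_SCALAR_UNION ValueDelta;
};
```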
---
Taking an even bigger step back, I was curious what the rationale for adding this operator (which I'll refer to as `fillSequence`) was […]

In light of the recent discussions on #614 about achieving minimal buffer copies by using […]

Passing an […]

Questions for the group: […]

†Hopefully 🤞 Pending further […]

---
For some MLGraphBuilder methods the type of a numeric input can vary; e.g. for constant() an explicit MLOperandDataType is provided, while for clamp() and pad() the data type is implied by input operands. In these cases, specifying the numeric value as either a float/double or int64 type runs into accuracy or range issues: you can't accurately represent all int64 values as a double, and you can't represent the full range of floats as int64. (You also can't represent all int64 values as a long long either; over 2^53 things get weird. But that's a digression.)

Per discussion in whatwg/webidl#1388, this change introduces a union between a bigint type and unrestricted double called MLNumber. Callers can pass a JS number (1234, 1.1234e38) or a JS bigint (9007199254740993n), and the implementation will treat it properly based on the explicit or implicit MLOperandDataType. Usage of this type should be limited to only those cases.

Fixes webmachinelearning#442

Note that webmachinelearning#492 proposes changes to the constant sequential filling operation; this just adds IDL to match the current spec prose. Some of the concerns raised in webmachinelearning#325 are addressed (e.g. clamp()'s options). However, several other options are still specified as "float" and should maybe be "double", but MLNumber is likely not appropriate for those, so they are not updated here.
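To make the precision point concrete, a sketch assuming the scalar constant(dataType, value) overload described above:

```js
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// A plain JS number is fine where float64 precision suffices:
const huge = builder.constant('float32', 1.1234e38);

// 9007199254740993 (2^53 + 1) is not representable as a float64, so a
// bigint is the only way to hand it over exactly for an int64 operand:
const exact = builder.constant('int64', 9007199254740993n);
```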
---

My biggest concern is deducing tensor shapes from wobbly floating-point calculations, especially considering that the caller (maybe JavaScript, maybe compiled WebAssembly, maybe TF with one rounding behavior, maybe PT with another rounding behavior...) and the browser (with its own FPU details) can differ. It would be bad if the browser computed its range subtraction and delta division via float64 whereas the caller used float32, or vice versa; a mismatch in the output shape can result in the whole graph execution breaking due to incompatible shapes when combined with other operators. High-level frameworks like TF Lite and PyTorch can do what they want and include such helpful functions based solely on FP parameters, but low-level APIs like WebNN should take more explicit tensor shapes.
🤔 Seems not so. I picked nice integer numbers for the example above, but more generically, if you pass the same values into both libraries, the output shapes differ:

```python
import torch
import tensorflow

ptRange = torch.arange(0.3908395899830157, 1.9541979499150786, 0.3908395899830157, dtype=torch.float32)
tfRange = tensorflow.range(0.3908395899830157, 1.9541979499150786, 0.3908395899830157, dtype=tensorflow.float32)

print("---------------------------------")
print("PT shape: ", ptRange.shape)
print("   dtype: ", ptRange.dtype)
print("   value: ", ptRange)
print("TF shape: ", tfRange.shape)
print("   dtype: ", tfRange.dtype)
print("   value: ", tfRange)

# PT shape:  torch.Size([4])
#    dtype:  torch.float32
#    value:  tensor([0.3908, 0.7817, 1.1725, 1.5634])
# TF shape:  (5,)
#    dtype:  <dtype: 'float32'>
#    value:  tf.Tensor([0.39083958 0.78167915 1.1725187  1.5633583  1.9541979 ], shape=(5,), dtype=float32)
```

I'd sooner drop this op altogether (and force the caller to fill in the Float32Array) than include it with an FP-deduced shape.
(update 20240419T0320) I opened a doc fix for the pseudocode. The actual DML output is […]

---
Thanks again for the deep dive @fdwr :) Just to clarify a couple points...
If we went down this route, we would clearly specify the behavior of the browser; say, to always be float64. So if frameworks are doing their parallel calculations in some other data type, that's a bug in the framework and not something we (as builders of the web platform) should be too worried about. Of course we should strive to design the API such that callers fall into the pit of success. But there's a difference between an API which is unintuitive to use (say, because the parameters are ordered differently from the analogous function you're used to in Python) and one which *cannot reliably be used correctly* (say, if the browser sometimes computes shapes in float32 and other times in float64, and you have no way to know which will be used). I think the "FP deduced shape" API we're discussing here is the former. But it sounds like (based on comments like the one below) you're suggesting that that may not be realistic, given the discrepancies in the underlying frameworks?

If that's true, I concur: […]
Recalling the questions I posed to the group earlier… […]

Unless there are very compelling answers to 2 and 3, I'm in favor of removing this operator. WDYT?

---
@a-sully: I had to analyze some models deeper first before I could reply. Looking through ~700 models in my mini-collection, with 19 having […]

If we encounter a more compelling case later where a large fill directly on the GPU is worthwhile, then we can consider re-adding fillSequence (with an explicit shape parameter, and only computing once rather than each […])

---
The above PR merged. Let's close this issue?

---
See `constant(start, end, step, type)` at https://www.w3.org/TR/webnn/#dom-mlgraphbuilder-constant-start-end-step-type

**Problem**

Computing the output shape indirectly from a floating-point calculation is brittle, because the caller, depending on its equation and rounding, can yield a different size than WebNN, which will cause graph execution to fail due to mismatched tensors. Such incompatible off-by-one differences can occur via differing calculation precisions (e.g. the caller computes the output shape in float64 whereas Chromium computes it in float32) and via differing handling of the inclusive/exclusive end value (see a non-hypothetical example below). So it's prudent for the library to pass WebNN the shape explicitly, and we've increasingly been moving toward more explicitness anyway (e.g. removing the special null values in Reshape). The output shape was in the `fillSequence` operator, but got dropped, and it reoccurred to me while reviewing Bin's Chromium implementation.

**Proposal**
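Per the parameter discussion earlier in the thread, the shape-explicit form under consideration was along these lines (a sketch, not necessarily the exact IDL from the linked CLs):

```webidl
partial interface MLGraphBuilder {
  // The descriptor supplies the output shape and data type explicitly;
  // each element is computed as start + (index * step).
  MLOperand constant(MLOperandDescriptor descriptor,
                     MLNumber start,
                     optional MLNumber step = 0);
};
```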
* Note the `end` is now redundant, since it was only used for size calculation, not needed for the core `start + (index * step)` formula.

**Problematic Library differences**
We see differences between TF and PT with the exact same inputs due to how they handle the end value (included vs. excluded), which emphasizes why WebNN should be clearly informed by the caller of the expected shape:
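As an illustration of that end-handling difference (my own numbers, not the original example):

```python
import torch
import tensorflow as tf

# torch.range() includes the end value (and is deprecated partly for this
# ambiguity); tf.range() excludes it, so the output shapes differ by one.
print(torch.range(0, 5, 1))  # tensor([0., 1., 2., 3., 4., 5.]) -> 6 elements
print(tf.range(0, 5, 1))     # tf.Tensor([0 1 2 3 4], ...)      -> 5 elements
```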
**Usage Examples**
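With the explicit-shape proposal, usage would look something like this (a sketch consistent with the IDL sketch above, not shipped API):

```js
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// Five elements 0, 2, 4, 6, 8: the shape is stated explicitly, so no
// floating-point size deduction happens on either side of the API.
const ramp = builder.constant(
    {dataType: 'float32', shape: [5]},
    /*start*/ 0, /*step*/ 2);
```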
@wchao1115 @huningxin