Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with fixes of aca33f17 (Oct 22, needs LLVM Oct 19) (92) #476

Merged
merged 33 commits into from
Jan 30, 2025

Conversation

mgehre-amd
Copy link
Collaborator

@mgehre-amd mgehre-amd commented Jan 24, 2025

Requires llvm bump

commit 02bf3b54c026
Author: Felix Schneider <[email protected]>
Date:   Sat Oct 19 18:25:27 2024 +0200

    [mlir][linalg] Add quantized conv2d operator with FCHW,NCHW order (#107740)

ubfx and others added 14 commits October 22, 2024 20:26
…llvm#3807)

I've upstreamed the necessary quantized linalg Op with the
"channel-first" ordering used by torch
(llvm/llvm-project#107740) for 2d convolution.

This patch changes the lowering for the quantized 2d case of
`aten.convolution` accordingly, which saves three transpositions per
convolution (input, weights, result) and therefore removes the
requirement to try to optimize these away in downstream passes.
…lculate` region (llvm#3812)

Reports a match failure for the pattern `FullyUnrollPrimLoop` when the
loop op is not in a region defined by a `torch.shape.calculate` op.

This is needed to avoid unrolling prim loops generated by ONNX IR, since
we are applying shape refinement in the
`torch-onnx-to-torch-backend-pipeline` introduced in fa4794d .

See also the discussion in
<iree-org/iree#18867 (comment)>
As discussed in llvm#3652, we should
replace maxpool3dwithindices with maxpool3d if indices have no user.
Added autopad. and passed 3 tests 

test_maxpool_2d_precomputed_same_upper
test_maxpool_2d_same_lower'
test_maxpool_2d_same_upper

Address : nod-ai/SHARK-ModelDev#843 

2 attributes yet to complete : storage_order, indices output
### Changes

1. Folders for view-like ops: `aten.view`, `aten.flatten.using_ints`,
and `aten.unflatten.int`
2. Folder for transpose
3. Extended support for the `aten.slice.Tensor` op folder to include
negative strides.


### Motivation

The biggest motivation for this patch is to fold the extremely
convoluted ir that gets generated when exporting a pytorch model with an
`aten.pad` op to ONNX, then re-importing and lowering back to torch. For
example, the verbose output of the e2e test `PadModule_basic` with `-c
onnx`:

```mlir
module {
  func.func @main_graph(%arg0: !torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,?,?,?],f32> attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.5.0"} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<_> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> 
    %1 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__1> : tensor<4xsi64>} : () -> !torch.vtensor<[4],si64> 
    %2 = torch.operator "onnx.ConstantOfShape"(%0) {torch.onnx.value = dense_resource<__2> : tensor<1xsi64>} : (!torch.vtensor<[1],si64>) -> !torch.vtensor<[4],si64> 
    %3 = torch.operator "onnx.Concat"(%1, %2) {torch.onnx.axis = 0 : si64} : (!torch.vtensor<[4],si64>, !torch.vtensor<[4],si64>) -> !torch.vtensor<[8],si64> 
    %4 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3> : tensor<2xsi64>} : () -> !torch.vtensor<[2],si64> 
    %5 = torch.operator "onnx.Reshape"(%3, %4) {torch.onnx.allowzero = 0 : si64} : (!torch.vtensor<[8],si64>, !torch.vtensor<[2],si64>) -> !torch.vtensor<[4,2],si64> 
    %6 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__4> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> 
    %7 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__5> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> 
    %8 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__6> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> 
    %9 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__7> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> 
    %10 = torch.operator "onnx.Slice"(%5, %7, %8, %6, %9) : (!torch.vtensor<[4,2],si64>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[4,2],si64> 
    %11 = torch.operator "onnx.Transpose"(%10) {torch.onnx.perm = [1 : si64, 0 : si64]} : (!torch.vtensor<[4,2],si64>) -> !torch.vtensor<[2,4],si64> 
    %12 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__8> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> 
    %13 = torch.operator "onnx.Reshape"(%11, %12) {torch.onnx.allowzero = 0 : si64} : (!torch.vtensor<[2,4],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[8],si64> 
    %14 = torch.operator "onnx.Cast"(%13) {torch.onnx.to = 7 : si64} : (!torch.vtensor<[8],si64>) -> !torch.vtensor<[8],si64> 
    %15 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__9> : tensor<f32>} : () -> !torch.vtensor<[],f32> 
    %16 = torch.operator "onnx.Pad"(%arg0, %14, %15) {torch.onnx.mode = "constant"} : (!torch.vtensor<[?,?,?,?],f32>, !torch.vtensor<[8],si64>, !torch.vtensor<[],f32>) -> !torch.vtensor<[?,?,?,?],f32> 
    return %16 : !torch.vtensor<[?,?,?,?],f32>
  }
}

{-#
  dialect_resources: {
    builtin: {
      _: "0x080000000400000000000000",
      __1: "0x080000000000000000000000010000000000000002000000000000000300000000000000",
      __2: "0x080000000000000000000000",
      __3: "0x08000000FFFFFFFFFFFFFFFF0200000000000000",
      __4: "0x080000000000000000000000",
      __5: "0x08000000FFFFFFFFFFFFFFFF",
      __6: "0x080000000100000000000080",
      __7: "0x08000000FFFFFFFFFFFFFFFF",
      __8: "0x08000000FFFFFFFFFFFFFFFF",
      __9: "0x080000000000C03F"
    }
  }
#-}
```

Get's converted to the torch IR:

```mlir
module {
  func.func @main_graph(%arg0: !torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,?,?,?],f32> attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.5.0"} {
    %float1.500000e00 = torch.constant.float 1.500000e+00
    %int-9223372036854775807 = torch.constant.int -9223372036854775807
    %int-1 = torch.constant.int -1
    %int7 = torch.constant.int 7
    %int6 = torch.constant.int 6
    %int5 = torch.constant.int 5
    %int3 = torch.constant.int 3
    %int8 = torch.constant.int 8
    %int1 = torch.constant.int 1
    %int2 = torch.constant.int 2
    %int4 = torch.constant.int 4
    %int0 = torch.constant.int 0
    %0 = torch.vtensor.literal(dense<[0, 1, 2, 3, 0, 0, 0, 0]> : tensor<8xsi64>) : !torch.vtensor<[8],si64>
    %1 = torch.prim.ListConstruct %int4, %int2 : (!torch.int, !torch.int) -> !torch.list<int>
    %2 = torch.aten.view %0, %1 : !torch.vtensor<[8],si64>, !torch.list<int> -> !torch.vtensor<[4,2],si64>
    %3 = torch.aten.slice.Tensor %2, %int0, %int-1, %int-9223372036854775807, %int-1 : !torch.vtensor<[4,2],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[4,2],si64>
    %4 = torch.aten.transpose.int %3, %int0, %int1 : !torch.vtensor<[4,2],si64>, !torch.int, !torch.int -> !torch.vtensor<[2,4],si64>
    %5 = torch.prim.ListConstruct %int-1 : (!torch.int) -> !torch.list<int>
    %6 = torch.aten.view %4, %5 : !torch.vtensor<[2,4],si64>, !torch.list<int> -> !torch.vtensor<[8],si64>
    %7 = torch.aten.slice.Tensor %6, %int0, %int0, %int1, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64>
    %8 = torch.aten.item %7 : !torch.vtensor<[1],si64> -> !torch.int
    %9 = torch.aten.slice.Tensor %6, %int0, %int1, %int2, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64>
    %10 = torch.aten.item %9 : !torch.vtensor<[1],si64> -> !torch.int
    %11 = torch.aten.slice.Tensor %6, %int0, %int2, %int3, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64>
    %12 = torch.aten.item %11 : !torch.vtensor<[1],si64> -> !torch.int
    %13 = torch.aten.slice.Tensor %6, %int0, %int3, %int4, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64>
    %14 = torch.aten.item %13 : !torch.vtensor<[1],si64> -> !torch.int
    %15 = torch.aten.slice.Tensor %6, %int0, %int4, %int5, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64>
    %16 = torch.aten.item %15 : !torch.vtensor<[1],si64> -> !torch.int
    %17 = torch.aten.slice.Tensor %6, %int0, %int5, %int6, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64>
    %18 = torch.aten.item %17 : !torch.vtensor<[1],si64> -> !torch.int
    %19 = torch.aten.slice.Tensor %6, %int0, %int6, %int7, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64>
    %20 = torch.aten.item %19 : !torch.vtensor<[1],si64> -> !torch.int
    %21 = torch.aten.slice.Tensor %6, %int0, %int7, %int8, %int1 : !torch.vtensor<[8],si64>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1],si64>
    %22 = torch.aten.item %21 : !torch.vtensor<[1],si64> -> !torch.int
    %23 = torch.prim.ListConstruct %14, %22, %12, %20, %10, %18, %8, %16 : (!torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int) -> !torch.list<int>
    %24 = torch.aten.constant_pad_nd %arg0, %23, %float1.500000e00 : !torch.vtensor<[?,?,?,?],f32>, !torch.list<int>, !torch.float -> !torch.vtensor<[?,?,?,?],f32>
    return %24 : !torch.vtensor<[?,?,?,?],f32>
  }
}
```

***All of these operations are useless***. It is literally the result of
needing to reverse (and change the lexicographic order hierarchy of)
padding ints provided via torch vs. ONNX pad ops, which is then
subsequently UNDONE by our ONNX->Torch lowering (represented in the
ordering of the generated list construct).

With the added folders in this patch, the torch IR becomes:

```
module {
  func.func @main_graph(%arg0: !torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,?,?,?],f32> attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.5.0"} {
    %float1.500000e00 = torch.constant.float 1.500000e+00
    %int0 = torch.constant.int 0
    %int2 = torch.constant.int 2
    %int3 = torch.constant.int 3
    %int1 = torch.constant.int 1
    %0 = torch.prim.ListConstruct %int0, %int1, %int2, %int3, %int0, %int0, %int0, %int0 : (!torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int, !torch.int) -> !torch.list<int>
    %1 = torch.aten.constant_pad_nd %arg0, %0, %float1.500000e00 : !torch.vtensor<[?,?,?,?],f32>, !torch.list<int>, !torch.float -> !torch.vtensor<[?,?,?,?],f32>
    return %1 : !torch.vtensor<[?,?,?,?],f32>
  }
}
```
Compiling with clang 16.0 on macOS I have warnings about incorrect
printf format (see below).

Values to be printed are `int64_t`, but they are printed with `%zu` and
`%ld`, which are not portable way to print this type.

```
<...>/torch-mlir/test/CAPI/torch.c:52:3: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
   52 |   DEFINE_CHECK(NonValueTensor)
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
<...>/torch-mlir/test/CAPI/torch.c:37:13: note: expanded from macro 'DEFINE_CHECK'
   36 |     fprintf(stderr, #TTT "Type %s rank: %zu\n", testName,                      \
      |                                         ~~~
   37 |             torchMlirTorch##TTT##TypeGetRank(TTT##Type));                      \
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<scratch space>:78:1: note: expanded from here
   78 | torchMlirTorchNonValueTensorTypeGetRank
      | ^
<...>/torch-mlir/test/CAPI/torch.c:52:3: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
   52 |   DEFINE_CHECK(NonValueTensor)
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
<...>/torch-mlir/test/CAPI/torch.c:42:15: note: expanded from macro 'DEFINE_CHECK'
   41 |       fprintf(stderr, #TTT "Type %s pos %d size: %ld\n", testName, i,          \
      |                                                  ~~~
   42 |               TTT##Sizes[i]);                                                  \
      |               ^~~~~~~~~~~~~
<scratch space>:85:1: note: expanded from here
   85 | NonValueTensorSizes
      | ^
<...>/torch-mlir/test/CAPI/torch.c:53:3: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
   53 |   DEFINE_CHECK(ValueTensor)
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~
<...>/torch-mlir/test/CAPI/torch.c:37:13: note: expanded from macro 'DEFINE_CHECK'
   36 |     fprintf(stderr, #TTT "Type %s rank: %zu\n", testName,                      \
      |                                         ~~~
   37 |             torchMlirTorch##TTT##TypeGetRank(TTT##Type));                      \
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<scratch space>:112:1: note: expanded from here
  112 | torchMlirTorchValueTensorTypeGetRank
      | ^
<...>/torch-mlir/test/CAPI/torch.c:53:3: warning: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Wformat]
   53 |   DEFINE_CHECK(ValueTensor)
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~
<...>/torch-mlir/test/CAPI/torch.c:42:15: note: expanded from macro 'DEFINE_CHECK'
   41 |       fprintf(stderr, #TTT "Type %s pos %d size: %ld\n", testName, i,          \
      |                                                  ~~~
   42 |               TTT##Sizes[i]);                                                  \
      |               ^~~~~~~~~~~~~
<scratch space>:119:1: note: expanded from here
  119 | ValueTensorSizes
      | ^
4 warnings generated.
```
… torch.aten.rrelu_with_noise_backward ops (fix) (llvm#3748)
…ensor_hacked_twin for TorchToTosa lowering. (llvm#3790)

1. Negative indices for tensor indexing is handled by wrapping around
the index values by checking their values at run time. Without the fix,
there was a runtime error.
2. Added a lit test to lock down the behavior.
3. Updated the `xfails_set` for `fx_importer_tosa` config to lockdown
the behavior with e2e test as well.

"THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY."
This commit sets the PyTorch and TorchVision version to nightly release
2024-10-29.

This commit also fixes the CI failure after this commit
llvm@54d9e24
got merged. The issue was that the CI checks in the PR were run before
the previous roll pytorch update but the PR was actually merged after
the roll pytorch update. Hence, the failure was not caught before
merging the PR.

While exporting the fx_graph through fx_importer for `rrelu` and
`rrelu_with_noise` op for train mode, it decomposes the
`aten.rrelu_with_noise` op based on the PyTorch decomposition which is
the default behavior. However, the decomposition contains an input
mutation specifically here
https://github.com/pytorch/pytorch/blob/9bbe4a67ad137032add6a3b0b74bda66f5ef83d2/torch/_decomp/decompositions.py#L325,
resulting in the runtime failure. This issue would probably be fixed by
pytorch/pytorch#138503. Until then, the failing
tests are added to the xfail set.

Also, after the roll pytorch update following tests started passing for
fx_importer, and fx_importer_stablehlo config.

- "ElementwiseRreluTrainModule_basic"
- "ElementwiseRreluTrainStaticModule_basic"
- "ElementwiseRreluWithNoiseTrainModule_basic"
- "ElementwiseRreluWithNoiseTrainStaticModule_basic"

This commit also updates the dtype check for the `aten.linear` op since
the op now expects both the input tensors to have the same dtype.

Signed-Off By: Vivek Khandelwal <[email protected]>
Removes a boolean variable that is used only for an assertion, and
inlines the condition into the assertion.

Signed-off-by: Max Dawkins <[email protected]>
- bumps llvm-project to
llvm/llvm-project@6c64c8a
- bumps stablehlo to
openxla/stablehlo@6e403b1
- Updates type conversion materialization functions to return Value
after API change in llvm-project.

---------

Signed-off-by: Max Dawkins <[email protected]>
@mgehre-amd mgehre-amd changed the title [AutoBump] Merge with fixes of aca33f17 (Oct 22) (92) [AutoBump] Merge with fixes of aca33f17 (Oct 22, needs LLVM Oct 19) (92) Jan 24, 2025
Base automatically changed from bump_to_42ba541c to bump_to_140cad56 January 28, 2025 07:17
Base automatically changed from bump_to_140cad56 to bump_to_d2330df5 January 28, 2025 07:17
Base automatically changed from bump_to_d2330df5 to feature/backport_ea1_ops January 28, 2025 07:42
@mgehre-amd mgehre-amd requested a review from jorickert January 29, 2025 13:02
@mgehre-amd mgehre-amd merged commit 8158fb6 into feature/backport_ea1_ops Jan 30, 2025
4 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_aca33f17 branch January 30, 2025 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.