Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with 2e43a304 (Oct 25) (1) #457

Merged
merged 589 commits into from
Feb 3, 2025

Conversation

jorickert
Copy link

No description provided.

boomanaiden154 and others added 30 commits October 28, 2024 09:07
This patch adds the appropriate hookups in X86PfmCounters.td for
SapphireRapids. This is mostly to fix errors when some of my jobs that
only really need dummy counters get scheduled on sapphire rapids
machines, but figured I might as well do it properly while here. I do
not have hardware access to test this currently, but this matches
exactly with what is in the libpfm source code.
llvm#112338)

…ne0Op builder

Removing the declaration instead of implementing the builder as
discussed in llvm#110106
llvm#113296)

…structions

This patch adds the following instructions:
Conversion between floating-point and integer:
  FCVT{AS, AU, MS, MU, NS, NU, PS, PU, ZS, ZU}
  {S,U}CVTF
Advanced SIMD three-register extension:
  FMMLA

According to https://developer.arm.com/documentation/ddi0602

Co-authored-by: Marian Lukac [email protected]
Co-authored-by: Spencer Abson [email protected]
)

Split the scheduling classes of VMADC/VMSBC away from that of VADC/VSBC.
Because the former are technically mask-producing instructions rather
than normal vector arithmetics, which might have different performance
characteristics on some processors.

This is effectively NFC.
--macho -d uses the `parseInputMachO` code path, which does not handle
-M. Add -M handling for --macho as well.

Close llvm#61019

Pull Request: llvm#113795
…ing (llvm#112939)

Previously lldb didn't support setting breakpoints on call site
locations. This patch adds that ability.

It would be very slow if we did this by searching all the debug
information for every inlined subroutine record looking for a call-site
match, so I added one restriction to the call-site support. This change
will find all call sites for functions that also supply at least one
line to the regular line table. That way we can use the fact that the
line table search will move the location to that subsequent line (but
only within the same function). When we find an actually moved source
line match, we can search in the function that contained that line table
entry for the call-site, and set the breakpoint location back to that.

When I started writing tests for this new ability, it quickly became
obvious that our support for virtual inline stepping was pretty buggy.
We didn't print the right file & line number for the breakpoint, and we
didn't set the position in the "virtual inlined stack" correctly when we
hit the breakpoint. We also didn't step through the inlined frames
correctly. There was code to try to detect the right inlined stack
position, but it had been refactored a while back with the comment that
it was super confusing and the refactor was supposed to make it clearer,
but the refactor didn't work either.

That code was made much clearer by abstracting the job of "handling the
stack readjustment" to the various StopInfo's. Previously, there was a
big (and buggy) switch over stop info's. Moving the responsibility to
the stop info made this code much easier to reason about.

We also had no tests for virtual inlined stepping (our inlined stepping
test was actually written specifically to avoid the formation of a
virtual inlined stack... So I also added tests for that along with the
tests for setting the call-site breakpoints.
This patch adds support for cold function coverage instrumentation based
on sampling PGO counts. The major motivation is to detect dead functions
for the services that are optimized with sampling PGO. If a function is
covered by sampling profile count (e.g., those with an entry count > 0),
we choose to skip instrumenting those functions, which significantly
reduces the instrumentation overhead.

More details about the implementation and flags:
- Added a flag `--pgo-instrument-cold-function-only` in
`PGOInstrumentation.cpp` as the main switch to control skipping the
instrumentation.
- Built the extra instrumentation passes(a bundle of passes in
`addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by
`--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`: 
- 1) Config the flags in one place, i,e. adding
`--instrument-cold-function-only-path=<...>` and
`--pgo-function-entry-coverage`. Note that the instrumentation file path
is passed through `--instrument-sample-cold-function-path`, because we
cannot use the `PGOOptions.ProfileFile` as it's already used by
`-fprofile-sample-use=<...>`.
- 2) makes linker to link `compiler_rt.profile` lib(see
[ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)
).
- Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry
count to determine cold function.

Overall, the full command is like:

```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...>  code.cc -o code
```
)

Inlining currently assumes that either all function use controled
convergence or none of them do. This is why we need to have the entry
point wrapper use controled convergence.


https://github.com/llvm/llvm-project/blob/c85611e8583e6392d56075ebdfa60893b6284813/llvm/lib/Transforms/Utils/InlineFunction.cpp#L2431-L2439
…#113939)

This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`.
There are five uses of `std::tie` remaining because they can't be
replaced with
C++17 structured bindings.
…#113389)

With inferred modules, the dependency scanner takes care to replace the
fake "__inferred_module.map" path with the file that allowed the module
to be inferred. However, this only worked when such a module was
imported directly in the TU. Whenever such module got loaded
transitively, the scanner would fail to perform the replacement. This is
caused by the fact that PCM files are lossy and drop this information.

This patch makes sure that PCMs include this file for each submodule (in
the `SUBMODULE_DEFINITION` record), fixes one existing test with an
incorrect assertion, and does a little drive-by refactoring of
`ModuleMap`.
NFC checks have been failing starting with
https://lab.llvm.org/buildbot/#/builders/92/builds/8567.

NFC testing wrapper (llvm-bolt-wrapper) replaces the call of `perf2bolt`
with `llvm-bolt --aggregate-only --ignore-build-id`.

`show-density` is automatically enabled for perf2bolt only but not for
`llvm-bolt --aggregate-only`. Add the flag to the test to work around
the issue.

Test Plan:
```
cd build
../llvm-project/bolt/utils/nfc-check-setup.py --switch-back --verbose
bin/llvm-lit -a tools/bolt/test/X86/pre-aggregated-perf.test
```
- Change FlattenedSpelling to use StringRef instead of std::String.
- Use range for loops and enumerate().
- Use ArrayRef<> instead of std::vector reference as function arguments.
- Use {} for all if/else branch bodies if one of them uses it.
…ex (llvm#113391)

This patch avoids eagerly populating the submodule index on `Module`
construction. The `StringMap` allocation shows up in my profiles of
`clang-scan-deps`, while the index is not necessary most of the time. We
still construct it on-demand.

Moreover, this patch avoids performing qualified submodule lookup in
`ASTReader` whenever we're serializing a module graph whose top-level
module is unknown. This is pointless, since that's guaranteed to never
find any existing submodules anyway.

This speeds up `clang-scan-deps` by ~0.5% on my workload.
As with other nodes, we can convert these into G_UMULL and G_SMULL aarch64
instructions.
llvm#113947)

…ne stepping (llvm#112939)"

This was breaking some gdb-remote packet counting tests on the bots. I
can't see how this patch could cause that breakage, but I'm reverting to
figure that out.

This reverts commit f147437.
This patch aims to reduce the include used by AArch64ISelLowering, allowing it
to be included by unittests so that they can reference the AArch64ISD nodes.
It:
 - Moves the inclusion of AArch64SMEAttributes.h to the uses.
 - Moves LowerPtrAuthGlobalAddressStatically to a static function, so that
   AArch64PACKey is not required in the header.
 - Moves the definitions of getExceptionPointerRegister to the cpp file, to
   remove the reference of AArch64::X0.
…ecked-optional-access (llvm#113922)

Previously, we covered returning refs, or copies of optional, and bools.
Now cover returning pointers (to any type).
This is useful for cases like operator-> of smart pointers.
Addresses more of issue llvm#58510
Use VPInstruction::ResumePhi to create phi nodes for reduction resume
values in the scalar preheader, similar to how ResumePhis are used for
first-order recurrence resume values after 9a5a873.

This allows simplifying createAndCollectMergePhiForReduction to only
collect reduction resume phis when vectorizing epilogue loops and adding
extra incoming edges from the main vector loop. Updating phis for the
epilogue vector loops requires special attention, because additional
incoming values from the bypass blocks need to be added.

PR: llvm#110004
Instead of changing the return type of `ModuleMap::findOrCreateModule`, this patch adds a counterpart that only returns `Module *` and thus has the same signature as `createModule()`, which is important in `ASTReader`.
Motivation example:
```
> cat test.cpp
extern "C" [[gnu::weak]] void f() {}
void alias() __attribute__((alias("f")));
int main() { auto p = alias; p(); }
> clang test.cpp -fsanitize=cfi-icall -flto=thin -fuse-ld=lld
> ./a.out
[1]    1868 illegal hardware instruction  ./a.out
```

If the address of a function was only taken through its alias, the
function was not considered exported and therefore was not included
in the CFI jumptable. This resulted in `@llvm.type.test()` being lowered
to `false`, and consequently the indirect call to the function was
eventually optimized to `ubsantrap()`.
f cannot be used as an output constraint. We already errored for '=f'
but not '+f'.

Fixes llvm#113692.
mbrkusanin and others added 22 commits October 30, 2024 10:45
These are primarily meant to test disassembler and that no more than
one variant per instruction is in DisassemblerTables as that can cause
confusion when decoding v0 (vgpr0) whose value when encoded is 0.
Handling is similar to RecordType with following differences:

1. No check for cyclic references
2. No extra processing for lower bounds of array members.
3. No line information as TupleType is a lowering artefact and does not
really represent an entity in the code.
NumBits should be less than 20 so using an unsigned instead of size_t should be OK
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT

Followup to llvm#112547
…m#104902)" (llvm#114023)

This reverts commit ef44e46. The patch was originally reverted
because it was
deemed to introduce a performance regression for small inputs, however
it also fixed
a previous performance regression for larger inputs. So overall, this
patch is desirable.
In Zig, we have a tool that updates our CPU model/feature data from
LLVM's. Noticed these typos when running it for LLVM 19.

Note: I don't have commit access.
…lvm#113893)

Issue deprecation warning for these directives.
Lowering currently supports parallel master, for all other combined or
composite directives involving master, issue TODO errors.

Note: The first commit changes the formatting and generalizes the
deprecation message emission for reuse in the second commit. I can pull
it out into a separate commit if required.
I've kept the old PR50392 tag since this is such an old issue....
…n of others. (llvm#113580)

Removes sve-bf16, sve-ebf16, and sve-i8mm since they are obsolete. One
could write target_version("sve+bf16") instead of sve-bf16 for instance.

Approved in ACLE as ARM-software/acle#353
…lvm#113681)

This patch adds methods to `EntryBlockArgs` to access the full list of
entry block argument-related symbols and variables, in their standard
order. This helps centralizing this logic in as few places as possible
to avoid future inconsistencies.
…lvm#114053)

This patch adds assembly/disassembly support for the following SVE2.2
instructions

      - COMPACT (byte, halfword)
      - EXPAND

- Allow selection of `COMPACT` (word/halfword) in streaming mode if the
target has FEAT_SME2p2 (see [COMPACT ](
https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/COMPACT--Copy-active-vector-elements-to-lower-numbered-elements-))
- Rename predicates guarding instructions that are illegal in streaming
SVE mode without FEAT_SME2p2
- In accordance with
https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions
Co-authored-by: Marian Lukac [email protected]
This patch adds assembly/disassembly and tests for new FIRSTP
and LASTP instructions introduced in
https://developer.arm.com/documentation/ddi0602/2024-09

---------

Co-authored-by: SpencerAbson <[email protected]>
…error message (llvm#114176)

This commit changes the format of the materialization error message.

Previously: `failed to legalize unresolved materialization from ('f64')
to 'f32' that remained live after conversion`
Now: `failed to legalize unresolved materialization from ('f64') to
('f32') that remained live after conversion`

This commit is in preparation of merging the 1:1 and 1:N dialect
conversions. At that point, target materializations may create more than
one SSA value. I am sending this change as a separate PR to keep the
main PR smaller.
…vm#113999)

This commit adds support for bufferizing external functions that have no
body. Such functions were previously rejected by One-Shot Bufferize if
they returned a tensor value.

This commit is in preparation of removing the deprecated
`func-bufferize` pass. That pass can bufferize external functions.

Also update a few comments.
…sults option"

This reverts commit 151f763, except for the test added by it
@jorickert jorickert requested a review from mgehre-amd February 3, 2025 07:30
[AutoBump] Merge with fixes of b91ce7b (Oct 25) (2)
[AutoBump] Merge with ea050ab (Oct 30) (3)
[AutoBump] Merge with fixes of 217700b (Oct 30) (4) (Needs downstream changes)
@jorickert jorickert enabled auto-merge February 3, 2025 08:58
@jorickert jorickert merged commit e8be3be into feature/fused-ops Feb 3, 2025
4 of 5 checks passed
@jorickert jorickert deleted the bump_to_2e43a304 branch February 3, 2025 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.