[AutoBump] Merge with ea050ab1 (Oct 30) (3) #459

jorickert · 2025-01-31T14:23:29Z

No description provided.

…#113876) Currently, we don't have many tests that show the SLP outcome for integer divisions. This patch adds tests for the same. In certain scenarios, for Neon, vectorization is profitable. An attempt will be made in the future to improve the cost model for the same.

…vm#113920) Reverts llvm#86209 This patch breaks running tests locally, which is extremely disruptive to libc++ development.

…sections (llvm#113910) This patch disables the testcase for AIX and z/OS due to incomplete DWARF support.

…llvm#112747) Pure Scalable Types are defined in AAPCS64 here: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#pure-scalable-types-psts And should be passed according to Rule C.7 here: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules

This part of the ABI is completely unimplemented in Clang; instead it treats PSTs sometimes as HFAs/HVAs, sometimes as general composite types.

This patch implements the rules for passing PSTs by employing the `CoerceAndExpand` method and extending it to:

* allow array types in the `coerceToType`; now only `[N x i8]` are considered padding.
* allow mismatch between the elements of the `coerceToType` and the elements of the `unpaddedCoerceToType`; AArch64 uses this to map fixed-length vector types to SVE vector types.

Correctly passing a PST argument needs a decision in Clang about whether to pass it in memory or registers or, equivalently, whether to use the `Indirect` or `Expand/CoerceAndExpand` method. It was considered relatively harder (or not practically possible) to make that decision in the AArch64 backend. Hence this patch implements the register counting from AAPCS64 (cf. `NSRN`, `NPRN`) to guide Clang's decision.

…#112829) This is a non-functional change that updates the GFX11/GFX12 VOPC/VOPCX asm/dasm tests for true16/fake16:

1. Duplicate files to be true16/fake16 by adding "-mattr=+real-true16/-mattr=-real-true16"; the true16 test files will be updated to the true16 format when the true16 instructions are supported.
2. Sort the "*t16_err.s" and "*t16_promote.s" tests into alphabetic order. This is for the upcoming true16 MC changes, and is mainly meant to help repo maintainers resolve conflicts in the tests quickly. A script is proposed to help with the sorting in llvm#111769. Since these two files are t16 only, it should not create conflicts in downstream branches.
3. Add `-filetype=null` to separate stdout and stderr and avoid disordered output from llvm-mc.

Reduces diff in llvm#112588

It was present on VMSBC but not VMADC. Reorder the instructions to avoid duplicate 'let' statements.

This patch adds the appropriate hookups in X86PfmCounters.td for SapphireRapids. This is mostly to fix errors when some of my jobs that only really need dummy counters get scheduled on sapphire rapids machines, but figured I might as well do it properly while here. I do not have hardware access to test this currently, but this matches exactly with what is in the libpfm source code.

…d StringMapEntry.getKey() (llvm#113735)

llvm#112338) …ne0Op builder Removing the declaration instead of implementing the builder as discussed in llvm#110106

llvm#113296) …structions This patch adds the following instructions:

Conversion between floating-point and integer:
FCVT{AS, AU, MS, MU, NS, NU, PS, PU, ZS, ZU}
{S,U}CVTF

Advanced SIMD three-register extension:
FMMLA

According to https://developer.arm.com/documentation/ddi0602

Co-authored-by: Marian Lukac [email protected]
Co-authored-by: Spencer Abson [email protected]

) Split the scheduling classes of VMADC/VMSBC away from that of VADC/VSBC, because the former are technically mask-producing instructions rather than normal vector arithmetic, and might have different performance characteristics on some processors. This is effectively NFC.

--macho -d uses the `parseInputMachO` code path, which does not handle -M. Add -M handling for --macho as well. Close llvm#61019 Pull Request: llvm#113795

…or compression (llvm#113606)

See https://discourse.llvm.org/t/rfc-deprecate-and-eventually-remove-renderscript-support/81284 for the RFC

…ing (llvm#112939) Previously lldb didn't support setting breakpoints on call site locations. This patch adds that ability.

It would be very slow if we did this by searching all the debug information for every inlined subroutine record looking for a call-site match, so I added one restriction to the call-site support. This change will find all call sites for functions that also supply at least one line to the regular line table. That way we can use the fact that the line table search will move the location to that subsequent line (but only within the same function). When we find an actually moved source line match, we can search in the function that contained that line table entry for the call-site, and set the breakpoint location back to that.

When I started writing tests for this new ability, it quickly became obvious that our support for virtual inline stepping was pretty buggy. We didn't print the right file & line number for the breakpoint, and we didn't set the position in the "virtual inlined stack" correctly when we hit the breakpoint. We also didn't step through the inlined frames correctly. There was code to try to detect the right inlined stack position, but it had been refactored a while back with the comment that it was super confusing and the refactor was supposed to make it clearer, but the refactor didn't work either.

That code was made much clearer by abstracting the job of "handling the stack readjustment" to the various StopInfo's. Previously, there was a big (and buggy) switch over stop info's. Moving the responsibility to the stop info made this code much easier to reason about.

We also had no tests for virtual inlined stepping (our inlined stepping test was actually written specifically to avoid the formation of a virtual inlined stack), so I also added tests for that along with the tests for setting the call-site breakpoints.

This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead.

More details about the implementation and flags:

- Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation.
- Built the extra instrumentation passes (a bundle of passes in `addPGOInstrPasses`) under the sampling PGO pipeline. This is controlled by the `--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`, which:
  1. Configures the flags in one place, i.e. adds `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`.
  2. Makes the linker link the `compiler_rt.profile` lib (see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)).
- Added a flag (`--pgo-cold-instrument-entry-threshold`) to configure the entry count used to determine a cold function.

Overall, the full command is like:

```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code
```

) Inlining currently assumes that either all functions use controlled convergence or none of them do. This is why we need to have the entry point wrapper use controlled convergence. https://github.com/llvm/llvm-project/blob/c85611e8583e6392d56075ebdfa60893b6284813/llvm/lib/Transforms/Utils/InlineFunction.cpp#L2431-L2439

…n build machine (llvm#112780)

…#113939) This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`. There are five uses of `std::tie` remaining because they can't be replaced with C++17 structured bindings.

Addresses llvm#112799

…#113389) With inferred modules, the dependency scanner takes care to replace the fake "__inferred_module.map" path with the file that allowed the module to be inferred. However, this only worked when such a module was imported directly in the TU. Whenever such module got loaded transitively, the scanner would fail to perform the replacement. This is caused by the fact that PCM files are lossy and drop this information. This patch makes sure that PCMs include this file for each submodule (in the `SUBMODULE_DEFINITION` record), fixes one existing test with an incorrect assertion, and does a little drive-by refactoring of `ModuleMap`.

NFC checks have been failing starting with https://lab.llvm.org/buildbot/#/builders/92/builds/8567. The NFC testing wrapper (llvm-bolt-wrapper) replaces the call of `perf2bolt` with `llvm-bolt --aggregate-only --ignore-build-id`. `show-density` is automatically enabled for perf2bolt only, not for `llvm-bolt --aggregate-only`. Add the flag to the test to work around the issue.

Test Plan:

```
cd build
../llvm-project/bolt/utils/nfc-check-setup.py --switch-back --verbose
bin/llvm-lit -a tools/bolt/test/X86/pre-aggregated-perf.test
```

- Change FlattenedSpelling to use StringRef instead of std::string.
- Use range for loops and enumerate().
- Use ArrayRef<> instead of std::vector reference as function arguments.
- Use {} for all if/else branch bodies if one of them uses it.

…ex (llvm#113391) This patch avoids eagerly populating the submodule index on `Module` construction. The `StringMap` allocation shows up in my profiles of `clang-scan-deps`, while the index is not necessary most of the time. We still construct it on-demand. Moreover, this patch avoids performing qualified submodule lookup in `ASTReader` whenever we're serializing a module graph whose top-level module is unknown. This is pointless, since that's guaranteed to never find any existing submodules anyway. This speeds up `clang-scan-deps` by ~0.5% on my workload.
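The on-demand construction described above can be sketched roughly as follows. This is only an illustration of a lazily built lookup index, not the actual `Module` code: `SketchModule`, `addSubmodule`, `findSubmodule`, and `hasIndex` are hypothetical names, and `std::unordered_map` stands in for `StringMap`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module that owns named submodules.
class SketchModule {
public:
  explicit SketchModule(std::string Name) : Name(std::move(Name)) {}

  // Adding a submodule does not allocate the index; it only keeps an
  // already-built index in sync.
  SketchModule &addSubmodule(std::string SubName) {
    SubModules.push_back(std::make_unique<SketchModule>(std::move(SubName)));
    if (Index)
      Index->emplace(SubModules.back()->Name, SubModules.size() - 1);
    return *SubModules.back();
  }

  // The first lookup builds the index; later lookups reuse it.
  SketchModule *findSubmodule(const std::string &SubName) {
    if (!Index)
      buildIndex();
    auto It = Index->find(SubName);
    return It == Index->end() ? nullptr : SubModules[It->second].get();
  }

  bool hasIndex() const { return Index != nullptr; }

private:
  void buildIndex() {
    Index = std::make_unique<std::unordered_map<std::string, std::size_t>>();
    for (std::size_t I = 0; I < SubModules.size(); ++I)
      Index->emplace(SubModules[I]->Name, I);
  }

  std::string Name;
  std::vector<std::unique_ptr<SketchModule>> SubModules;
  // Null until the first lookup, so modules that are never queried pay nothing.
  std::unique_ptr<std::unordered_map<std::string, std::size_t>> Index;
};
```

The trade-off in this sketch is one extra null check per lookup in exchange for skipping the map allocation on every module that is never searched.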

As with other nodes, we can convert these into G_UMULL and G_SMULL aarch64 instructions.

This patch converts `SDNodeFlags` into an enumeration as we did for `FastMathFlags`. It simplifies the implementation and improves compile-time. This patch is NFC since it doesn't break SDNodeFlags API.

Linalg ops need to take into account memory side effects happening inside the region when determining their own side effects. This patch fixes issue llvm#112881

…r/reference when the guardian variable gets mutated. (llvm#113859) This checker has a notion of a guardian variable, which is a variable that keeps the object pointed to by a raw pointer / reference in an inner scope alive long enough to "guard" it from use-after-free. But such a guardian variable fails to keep the object alive if it ever gets mutated within the scope of a raw pointer / reference. This PR fixes this bug by introducing a new AST visitor class, GuardianVisitor, which traverses the compound statements of a guarded variable (raw pointer / reference) and looks for any operator=, move constructor, or calls to "swap", "leakRef", or "releaseNonNull" functions.

…lvm#113760) Mark the whole StmtExpr invalid when the last statement in the compound statement is invalid. Because the last statement needs to do copy initialization, simply ignoring the invalid last statement causes subsequent errors. Fixed: llvm#113468

Incomplete types are not considered trivially copyable by clang but we don't want to warn about invalid argument for memcpy / memset in that case because we cannot prove they are not Trivially Copyable.

…14078) The trampoline script used on Windows (due to the absence of shebang support) doesn't properly expand the path to the Python script, as it leaves out the drive letter.

Functionally equivalent reproducer in action:

```
PS C:\Users\mate> gc (gcm git-clang-formatish.bat).Source
@echo OFF
echo "%~pn0" %*
PS C:\Users\mate> git-clang-formatish
"\Users\mate\git-clang-formatish"
```

With `d` added to the variable modifiers, [as per the docs](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/for), the drive letter is included.

The drive letter is needed even in the magical cases when the script works. (I couldn't reproduce one, but I suspect it's only tested from some bash/cygwin variant, where the path becomes `/c/Program Files/...`.) Without it, I also observed cases where, when used via `git clang-format` (without the initial dash), it tries to infer the drive letter based on the current working directory. In that case it looks for `D:\Program Files\LLVM\bin\clang-format.exe`, which naturally fails, because `Program Files` is on `C:`.

…ning and change the sample code to be a fully working example (llvm#113437) Tested the code: https://godbolt.org/z/n5xcq65YM Tested the generated documentation: ![BruDQ2UkTXHA9PE](https://github.com/user-attachments/assets/cf527d1a-ef3b-41f2-84c2-4ca38af16d2d)

We added the AVX10.2 COMEF ISA in LLVM. It does not optimize correctly in the scenario mentioned below.

Summary

Input

```
define i1 @oeq(float %x, float %y) {
  %1 = fcmp oeq float %x, %y
  ret i1 %1
}

define i1 @une(float %x, float %y) {
  %1 = fcmp une float %x, %y
  ret i1 %1
}

define i1 @ogt(float %x, float %y) {
  %1 = fcmp ogt float %x, %y
  ret i1 %1
}

// Prior AVX10.2, default code generation
oeq:                                    # @oeq
        cmpeqss xmm0, xmm1
        movd    eax, xmm0
        and     eax, 1
        ret
une:                                    # @une
        cmpneqss xmm0, xmm1
        movd    eax, xmm0
        and     eax, 1
        ret
ogt:                                    # @ogt
        ucomiss xmm0, xmm1
        seta    al
        ret
```

This patch will remove `cmpeqss` and `cmpneqss`. For the complete transform, check the unit test.

Continuing on what PR llvm#113098 added:

Earlier, legalization and combine expanded the `setcc oeq:ch` node into an `and` of `setcc eq` and `setcc o`. From suggestions in the community, the new internal transform is:

```
Optimized type-legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 11 nodes:
  t0: ch,glue = EntryToken
  t2: f16,ch = CopyFromReg t0, Register:f16 %0
  t4: f16,ch = CopyFromReg t0, Register:f16 %1
  t14: i8 = setcc t2, t4, setoeq:ch
  t10: ch,glue = CopyToReg t0, Register:i8 $al, t14
  t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1

Optimized legalized selection DAG: %bb.0 'hoeq:'
SelectionDAG has 12 nodes:
  t0: ch,glue = EntryToken
  t2: f16,ch = CopyFromReg t0, Register:f16 %0
  t4: f16,ch = CopyFromReg t0, Register:f16 %1
  t15: i32 = X86ISD::UCOMX t2, t4
  t17: i8 = X86ISD::SETCC TargetConstant:i8<4>, t15
  t10: ch,glue = CopyToReg t0, Register:i8 $al, t17
  t11: ch = X86ISD::RET_GLUE t10, TargetConstant:i32<0>, Register:i8 $al, t10:1
```

The earlier transform is mentioned here: llvm#113098 (comment)

---------

Co-authored-by: mattarde <[email protected]>

…#114193) Reproducer:

```
//--- a.cppm
export module a;
int func();
static int a = func();

//--- a.cpp
import a;
```

The `func()` should only execute once. However, before this patch we would somehow incorrectly import `static int a` from a.cppm and initialize it again. This is super bad and can introduce serious runtime misbehavior. Also, surprisingly, it looks like the root cause of the problem is simply an oversight in choosing APIs.

These are primarily meant to test disassembler and that no more than one variant per instruction is in DisassemblerTables as that can cause confusion when decoding v0 (vgpr0) whose value when encoded is 0.

Handling is similar to RecordType with the following differences:

1. No check for cyclic references.
2. No extra processing for lower bounds of array members.
3. No line information, as TupleType is a lowering artefact and does not really represent an entity in the code.

NumBits should be less than 20 so using an unsigned instead of size_t should be OK

On ANDNOT-capable targets we can always do this profitably; without ANDNOT we only attempt this if we don't introduce an additional NOT. Followup to llvm#112547

…m#104902)" (llvm#114023) This reverts commit ef44e46. The patch was originally reverted because it was deemed to introduce a performance regression for small inputs; however, it also fixed a previous performance regression for larger inputs. So overall, this patch is desirable.

In Zig, we have a tool that updates our CPU model/feature data from LLVM's. Noticed these typos when running it for LLVM 19. Note: I don't have commit access.

…lvm#113893) Issue deprecation warning for these directives. Lowering currently supports parallel master, for all other combined or composite directives involving master, issue TODO errors. Note: The first commit changes the formatting and generalizes the deprecation message emission for reuse in the second commit. I can pull it out into a separate commit if required.

I've kept the old PR50392 tag since this is such an old issue....

…ess (NFC) (llvm#114016)

…n of others. (llvm#113580) Removes sve-bf16, sve-ebf16, and sve-i8mm since they are obsolete. One could write target_version("sve+bf16") instead of sve-bf16 for instance. Approved in ACLE as ARM-software/acle#353

… NFC. (llvm#112901)

…lvm#113681) This patch adds methods to `EntryBlockArgs` to access the full list of entry block argument-related symbols and variables, in their standard order. This helps centralizing this logic in as few places as possible to avoid future inconsistencies.

…lvm#114053) This patch adds assembly/disassembly support for the following SVE2.2 instructions:

  - COMPACT (byte, halfword)
  - EXPAND

- Allow selection of `COMPACT` (word/halfword) in streaming mode if the target has FEAT_SME2p2 (see [COMPACT](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/COMPACT--Copy-active-vector-elements-to-lower-numbered-elements-))
- Rename predicates guarding instructions that are illegal in streaming SVE mode without FEAT_SME2p2
- In accordance with https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions

Co-authored-by: Marian Lukac [email protected]

This patch adds assembly/disassembly and tests for new FIRSTP and LASTP instructions introduced in https://developer.arm.com/documentation/ddi0602/2024-09 --------- Co-authored-by: SpencerAbson <[email protected]>

…error message (llvm#114176) This commit changes the format of the materialization error message. Previously: `failed to legalize unresolved materialization from ('f64') to 'f32' that remained live after conversion` Now: `failed to legalize unresolved materialization from ('f64') to ('f32') that remained live after conversion` This commit is in preparation of merging the 1:1 and 1:N dialect conversions. At that point, target materializations may create more than one SSA value. I am sending this change as a separate PR to keep the main PR smaller.

lukel97 and others added 30 commits October 28, 2024 14:59

[RISCV] Add cost model tests for fp rounding ops for bf16. NFC

40363d5

Revert "[runtimes] Allow building against an installed LLVM tree" (ll…

3ac75ee

…vm#113920) Reverts llvm#86209 This patch breaks running tests locally, which is extremely disruptive to libc++ development.

[AIX][SystemZ][z/OS] Disable test for AIX, z/OS due to missing DWARF …

7d1e98c

…sections (llvm#113910) This patch disables the testcase for AIX and z/OS due to incomplete DWARF support.

[AArch64] Regenerate srem-lkk.ll to add missing asm comments

670512b

Reduces diff in llvm#112588

[RISCV] Add DestEEW = EEW1 to VMADC. (llvm#113013)

5ac3f3c

It was present on VMSBC but not VMADC. Reorder the instructions to avoid duplicate 'let' statements.

[clang] [NFC] Deduplicate the logic between StringMapEntry.first() an…

80f38fb

…d StringMapEntry.getKey() (llvm#113735)

[MLIR][Vector] Remove unused and unimplemented Vector_WarpExecuteOnLa… (

7a71011

llvm#112338) …ne0Op builder Removing the declaration instead of implementing the builder as discussed in llvm#110106

[llvm-objdump] Handle -M for --macho

92412c1

--macho -d uses the `parseInputMachO` code path, which does not handle -M. Add -M handling for --macho as well. Close llvm#61019 Pull Request: llvm#113795

Check hasOptSize() in shouldOptimizeForSize() (llvm#112626)

6ab26ea

[lld][InstrProf] Do not use cstring offset hashes in function order f…

6827a00

…or compression (llvm#113606)

Remove support for RenderScript (llvm#112916)

af7c58b

See https://discourse.llvm.org/t/rfc-deprecate-and-eventually-remove-renderscript-support/81284 for the RFC

[rtsan] Intercept aligned_alloc on all versions of OSX if available o…

97fb21a

…n build machine (llvm#112780)

[LLD][COFF] Add Support for ARM64EC pseudo relocations (llvm#113832)

31a6dbe

[NFC][AMDGPU] Use C++17 structured bindings as much as possible (llvm…

4cf1285

…#113939) This only changes `llvm/lib/Target/AMDGPU/SIISelLowering.cpp`. There are five uses of `std::tie` remaining because they can't be replaced with C++17 structured bindings.
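As a rough illustration of the kind of rewrite this describes, and of one common reason a `std::tie` cannot become a structured binding, here is a minimal sketch. The helper `splitHalves` and the three callers are hypothetical and are not taken from `SIISelLowering.cpp`; the commit does not say why its five remaining uses could not be converted.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical helper returning two values at once.
static std::pair<unsigned, unsigned> splitHalves(unsigned Value) {
  return {Value & 0xFFFFu, Value >> 16};
}

// Before: declare the variables first, then assign through std::tie.
unsigned sumHalvesWithTie(unsigned Value) {
  unsigned Lo, Hi;
  std::tie(Lo, Hi) = splitHalves(Value);
  return Lo + Hi;
}

// After: a C++17 structured binding declares and initializes both names.
unsigned sumHalvesWithBindings(unsigned Value) {
  auto [Lo, Hi] = splitHalves(Value);
  return Lo + Hi;
}

// A case where std::tie has to stay: the variables already exist and are
// reassigned, whereas a structured binding always introduces new names.
unsigned lastHalvesSum(const unsigned *Values, int N) {
  unsigned Lo = 0, Hi = 0;
  for (int I = 0; I < N; ++I)
    std::tie(Lo, Hi) = splitHalves(Values[I]);
  return Lo + Hi;
}
```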

Add DILabel functions for LLVM-C (llvm#112840)

f23bdbb

Addresses llvm#112799

[lldb] Fix lldb windows build breakage from llvm#113839.

19c0a74

[AArch64][GlobalISel] Lower aarch64.neon.smull/umull intrinsics.

5a5b78a

As with other nodes, we can convert these into G_UMULL and G_SMULL aarch64 instructions.

topperc and others added 30 commits October 29, 2024 22:34

[RISCV] Add OperandType for sew and vecpolicy operands. (llvm#114168)

e7262c1

[SDAG][NFC] Convert SDNodeFlags into an enumeration (llvm#114167)

f1467b3

This patch converts `SDNodeFlags` into an enumeration as we did for `FastMathFlags`. It simplifies the implementation and improves compile-time. This patch is NFC since it doesn't break SDNodeFlags API.
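A minimal sketch of the general idea follows: named flags stored as bits of one integer behind unchanged accessor methods. It is not the real `SDNodeFlags` definition; `SketchNodeFlags` and its three flag names are picked for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in: flags are bits of a single unsigned value, while
// callers keep using set*/has* accessors, so the API does not change.
class SketchNodeFlags {
public:
  enum : unsigned {
    None = 0,
    NoUnsignedWrap = 1u << 0,
    NoSignedWrap = 1u << 1,
    Exact = 1u << 2,
  };

  SketchNodeFlags(unsigned Flags = None) : Flags(Flags) {}

  void setNoUnsignedWrap(bool B) { setFlag(NoUnsignedWrap, B); }
  void setNoSignedWrap(bool B) { setFlag(NoSignedWrap, B); }
  void setExact(bool B) { setFlag(Exact, B); }

  bool hasNoUnsignedWrap() const { return (Flags & NoUnsignedWrap) != 0; }
  bool hasNoSignedWrap() const { return (Flags & NoSignedWrap) != 0; }
  bool hasExact() const { return (Flags & Exact) != 0; }

  // Keeping only the flags set on both sides is a single AND.
  void intersectWith(SketchNodeFlags Other) { Flags &= Other.Flags; }

  bool operator==(SketchNodeFlags Other) const { return Flags == Other.Flags; }

private:
  void setFlag(unsigned Bit, bool B) {
    Flags = B ? (Flags | Bit) : (Flags & ~Bit);
  }

  unsigned Flags;
};
```

With this layout, whole-set operations such as intersection and comparison work on one integer instead of field by field.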

[mlir] [linalg] fix side effect of linalg op (llvm#114045)

df0d249

Linalg ops need to take into account memory side effects happening inside the region when determining their own side effects. This patch fixes issue llvm#112881

[CodeGen][NewPM] Port TailDuplicate pass to NPM (llvm#113293)

44d0e95

[NFC] clean space in clang release note (llvm#114188)

5df84a7

[clang] Fix 7131569 in presence of incomplete types (llvm#114095)

dc56a86

Incomplete types are not considered trivially copyable by clang but we don't want to warn about invalid argument for memcpy / memset in that case because we cannot prove they are not Trivially Copyable.

[Attributor] Add nofpclass test for phi+select recurrences. NFC

f358422

[AMDGPU][MC][NFC] Add more VIMAGE encoding tests (llvm#114054)

e8b95a0

These are primarily meant to test disassembler and that no more than one variant per instruction is in DisassemblerTables as that can cause confusion when decoding v0 (vgpr0) whose value when encoded is 0.

Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC

0394888

NumBits should be less than 20 so using an unsigned instead of size_t should be OK
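For context, the warning is typically raised when a shift is evaluated in 32 bits and its result is then widened to a 64-bit type. The sketch below shows two ways to avoid that; the function names are hypothetical and the actual changed code is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Pattern that typically triggers the warning on 64-bit builds
// (kept as a comment so this sketch compiles cleanly):
//   size_t NumBits = ...;
//   size_t Entries = 1 << NumBits;  // 32-bit shift, result widened to 64 bits

// Option 1: keep everything 32-bit. This matches the reasoning in the commit
// message, assuming NumBits really stays below 20.
inline unsigned entriesNarrow(unsigned NumBits) {
  assert(NumBits < 20 && "sketch assumes a small shift amount");
  return 1u << NumBits;
}

// Option 2: widen the operand before shifting if a 64-bit result is wanted.
inline uint64_t entriesWide(unsigned NumBits) {
  return uint64_t(1) << NumBits;
}
```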

[DAG] Fold (and X, (rot (not Y), Z)) -> (and X, (not (rot Y, Z)))

f7b5f0c

On ANDNOT-capable targets we can always do this profitably; without ANDNOT we only attempt this if we don't introduce an additional NOT. Followup to llvm#112547
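The fold relies on rotation only permuting bit positions, so inverting before or after the rotate gives the same value. A small self-contained check of that identity on 32-bit values is sketched below; `rotl32`, `beforeFold`, and `afterFold` are illustrative helpers, not DAG combiner code.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by Z bits (taken modulo 32), avoiding a shift by 32 when Z == 0.
constexpr uint32_t rotl32(uint32_t V, unsigned Z) {
  Z &= 31;
  return Z == 0 ? V : (V << Z) | (V >> (32 - Z));
}

// (and X, (rot (not Y), Z))
constexpr uint32_t beforeFold(uint32_t X, uint32_t Y, unsigned Z) {
  return X & rotl32(~Y, Z);
}

// (and X, (not (rot Y, Z))): the NOT now sits directly under the AND, which
// an and-not instruction can absorb on targets that have one.
constexpr uint32_t afterFold(uint32_t X, uint32_t Y, unsigned Z) {
  return X & ~rotl32(Y, Z);
}

static_assert(beforeFold(0xF0F0F0F0u, 0x12345678u, 7) ==
                  afterFold(0xF0F0F0F0u, 0x12345678u, 7),
              "NOT commutes with rotation");
```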

[CSKY] Fix some typos in CPU feature descriptions (NFC) (llvm#105774)

f447cf1

In Zig, we have a tool that updates our CPU model/feature data from LLVM's. Noticed these typos when running it for LLVM 19. Note: I don't have commit access.

[PhaseOrdering][X86] Add additional test coverage for llvm#49736

2de1fc8

I've kept the old PR50392 tag since this is such an old issue....

[lld][ELF] Fix typo in help text for plugin-opt=opt-remarks-with-hotn…

fcfd643

…ess (NFC) (llvm#114016)

[PhaseOrdering][X86] Add test coverage for llvm#94546

bc999ee

[CodeGen] Change MachineInstr::isConstantValuePHI to return Register.…

cea9dd8

… NFC. (llvm#112901)

[AARCH64] Add assembly/disassmbly for FIRST,LASTP instr. (llvm#114049)

15f63ec

This patch adds assembly/disassembly and tests for new FIRSTP and LASTP instructions introduced in https://developer.arm.com/documentation/ddi0602/2024-09 --------- Co-authored-by: SpencerAbson <[email protected]>

[AutoBump] Merge with ea050ab (Oct 30)

1353852
