Extend PartialStruct to represent non-contiguously defined fields #57304

Open · wants to merge 23 commits into base: master
Conversation

@serenity4 (Contributor)

So far, PartialStruct has been unable to represent non-contiguously defined fields, where e.g. a struct would have fields 1 and 3 defined but not field 2. This PR extends it so that such information may be represented with PartialStruct, extending the applicability of optimizations such as those introduced in #55297 by @aviatesk or in #57222.

The semantics of `new` prevent the creation of a struct with non-contiguously defined fields, so this change is mostly relevant for modeling mutable structs whose fields may have been set previously or are assumed to be defined after creation, or immutable structs whose creation is opaque.
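As a quick illustration (not part of the PR itself), `new` can only leave a *trailing* suffix of fields undefined, so non-contiguous definedness can only arise later, e.g. via `setfield!` on a mutable struct:

```julia
# Hypothetical demonstration: `B` is an invented example type.
mutable struct B
    x
    y
    z
    B() = new()   # all three fields start out undefined
end

b = B()
b.z = 3           # now z is defined while x and y are not: non-contiguous
@assert !isdefined(b, :x) && !isdefined(b, :y) && isdefined(b, :z)
```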

Notably, with this change we may now infer information about structs in the following case:

mutable struct A; x; y; z; A() = new(); end

function f()
    mut = A()
   
    # some opaque call preventing optimizations
    # who knows, maybe `identity` will set fields from `mut` in a future world age!
    invokelatest(identity, mut)
   
    isdefined(mut, :z) && isdefined(mut, :x) || return
   
    isdefined(mut, :x) & isdefined(mut, :z) # this now infers as `true`
    isdefined(mut, :y) # this does not
end

whereas previously, only information gained successively with isdefined(mut, :x) && isdefined(mut, :y) && isdefined(mut, :z) could allow inference to conclude that mut has its z field defined.

@topolarity (Member) left a comment

Instead of a separate BitVector, would it be better to add a tombstone / marker for non-Const fields (e.g. Any)?

That seems easier to iterate and more memory-efficient for the common case of small structs.

@serenity4 (Contributor, Author) commented Feb 7, 2025

The issue with choosing Any is that we wouldn't be able to represent that a field is defined but its type is unknown. For example, for ref::RefValue{Any}, if we have fields = Any[Any], that means ref.x is defined and access will not throw.
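A minimal check of the ambiguity being described: for a RefValue{Any}, the declared field type is already Any, so an Any entry cannot be distinguished from "no information", yet the field is defined and access cannot throw.

```julia
# Sketch of the ambiguity, using only Base functionality.
r = Ref{Any}(42)                            # RefValue{Any} with its field set
@assert fieldtype(typeof(r), :x) === Any    # no type refinement is possible
@assert isdefined(r, :x)                    # yet access is guaranteed safe
@assert r.x == 42                           # does not throw
```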

We could have a sentinel value like nothing instead. I don't know whether that changes much about memory efficiency or performance, but perhaps it would be simpler by not having to map type field indices to .fields indices. On the other hand, I guess using a BitVector makes it faster to look up whether a field is defined (mostly relevant for isdefined tfuncs; I'd guess getfield requires knowing both the defined-ness and the type of a given field).

@vtjnash (Member) commented Feb 7, 2025

The sentinel for a value that definitely throws on access is Union{} and appears throughout inference for that purpose. If you are only tracking may-throw, then you need something like our VarInfo to add a boolean for it, but that complicates the type lattice.

@topolarity (Member)

> The sentinel for a value that definitely throws on access

I think @serenity4 wants the opposite here - something that definitely doesn't throw on access. The pattern to copy is probably VarState, which has an undef::Bool to track maybe-undefined

That makes this two separate enhancements: (1) handling non-contiguous fields, and (2) tracking type / defined-ness independently - seems fair to me

@serenity4 (Contributor, Author)

We need a very basic "may-throw" information here, which in the current state is encoded as a lack of information. The idea was to avoid complicating the type lattice, which is why I went for a solution that does not require introducing a new sentinel value, and instead keeps track externally of available information.

But on the other hand, it makes little sense to iterate this .fields data structure without first knowing what specific field of the wrapped type it corresponds to, so we might as well put a sentinel value that is never meant to be manipulated as a lattice element.

@serenity4 (Contributor, Author) commented Feb 7, 2025

Note that (2) is already implemented in current master, when a lattice element is present but does not add any more information than fieldtype(T, i).

@topolarity (Member)

Is it possible to represent on current master a field that is Const(3) but may also be undefined?

@serenity4 (Contributor, Author)

No, Const(3) implies that the field is guaranteed to be defined and to have value 3. In fact, so long as you associate a lattice element with a field, it models the guarantee that the field is defined.

@topolarity (Member) commented Feb 7, 2025

Hmm, it feels strange that you can represent a defined && !Const field but not a !defined && Const one TBH

My vote would be to either go in the VarState direction and make definedness independent from the type information, or else keep PartialStruct's existing limitations for !defined && Const fields but use nothing as a tombstone (that way it's clear this doesn't work like VarState)
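The two options under discussion could be sketched roughly as follows (all names here are invented for illustration, not actual compiler code):

```julia
# Option A, VarState-style: definedness is tracked independently of the type
# information, so a refined type survives even for a maybe-undefined field.
struct FieldState
    typ                  # refined element, meaningful even if maybeundef is true
    maybeundef::Bool
end

# Option B, tombstone-style: `nothing` marks a maybe-undefined field, so a
# refined type can only be recorded for fields known to be defined.
fields_b = Any[Int, nothing, Float64]   # field 2 maybe-undefined, no type kept

fs = FieldState(Int, true)              # Option A can say: maybe-undef, but Int if defined
```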

@serenity4 (Contributor, Author)

In what situation would you want to model !defined && Const? This information seems contradictory to me; if a field is undefined, then it must not have any value set, by definition. The only Const in this case that seems to make sense would be something like NULL, which is redundant with !defined (but I feel you are talking about something else).

In the current state (master and this PR), PartialStruct covers the following cases:

  • maybe-defined (no information, possibly no PartialStruct formed in the first place)
  • defined
  • defined-and-refined (e.g. Int refined as Const(3), Any refined as Int, etc)

The only thing it doesn't model is certainty that a field is undefined, but I don't find any situation where this would be particularly useful. (Plus, since we can't unset fields, it would be hard to make this guarantee in the first place, unless the struct is immutable or all operations on the object are known since its creation.)

@topolarity (Member)

> In what situation would you want to model !defined && Const?

You should read this as "maybe-undefined and Const when it is defined" because the defined bit represents "must-not-be-undefined"

That is what VarState can represent which PartialStruct currently can't - It's fine to preserve that limitation, but I do think a tombstone is clearer (and probably faster). The defined::Bool information made me think this would behave more like VarState, which it doesn't

@serenity4 (Contributor, Author)

Ah I see, indeed we have been conservative with PartialStruct so far but it might be useful to model things like "getfield may throw, but if it doesn't, it has this type/value".

In fact, it is already possible to do so using conditional branching information with Conditional, which can express that externally. Besides Conditional, perhaps a Union of PartialStructs may also be used in practice to achieve a similar effect, although both of these probably wouldn't scale when holding this two-state information for many fields.

My understanding is that since VarState is not part of the inference lattice, it cannot use the same mechanisms and therefore it is more important to encode this information internally.

I'd suggest to keep relying on the same mechanisms to express that, and perhaps use nothing (or another non-lattice element) as a sentinel value instead of having a bitvector. @aviatesk what would be your thoughts on this?

@aviatesk (Member)

The fields in PartialStruct tell us two things:

  • If fields[i] exists, then the i-th field has definitely been assigned a value.
  • If fields[i] ⊏ fieldtype(obj, i), then we have refined type information about the i-th field.

So, right now, the fields array mixes two kinds of information about refined field types and field definedness information. And as long as a new refactored data structure can express two pieces of information separately, there doesn't seem to be a fundamental difference between the approach in this PR (adding a defined::BitVector field to PartialStruct) and the idea of storing special data like VarState (or some sentinel) in the fields array of PartialStruct? I think we can choose whichever is easier to implement or more efficient.

Though since field defined-ness is really similar to the variable defined-ness information represented by VarState, I think it might be better to follow the VarState pattern and use (maybe)undef::BitVector instead of defined::BitVector.

Regardless of which approach we take, we'll need to carefully audit all the places where PartialStruct's fields are accessed and make the necessary updates. But adding undef::BitVector seems less likely to break existing code, so how about we move forward with that approach and then measure the performance later, and think about the further refactoring if there is any performance issue detected?

@topolarity (Member)

> My understanding is that since VarState is not part of the inference lattice, it cannot use the same mechanisms and therefore it is more important to encode this information internally

VarState's .typ field is a lattice element, so I think it really can - but it applies in other circumstances since Conditional requires that you have the local branch condition available to refer to (e.g. in a loop or after an assignment, it may be lost) and does not comment on defined-ness for top-level variables (e.g. a Conditional(b, Const(false), Union{}) can be undefined even if b is true, although sometimes we make up for that by carrying a second Conditional to encode the defined-ness information). I'm pretty sure a Union of PartialStruct can't be made equivalent either, although I might be wrong about that.

> And as long as a new refactored data structure can express two pieces of information separately, there doesn't seem to be a fundamental difference

I think that's the main point of discussion - the current data structure can't represent those two pieces of information separately, since defined::Bool just indicates a "skipped" field not in the array (it is equivalent to a "missing" entry in the array or a tombstone). If we want to make this more like VarState, then the "skipped field" representation doesn't work because you can't represent maybe-undef + refined type

That said, I'm a fan of @serenity4's plan above to leave that enhancement for later - it seems likely to impact a lot more code vs. the smaller changeset here

@aviatesk (Member)

> I think that's the main point of discussion - the current data structure can't represent those two pieces of information separately, since defined::Bool just indicates a "skipped" field not in the array (it is equivalent to a "missing" entry in the array or a tombstone). If we want to make this more like VarState, then the "skipped field" representation doesn't work because you can't represent maybe-undef + refined type

I'm a bit confused about this. Are you saying we can't represent a Base.RefValue{Any} with a "maybe undefined" Const(42) field like this?

struct PartialStruct
    typ
    fields::Vector{Any}
    maybeundef::BitVector
end
x = PartialStruct(Base.RefValue{Any}, Any[Const(42)], trues(1))

We'd need to handle the "maybe undef" information correctly everywhere we access x.fields. But I think that's also true if we represent it as e.g. PartialStruct(Base.RefValue{Any}, Any[MaybeUndef(Const(42))]).

@serenity4 (Contributor, Author)

We would essentially go with a vectorized version of VarState then:

  • undef::BitVector that indicates whether a field may be undefined (true) or must be defined (false), essentially expressing exactly what the new defined field introduced (undef == .!defined) but in a way that is more consistent with other data structures (such as VarState).
  • fields::Vector{Any} that always has size fieldcount(typ), as opposed to count(.!undef)/count(defined), with lattice elements that are always equivalent to or more specific than fieldtype(typ, i). (special care will be required for types like Tuple where the number of fields is unknown)

The main difference with the current state of this PR (besides undef != !defined) is that fields then always has an entry for a given field index; what @topolarity referred to is that right now, if the field may be undefined (!defined[i]), we just don't have an entry in this vector (so length(fields) <= fieldcount(typ)).

In this way, we move out the information that the i-th field is defined to the bitvector, instead of encoding it as length(fields) >= i.
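The layout being proposed could be sketched like this (the type and field names here are invented for illustration; they are not the PR's actual definitions):

```julia
# Sketch of the vectorized VarState-like layout described above.
struct PartialStructSketch
    typ
    undef::BitVector     # undef[i]: field i may be undefined
    fields::Vector{Any}  # length fieldcount(typ); fields[i] refines fieldtype(typ, i)
end

mutable struct A2; x; y; z; A2() = new(); end

# x known defined (refined to Int), y maybe-undefined, z known defined:
ps = PartialStructSketch(A2, BitVector([false, true, false]), Any[Int, Any, Any])
```

Note that `fields` keeps an entry for the maybe-undefined field y (here just Any, i.e. no refinement), which is exactly what the current skipped-entry encoding cannot express.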

@serenity4 (Contributor, Author) commented Feb 10, 2025

If we are worried about memory efficiency/performance (notably for large structs), we can always keep the optimization that if there exists i such that undef[j] === true && fields[j] == fieldtype(typ, j) for all j >= i (i.e. this PartialStruct provides no new information for those fields), we may resize fields (and possibly undef) to length i - 1 (to be updated as soon as new information becomes available for a field >= i). Unless we notice performance degradations, we can keep that for later though.

That seems to be the cost of being able to express both maybe-undef and refined type information for a given field (which the current PR avoids by only keeping type for fields that are known to be defined, or master by giving up on storing refined field type information starting from the first possibly undefined field).
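The trailing-truncation optimization described above might look roughly like this (a hypothetical helper, not code from the PR):

```julia
# Drop trailing entries that are maybe-undefined and carry no type refinement,
# so uninformative suffixes cost no storage.
function truncate_uninformative!(fields::Vector{Any}, undef::BitVector, typ)
    n = length(fields)
    while n > 0 && undef[n] && fields[n] === fieldtype(typ, n)
        n -= 1
    end
    resize!(fields, n)
    resize!(undef, n)
    return fields
end

mutable struct D; a::Int; b; end

fields = Any[Int, Any]           # neither entry refines its declared field type
undef  = BitVector([true, true]) # both fields maybe-undefined
truncate_uninformative!(fields, undef, D)   # both entries are dropped
```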

@vtjnash (Member) commented Feb 11, 2025

> The fields in PartialStruct tell us two things:
>
>   • If fields[i] exists, then the i-th field has definitely been assigned a value.
>   • If fields[i] ⊏ fieldtype(obj, i), then we have refined type information about the i-th field.
>
> So, right now, the fields array mixes two kinds of information about refined field types and field definedness information.

This description sounds concerning, as it seems to mean we cannot correctly represent the tmerge of two of these such types

@aviatesk (Member)

We can't simply tmerge the fields of PartialStruct even right now.
When dealing with something like PartialStruct(Base.RefValue{Any}, Any[]) ⊔ PartialStruct(Base.RefValue{Any}, Any[Const(42)]) = PartialStruct(Base.RefValue{Any}, Any[]), the tmerge_partial_struct function needs to behave differently depending on the length(fields) of each PartialStruct.
Since we need to handle this maybe-undefined-ness anyway, I think it would be better to encode that information explicitly with undef::BitVector instead of relying on the current length(fields) to represent it?
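With explicit bits, the definedness part of the merge becomes straightforward (a minimal sketch, not `tmerge` itself): after a merge, a field is guaranteed defined only if both branches guarantee it, so the maybe-undef bits combine with `|`.

```julia
# Hypothetical helper: merge maybe-undef bits from two branches.
merge_undef(a::BitVector, b::BitVector) = a .| b

# Branch 1 guarantees both fields defined; branch 2 only guarantees field 1.
u = merge_undef(BitVector([false, true]), BitVector([false, false]))
# u[1] is false (still guaranteed defined), u[2] is true (maybe undefined)
```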

@serenity4 force-pushed the extend-partialstruct branch from 8a9f31d to 67b9168 on February 11, 2025
@vtjnash (Member) commented Feb 11, 2025

> the tmerge_partial_struct function needs to behave differently depending on the length(fields) of each PartialStruct.

Oh, I see, that is slightly confusing, since it goes in the opposite direction of how the lattice is normally implemented, and so it discards information slightly faster than expected. The change to the bitvector approach (with an optimization that it only gets allocated if any field is maybe-undefined) seems slightly clearer semantically.

@serenity4 (Contributor, Author) left a comment

Tests seem to be passing (besides one OOM in the threading tests, not sure if that's related). Should we run benchmarks to see if the current design has any performance regressions? It seemed locally that compile times had increased a fair bit, but I don't observe that in CI. I would nonetheless recommend checking.

@serenity4 marked this pull request as ready for review on February 14, 2025 at 16:36
@aviatesk (Member)

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@serenity4 (Contributor, Author)

@nanosoldier runbenchmarks("inference", vs=":master")
