Caching practices for incremental generators #67469
Unanswered · majellin24 asked this question in Q&A
1 comment · 5 replies
-
I'd bring this to Discord, FWIW. https://discord.com/channels/732297728826277939/735233259763400715 would be appropriate.
-
I've just started looking into incremental generators and so far they seem awesome!
To my limited knowledge, it seems like a fundamental principle of writing an incremental generator is "authoring a cache-friendly generator." Otherwise, we sort of lose the "incremental" part.
However, I am left puzzled by the best way to accomplish this, given that often (from my perspective) the most useful information comes from a combination of the Syntax Tree and Semantic Model.
Generally, examples I have seen make use of the SyntaxValueProvider to filter down to specific nodes. As the docs explain, the predicate is not re-run for nodes which previously passed, but the transform is re-run on every edit. Therefore, it seems like the transform should ideally just do some more light filtering or perhaps build up a new model based mostly on the syntax node, to avoid re-doing heavier tasks repeatedly. After that, it seems to me that ideally, the results of CreateSyntaxProvider should get piped into a select where the heavier work is done, so that the heavier work can be cached and only re-done when the transform of CreateSyntaxProvider produces new results.
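To make that concrete, here is a rough sketch of such a pipeline (`CandidateModel` and `DoHeavierWork` are hypothetical names; a record is used so the model compares by value, which is what the pipeline's caching relies on):

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical equatable model: records compare by value.
record CandidateModel(string TypeName);

// Inside Initialize(IncrementalGeneratorInitializationContext context):
var candidates = context.SyntaxProvider.CreateSyntaxProvider(
    // Predicate: cheap syntax-only filter; not re-run for nodes that passed before.
    predicate: static (node, _) =>
        node is ClassDeclarationSyntax c && c.AttributeLists.Count > 0,
    // Transform: keep it light; build a small model from the syntax node.
    transform: static (ctx, _) => new CandidateModel(
        ((ClassDeclarationSyntax)ctx.Node).Identifier.ValueText));

// The heavier work goes in a later Select, so it only re-runs when the
// CandidateModel it receives compares unequal to the cached one.
var generated = candidates.Select(static (model, _) => DoHeavierWork(model));
```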
However, in my understanding, this heavier work is done by working with the Semantic Model, but we cannot access the Semantic Model without either the context which gets passed into the transform of CreateSyntaxProvider, or the Compilation provided by the CompilationProvider. I see examples using "Combine" on the result of CreateSyntaxProvider and the CompilationProvider, but this causes any work done in later steps to get re-done on every edit, since (in my understanding) the CompilationProvider output changes on every edit. Therefore, we don't really get to cache this heavier work.
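The pattern in question looks roughly like this (assuming a `candidates` provider produced by CreateSyntaxProvider); since a fresh Compilation instance arrives on every keystroke, the combined pair always compares unequal and the output delegate always re-runs:

```csharp
var combined = candidates.Combine(context.CompilationProvider);

context.RegisterSourceOutput(combined, static (spc, pair) =>
{
    var (model, compilation) = pair;
    // Heavy semantic work here is effectively uncached: the Compilation
    // changes on every edit, so this delegate runs every time.
});
```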
It would seem another option would be to return (from the CreateSyntaxProvider transform) a custom type which holds a reference to the semantic model, or to a few relevant symbols, then use that in a future select, so that the future select is cached. However, the docs explicitly call out not to do that.
So basically my question comes down to this: what caching strategy can I use so that my more expensive processing steps are not re-run so often, assuming I want to use the Semantic Model in those steps?
Additionally, one generator I'm writing relies on the Compilation, but it only cares if SourceModule.ReferencedAssemblySymbols changes. Ideally I would only want it to do any work at all if ReferencedAssemblySymbols has changed, since it looks for implementations of a specific interface within them. Since it seems the CompilationProvider output changes on every edit, the generator does its search of the referenced assemblies many times even if none have been added or removed. This seems wasteful to me, and it seems like one of the situations these generators were re-worked to avoid. However, I cannot seem to think of any way around this, other than using the ReferencedAssemblySymbols themselves (with a custom comparer) as outputs. Yet this would be "caching large, expensive objects such as symbols," which the docs warn against.
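One way to express that idea without caching the symbols themselves (a sketch, not something the docs prescribe) is to project the Compilation down to the referenced assembly identities as plain strings, which are cheap to cache and compare. `SequenceComparer` here is a hypothetical helper; it is needed because `ImmutableArray<T>` compares by reference:

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

// Hypothetical element-wise comparer for the cached array of names.
sealed class SequenceComparer : IEqualityComparer<ImmutableArray<string>>
{
    public bool Equals(ImmutableArray<string> x, ImmutableArray<string> y) =>
        x.SequenceEqual(y);
    public int GetHashCode(ImmutableArray<string> a) => a.Length;
}

// Inside Initialize: reduce the Compilation to a sorted list of assembly names.
var referencedAssemblyNames = context.CompilationProvider
    .Select(static (compilation, _) =>
        compilation.SourceModule.ReferencedAssemblySymbols
            .Select(a => a.Identity.GetDisplayName())
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToImmutableArray())
    .WithComparer(new SequenceComparer());

// Steps chained off referencedAssemblyNames now only re-run when an
// assembly reference is actually added or removed.
```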
Is it somehow less expensive to keep a reference (in the cache) to an instance of the Compilation, or an instance of a Semantic Model, than it is to keep a reference to a symbol? I can, for instance, see returning a struct with the names of the ReferencedAssemblySymbols I care about and a reference to the current Compilation, but which only uses the names as part of its IEquatable implementation. This way, the next step would only run if those names change, and I would be able to use the current Compilation in it. But if caching the Compilation is just as expensive as caching a symbol, then this is obviously not a solution.
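The struct described above might look roughly like this (`ReferencesModel` is a hypothetical name); note that deliberately excluding the Compilation from equality means the cached value keeps alive whatever Compilation was current when the names last changed, which is exactly the kind of long-lived rooting the docs' warning is about:

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;

readonly struct ReferencesModel : IEquatable<ReferencesModel>
{
    public readonly ImmutableArray<string> AssemblyNames;
    public readonly Compilation Compilation; // deliberately ignored by Equals

    public ReferencesModel(ImmutableArray<string> assemblyNames, Compilation compilation)
    {
        AssemblyNames = assemblyNames;
        Compilation = compilation;
    }

    // Equality is based only on the names, so downstream steps are skipped
    // unless the set of referenced assemblies actually changes.
    public bool Equals(ReferencesModel other) =>
        AssemblyNames.SequenceEqual(other.AssemblyNames);

    public override bool Equals(object? obj) => obj is ReferencesModel m && Equals(m);
    public override int GetHashCode() => AssemblyNames.Length;
}
```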
I feel as though I may be missing a major concept here, so I'm sorry if the answer is obvious. Thanks!