Add token usage tracking #947
Conversation
Updated multiple files to include the `Microsoft.KernelMemory.Models` namespace and modified the `GenerateTextAsync` method to return `IAsyncEnumerable<(string? Text, TokenUsage? TokenUsage)>`. Added a new `TokenUsage` class to track token usage information. Updated relevant test files and handlers to accommodate the new return type.
Added a summary documentation comment to the `TokenUsage` class in the `Microsoft.KernelMemory.Models` namespace. This comment describes the purpose of the class, which is to represent the usage of tokens in a request and response cycle. This improves code readability and maintainability by providing context and explanation for the class.
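Based on the summaries above, the `TokenUsage` model presumably looks roughly like the following. This is a hedged sketch, not the PR's actual code: the property names are assumptions inferred from the discussion (`serviceTokensIn`/`serviceTokensOut` are mentioned later in the thread), and the real class may carry additional members.

```csharp
using System;

namespace Microsoft.KernelMemory.Models;

// Hypothetical sketch: represents token consumption for one
// request/response cycle, as described in the commit summary.
public class TokenUsage
{
    // When the usage was recorded.
    public DateTimeOffset Timestamp { get; set; }

    // Which service and model produced the tokens.
    public string? ServiceType { get; set; }
    public string? ModelName { get; set; }

    // Tokens sent to the service (prompt) and received back (completion).
    public int? ServiceTokensIn { get; set; }
    public int? ServiceTokensOut { get; set; }
}
```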
Enhanced token usage logging in AzureOpenAITextGenerator and OpenAITextGenerator with detailed information. Introduced new private fields `_deployment` and `_textModel` in respective classes and updated constructors accordingly. Modified `MemoryAnswer` to handle multiple `TokenUsage` instances. Expanded `TokenUsage` class with additional properties and renamed existing ones for clarity. In `SearchClient`, added `CreatePrompt` method and updated `GenerateAnswer` to use it, ensuring token usage is always populated.
Updated `Microsoft.SemanticKernel` to `1.32.0` in `Directory.Packages.props`. Modified `AzureOpenAITextGenerator.cs` to use `StreamingChatMessageContent` and updated method calls accordingly. Updated `LlamaSharpTextGenerator.cs` to use `ConfigureAwait(false)` in `await foreach` loop. Changed return type in `OnnxTextGenerator.cs` to include `TokenUsage` and updated `yield return` statements. Added comments and updated result handling in `OpenAITextGenerator.cs`. Renamed property and improved string checks in `MemoryAnswer.cs`. Added `CreatePrompt` method and updated `GenerateAnswerTokensAsync` in `AnswerGenerator.cs`. Refactored `SearchClient.cs` to remove old implementation and improve logging. Added `TinyHelpers.AspNetCore.Swashbuckle` package in `KernelMemoryService.csproj`.
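For reference, the `ConfigureAwait(false)` change on the `await foreach` loop mentioned above typically follows this pattern (a generic sketch; identifiers like `generator` are placeholders, not the PR's exact code):

```csharp
// Avoid capturing the synchronization context while consuming
// the asynchronous stream of generated tokens.
await foreach (var piece in generator
    .GenerateTextAsync(prompt, options, cancellationToken)
    .ConfigureAwait(false))
{
    yield return piece;
}
```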
Updated using directives to remove Microsoft.SemanticKernel.ChatCompletion and add OpenAI.Chat. Changed result variable type to IAsyncEnumerable<StreamingTextContent>. Updated method call to GetStreamingTextContentsAsync with adjusted parameters. Modified content processing to check and yield x.Text instead of x.Content.
Updated `Directory.Packages.props` to upgrade `Microsoft.SemanticKernel.Connectors.AzureOpenAI` from `1.26.0` to `1.32.0`. Refined logging and token usage assignment in `AzureOpenAITextGenerator` and `OpenAITextGenerator` classes by removing unnecessary null-conditional operators and setting default values. Enhanced `AnswerGenerator` to initialize and update `TokenUsage` during token generation. Added logic in `SearchClient` to merge `TokenUsages` from different parts of the result in two switch cases.
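The merging logic described for `SearchClient` could look roughly like this (a sketch under assumed names such as `partialResult` and `TokenUsages`; the PR's actual switch cases are not shown in this thread):

```csharp
// Hypothetical: accumulate TokenUsage records from each partial result
// into a single list attached to the final answer.
var tokenUsages = new List<TokenUsage>();

// Inside each switch case that handles a part of the result:
if (partialResult.TokenUsages is { Count: > 0 } usages)
{
    tokenUsages.AddRange(usages);
}
```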
Updated Swashbuckle.AspNetCore from 6.9.0 to 7.2.0. Updated Microsoft.SemanticKernel.Abstractions and Microsoft.SemanticKernel.Connectors.OpenAI from 1.26.0 to 1.32.0.
The last character (token) duplication is likely related to the new streaming logic. Here's another test, using streaming; the last two data blocks contain the same token.
Refactored the `GenerateTextAsync` method across the classes implementing the `ITextGenerator` interface:
- Changed the return type from `IAsyncEnumerable<(string? Text, TokenUsage? TokenUsage)>` to `IAsyncEnumerable<TextContent>`, where `TextContent` is a new class encapsulating text and token usage information.
- Introduced the `TextContent` class in `TextContent.cs` to encapsulate generated text and optional token usage information.
- Modified the `TokenUsage` class to use `DateTimeOffset` instead of `DateTime` for the `Timestamp` property.
- Updated the `MemoryAnswer` class to rename the `TokenUsages` property to `TokenUsage` and handle empty strings more appropriately.
- Updated the `GenerateTextAsync` implementations in `CustomModelTextGeneration`, `AnthropicTextGeneration`, `AzureOpenAITextGenerator`, `LlamaSharpTextGenerator`, `OnnxTextGenerator`, `OllamaTextGenerator`, `OpenAITextGenerator`, `SemanticKernelTextGenerator`, and `NoTextGenerator` to yield instances of `TextContent` instead of tuples.
- Updated the corresponding test classes (`LlamaSharpTextGeneratorTest`, `OnnxTextGeneratorTest`, `AzureOpenAITextGeneratorTest`, and `OpenAITextGeneratorTest`) to reflect the new return type.
- Refactored the `AnswerGenerator` class to use the new `TextContent` class and streamline the generation of answer tokens.
- Performed minor code cleanups and formatting changes, including removing unused variables and improving readability.
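The refactored contract described above presumably looks something like this (a hedged sketch; member names and the options parameter type are assumptions, and the real interface may declare additional methods):

```csharp
// Hypothetical shape of the new TextContent class and the refactored
// ITextGenerator signature; details may differ from the PR.
public class TextContent
{
    public string? Text { get; set; }
    public TokenUsage? TokenUsage { get; set; }
}

public interface ITextGenerator
{
    IAsyncEnumerable<TextContent> GenerateTextAsync(
        string prompt,
        TextGenerationOptions options,
        CancellationToken cancellationToken = default);
}
```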
@dluc I have applied your suggestions, and I think I have also solved the duplicate token issue.
(Force-pushed from 68888bd to 95f3b4d.)
I've done a bit of refactoring to reduce the number of changes. There's something odd with streaming: the entire `MemoryAnswer` object, including the nested objects, should be additive, but the token usage is not, repeating the same data over each token. I'll take a look at the streaming part as soon as I get some time; I also want to review the changes to `SearchClient` a bit better. The main job is done though, just a matter of a few tweaks, thanks! :-)
Hi @dluc, @marcominerva, thanks for the updates on this topic. When might this feature become available? Happy new year!
Happy new year :-) Working on it; I'd like to merge this ASAP and release a new KM version with all the latest goodies. Hopefully it won't take long.
@marcominerva I've made a few changes, mostly around how the token usage is streamed. Could you check whether the code still works as you expected? Other than that, I think the PR is ready to merge. Thanks!
@dluc I have just made a test using my environment and everything works correctly. Thank you for your support! |
Hi @dluc
@nurkmez2 released! - see version 0.96. |
Hi @marcominerva, @dluc, thanks for your help regarding the token usage. Can you explain what you mean by the following fields? For example, is `Answer.TokenUsage` the total consumed tokens, i.e. `"serviceTokensIn": 6519` + `"serviceTokensOut": 454` = 6973, or not? Thanks
Motivation and Context (Why the change? What's the scenario?)
This is the new PR for token usage. It replaces #872.
High level description (Approach, Design)
Following the suggestion in the previous PR, this new implementation uses the built-in token counter feature provided by the models. For the moment, it supports text generation using OpenAI and Azure OpenAI.
@dluc I have a weird behavior that duplicates the last character of the response:
I can't figure out the issue, so I have made the PR despite this. I hope you can help me find the issue 😄