Add token usage tracking #947

Merged
merged 23 commits into microsoft:main from token_usage on Jan 15, 2025
Conversation

@marcominerva (Contributor) commented Dec 17, 2024

Motivation and Context (Why the change? What's the scenario?)

This is the new PR for token usage. It replaces #872.

High level description (Approach, Design)

Following the suggestion from the previous PR, this new implementation uses the built-in token usage reporting provided by the models. For the moment, it supports TextGeneration with OpenAI and Azure OpenAI:

[image: screenshot showing token usage reported with the response]

@dluc I'm seeing a weird behavior that duplicates the last character of the response:

[image: response with the last character duplicated]

I can't figure out the issue, so I have opened the PR despite this. I hope you can help me find it 😄

- Updated multiple files to include the `Microsoft.KernelMemory.Models` namespace and changed `GenerateTextAsync` to return `IAsyncEnumerable<(string? Text, TokenUsage? TokenUsage)>`. Added a new `TokenUsage` class to track token usage information, and updated the relevant test files and handlers to accommodate the new return type.
- Added a summary documentation comment to the `TokenUsage` class in the `Microsoft.KernelMemory.Models` namespace, describing its purpose: representing the usage of tokens in a request and response cycle. This improves readability and maintainability by giving the class context.
- Enhanced token usage logging in `AzureOpenAITextGenerator` and `OpenAITextGenerator` with detailed information. Introduced new private fields `_deployment` and `_textModel` in the respective classes and updated the constructors accordingly. Modified `MemoryAnswer` to handle multiple `TokenUsage` instances. Expanded the `TokenUsage` class with additional properties and renamed existing ones for clarity. In `SearchClient`, added a `CreatePrompt` method and updated `GenerateAnswer` to use it, ensuring token usage is always populated.
- Updated `Microsoft.SemanticKernel` to `1.32.0` in `Directory.Packages.props`. Modified `AzureOpenAITextGenerator.cs` to use `StreamingChatMessageContent` and updated method calls accordingly. Updated `LlamaSharpTextGenerator.cs` to use `ConfigureAwait(false)` in the `await foreach` loop. Changed the return type in `OnnxTextGenerator.cs` to include `TokenUsage` and updated the `yield return` statements. Added comments and updated result handling in `OpenAITextGenerator.cs`. Renamed a property and improved string checks in `MemoryAnswer.cs`. Added a `CreatePrompt` method and updated `GenerateAnswerTokensAsync` in `AnswerGenerator.cs`. Refactored `SearchClient.cs` to remove the old implementation and improve logging. Added the `TinyHelpers.AspNetCore.Swashbuckle` package in `KernelMemoryService.csproj`.
- Updated using directives to remove `Microsoft.SemanticKernel.ChatCompletion` and add `OpenAI.Chat`. Changed the result variable type to `IAsyncEnumerable<StreamingTextContent>`, updated the call to `GetStreamingTextContentsAsync` with adjusted parameters, and modified content processing to check and yield `x.Text` instead of `x.Content`.
- Updated `Directory.Packages.props` to upgrade `Microsoft.SemanticKernel.Connectors.AzureOpenAI` from `1.26.0` to `1.32.0`. Refined logging and token usage assignment in the `AzureOpenAITextGenerator` and `OpenAITextGenerator` classes by removing unnecessary null-conditional operators and setting default values. Enhanced `AnswerGenerator` to initialize and update `TokenUsage` during token generation. Added logic in `SearchClient` to merge `TokenUsages` from different parts of the result in two switch cases.
- Updated `Swashbuckle.AspNetCore` from `6.9.0` to `7.2.0`, and `Microsoft.SemanticKernel.Abstractions` and `Microsoft.SemanticKernel.Connectors.OpenAI` from `1.26.0` to `1.32.0`.
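
Pieced together from these commit notes and the serialized payloads quoted later in the thread, the `TokenUsage` model is roughly the following sketch (property names reconstructed from the JSON; the PR's actual code may differ):

```csharp
using System;

namespace Microsoft.KernelMemory.Models;

/// <summary>
/// Represents the usage of tokens in a request and response cycle.
/// </summary>
public class TokenUsage
{
    public DateTimeOffset Timestamp { get; set; }

    public string? ServiceType { get; set; } // e.g. "Azure OpenAI"
    public string? ModelType { get; set; }   // e.g. "TextGeneration"
    public string? ModelName { get; set; }   // e.g. "gpt-4o"

    // Counts computed locally with the tokenizer.
    public int TokenizerTokensIn { get; set; }
    public int TokenizerTokensOut { get; set; }

    // Counts reported by the service, when available.
    public int? ServiceTokensIn { get; set; }
    public int? ServiceTokensOut { get; set; }
    public int? ServiceReasoningTokens { get; set; }
}
```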
marcominerva requested a review from dluc as a code owner on December 17, 2024 17:03
marcominerva marked this pull request as draft on December 17, 2024 17:06
@dluc (Collaborator) commented Dec 17, 2024

The last char (token) duplication is likely related to the new streaming logic. Here's another test using streaming; the last 2 data blocks contain the same token:

```
....

data: {"streamState":"append","noResult":false,"text":",","tokenUsages":[{"Timestamp":"2024-12-17T21:38:14.818472Z","ServiceType":null,"ModelType":"TextGeneration","ModelName":null,"tokenizer_tokens_in":15647,"tokenizer_tokens_out":0,"service_tokens_in":null,"service_tokens_out":null,"service_reasoning_tokens":null}]}

data: {"streamState":"append","noResult":false,"text":",","tokenUsages":[{"Timestamp":"2024-12-17T21:38:20.296029Z","ServiceType":"Azure OpenAI","ModelType":"TextGeneration","ModelName":"gpt-4o","tokenizer_tokens_in":15647,"tokenizer_tokens_out":0,"service_tokens_in":15654,"service_tokens_out":300,"service_reasoning_tokens":null}]}

data: [DONE]
```

Refactored the GenerateTextAsync method across multiple classes implementing the ITextGenerator interface. Changed the return type from IAsyncEnumerable<(string? Text, TokenUsage? TokenUsage)> to IAsyncEnumerable<TextContent>, where TextContent is a new class encapsulating text and token usage information.

Introduced the TextContent class in TextContent.cs to encapsulate generated text and optional token usage information. Modified the TokenUsage class to use DateTimeOffset instead of DateTime for the Timestamp property. Updated the MemoryAnswer class to rename the TokenUsages property to TokenUsage and handle empty strings more appropriately.

Updated GenerateTextAsync method implementations in various classes such as CustomModelTextGeneration, AnthropicTextGeneration, AzureOpenAITextGenerator, LlamaSharpTextGenerator, OnnxTextGenerator, OllamaTextGenerator, OpenAITextGenerator, SemanticKernelTextGenerator, and NoTextGenerator to yield instances of TextContent instead of tuples.

Updated corresponding test classes such as LlamaSharpTextGeneratorTest, OnnxTextGeneratorTest, AzureOpenAITextGeneratorTest, and OpenAITextGeneratorTest to reflect changes in the GenerateTextAsync method's return type. Refactored the AnswerGenerator class to use the new TextContent class and streamline the generation of answer tokens.

Performed minor code cleanups and formatting changes, including removing unused variables and improving code readability.
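
Based on that description, the reworked surface is roughly the following (a hedged sketch: the `TextGenerationOptions` stub and the exact parameter list are assumptions, and the real `ITextGenerator` in the repo has additional members):

```csharp
using System.Collections.Generic;
using System.Threading;
using Microsoft.KernelMemory.Models;

namespace Microsoft.KernelMemory.AI;

/// <summary>
/// A chunk of generated text plus optional token usage information,
/// yielded per streamed chunk instead of the earlier tuple.
/// </summary>
public class TextContent
{
    public string? Text { get; set; }
    public TokenUsage? TokenUsage { get; set; }
}

// Stub for illustration only: the real options type lives in the repo.
public class TextGenerationOptions { /* temperature, max tokens, etc. */ }

public interface ITextGenerator
{
    // Previously: IAsyncEnumerable<(string? Text, TokenUsage? TokenUsage)>.
    IAsyncEnumerable<TextContent> GenerateTextAsync(
        string prompt,
        TextGenerationOptions options,
        CancellationToken cancellationToken = default);
}
```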
marcominerva marked this pull request as ready for review on December 18, 2024 12:00
@marcominerva (Contributor, Author)

@dluc I have applied your suggestions and I think I have also solved the duplicate token issue.

dluc force-pushed the token_usage branch 2 times, most recently from 68888bd to 95f3b4d, on December 19, 2024 00:58
@dluc (Collaborator) commented Dec 19, 2024

I've done a bit of refactoring to reduce the number of changes. There's something odd with streaming: the entire MemoryAnswer object, including the nested objects, should be additive, but the token usage is not; it repeats the same data with each token.

I'll take a look at the streaming part as soon as I get some time, and I also want to review the changes to SearchClient a bit better. The main job is done though, just a matter of a few tweaks, thanks! :-)
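
Given the behavior described (frames repeating the full counts), a client would treat token usage as "latest wins" rather than appending it. A hypothetical consumer along those lines, reusing the types sketched earlier (names assumed, not the library's confirmed API):

```csharp
using System.Text;
using System.Threading.Tasks;
using Microsoft.KernelMemory.AI;
using Microsoft.KernelMemory.Models;

static class StreamingExample
{
    // Append each text delta; take the latest token-usage snapshot rather
    // than accumulating it, since frames repeat the complete counts.
    public static async Task<(string Text, TokenUsage? Usage)> CollectAsync(
        ITextGenerator generator, string prompt, TextGenerationOptions options)
    {
        var text = new StringBuilder();
        TokenUsage? lastUsage = null;

        await foreach (TextContent chunk in generator.GenerateTextAsync(prompt, options))
        {
            text.Append(chunk.Text);                   // additive: append each delta
            lastUsage = chunk.TokenUsage ?? lastUsage; // latest complete counts win
        }

        return (text.ToString(), lastUsage);
    }
}
```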

dluc added the triage label on Jan 8, 2025
@nurkmez2 commented Jan 9, 2025

Hi @dluc, @marcominerva

Thanks for the updates on this topic.

When might this feature become available?

Happy new year

@dluc (Collaborator) commented Jan 13, 2025

> Hi @dluc, @marcominerva
>
> Thanks for the updates on this topic.
>
> When might this feature become available?
>
> Happy new year

happy new year :-)

working on it, I'd like to merge this asap and release a new KM version with all the latest goodies - hopefully won't take long

dluc previously approved these changes on Jan 14, 2025
@dluc (Collaborator) commented Jan 14, 2025

@marcominerva I've made a few changes, mostly around how the token usage is streamed. Could you check if the code still works as you expected?

Other than that, I think the PR is ready to merge. Thanks!

@marcominerva (Contributor, Author)

@dluc I have just made a test using my environment and everything works correctly. Thank you for your support!

@nurkmez2
Hi @dluc
When would it be possible to make this feature available?
Thanks

dluc merged commit dd89a8e into microsoft:main on Jan 15, 2025
6 checks passed
marcominerva deleted the token_usage branch on January 15, 2025 15:13
@dluc (Collaborator) commented Jan 15, 2025

@nurkmez2 released! See version 0.96.

@nurkmez2
Hi @marcominerva, @dluc

Thanks for your help regarding the token usage.

Can you explain what you mean by the following fields?
Which values give the total tokens consumed by this request with the Azure model?

For example:

`Answer.TokenUsage` =

```json
{
  "serviceType": "Azure OpenAI",
  "modelType": "TextGeneration",
  "modelName": "gpt-4o",
  "tokenizerTokensIn": 6512,
  "tokenizerTokensOut": 454,
  "serviceTokensIn": 6519,
  "serviceTokensOut": 454,
  "serviceReasoningTokens": 0
}
```

Is the total consumed token count "serviceTokensIn" (6519) + "serviceTokensOut" (454) = 6973, or not?

Thanks
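
For illustration, a hedged sketch of that computation, assuming `MemoryAnswer.TokenUsage` is a collection with the properties shown in the JSON above (names inferred from the serialized form, not confirmed against the library's source):

```csharp
using System.Linq;
using Microsoft.KernelMemory.Models;

static class TokenTotals
{
    // serviceTokensIn/serviceTokensOut are the counts reported by the
    // service itself; the tokenizer* fields are local estimates and can
    // differ slightly (6512 vs 6519 in the example above).
    public static int TotalServiceTokens(MemoryAnswer answer) =>
        answer.TokenUsage.Sum(u => (u.ServiceTokensIn ?? 0) + (u.ServiceTokensOut ?? 0));
}

// For the sample payload: 6519 + 454 = 6973.
```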
