Add token usage tracking #947

Merged
merged 23 commits into microsoft:main from token_usage on Jan 15, 2025
Conversation

@marcominerva (Contributor) commented Dec 17, 2024

Motivation and Context (Why the change? What's the scenario?)

This is the new PR for token usage. It replaces #872.

High level description (Approach, Design)

Following the suggestion from the previous PR, this new implementation uses the built-in token usage reporting provided by the models. For the moment, it supports TextGeneration with OpenAI and Azure OpenAI:

[image: screenshot showing token usage reported with the response]

@dluc I'm seeing a weird behavior that duplicates the last character of the response:

[image: response with the last character duplicated]

I can't figure out the issue, so I have opened the PR despite this. I hope you can help me find it 😄

- Updated multiple files to include the `Microsoft.KernelMemory.Models` namespace and changed `GenerateTextAsync` to return `IAsyncEnumerable<(string? Text, TokenUsage? TokenUsage)>`. Added a new `TokenUsage` class to track token usage information, and updated the relevant test files and handlers to accommodate the new return type.
- Added a summary documentation comment to the `TokenUsage` class in the `Microsoft.KernelMemory.Models` namespace, describing its purpose: representing the usage of tokens in a request and response cycle. This improves readability and maintainability by giving the class context.
- Enhanced token usage logging in `AzureOpenAITextGenerator` and `OpenAITextGenerator` with detailed information. Introduced new private fields `_deployment` and `_textModel` in the respective classes and updated the constructors accordingly. Modified `MemoryAnswer` to handle multiple `TokenUsage` instances. Expanded the `TokenUsage` class with additional properties and renamed existing ones for clarity. In `SearchClient`, added a `CreatePrompt` method and updated `GenerateAnswer` to use it, ensuring token usage is always populated.
- Updated `Microsoft.SemanticKernel` to `1.32.0` in `Directory.Packages.props`. Modified `AzureOpenAITextGenerator.cs` to use `StreamingChatMessageContent` and updated method calls accordingly. Updated `LlamaSharpTextGenerator.cs` to use `ConfigureAwait(false)` in the `await foreach` loop. Changed the return type in `OnnxTextGenerator.cs` to include `TokenUsage` and updated the `yield return` statements. Added comments and updated result handling in `OpenAITextGenerator.cs`. Renamed a property and improved string checks in `MemoryAnswer.cs`. Added a `CreatePrompt` method and updated `GenerateAnswerTokensAsync` in `AnswerGenerator.cs`. Refactored `SearchClient.cs` to remove the old implementation and improve logging. Added the `TinyHelpers.AspNetCore.Swashbuckle` package in `KernelMemoryService.csproj`.
- Updated using directives to remove `Microsoft.SemanticKernel.ChatCompletion` and add `OpenAI.Chat`. Changed the result variable type to `IAsyncEnumerable<StreamingTextContent>`, updated the call to `GetStreamingTextContentsAsync` with adjusted parameters, and modified content processing to check and yield `x.Text` instead of `x.Content`.
- Updated `Directory.Packages.props` to upgrade `Microsoft.SemanticKernel.Connectors.AzureOpenAI` from `1.26.0` to `1.32.0`. Refined logging and token usage assignment in the `AzureOpenAITextGenerator` and `OpenAITextGenerator` classes by removing unnecessary null-conditional operators and setting default values. Enhanced `AnswerGenerator` to initialize and update `TokenUsage` during token generation. Added logic in `SearchClient` to merge `TokenUsages` from different parts of the result in two switch cases.
- Updated `Swashbuckle.AspNetCore` from `6.9.0` to `7.2.0`, and `Microsoft.SemanticKernel.Abstractions` and `Microsoft.SemanticKernel.Connectors.OpenAI` from `1.26.0` to `1.32.0`.
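
Pieced together from these commit notes and the serialized payloads quoted later in the thread, the `TokenUsage` model is roughly the following sketch (property names reconstructed from the JSON; the PR's actual code may differ):

```csharp
using System;

namespace Microsoft.KernelMemory.Models;

/// <summary>
/// Represents the usage of tokens in a request and response cycle.
/// </summary>
public class TokenUsage
{
    public DateTimeOffset Timestamp { get; set; }

    public string? ServiceType { get; set; } // e.g. "Azure OpenAI"
    public string? ModelType { get; set; }   // e.g. "TextGeneration"
    public string? ModelName { get; set; }   // e.g. "gpt-4o"

    // Counts computed locally with the tokenizer.
    public int TokenizerTokensIn { get; set; }
    public int TokenizerTokensOut { get; set; }

    // Counts reported by the service, when available.
    public int? ServiceTokensIn { get; set; }
    public int? ServiceTokensOut { get; set; }
    public int? ServiceReasoningTokens { get; set; }
}
```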
marcominerva requested a review from dluc as a code owner on December 17, 2024 17:03
marcominerva marked this pull request as draft on December 17, 2024 17:06
@dluc (Collaborator) commented Dec 17, 2024

The last char (token) duplication is likely related to the new streaming logic. Here's another test using streaming; the last 2 data blocks contain the same token:

```
....

data: {"streamState":"append","noResult":false,"text":",","tokenUsages":[{"Timestamp":"2024-12-17T21:38:14.818472Z","ServiceType":null,"ModelType":"TextGeneration","ModelName":null,"tokenizer_tokens_in":15647,"tokenizer_tokens_out":0,"service_tokens_in":null,"service_tokens_out":null,"service_reasoning_tokens":null}]}

data: {"streamState":"append","noResult":false,"text":",","tokenUsages":[{"Timestamp":"2024-12-17T21:38:20.296029Z","ServiceType":"Azure OpenAI","ModelType":"TextGeneration","ModelName":"gpt-4o","tokenizer_tokens_in":15647,"tokenizer_tokens_out":0,"service_tokens_in":15654,"service_tokens_out":300,"service_reasoning_tokens":null}]}

data: [DONE]
```

Refactored the GenerateTextAsync method across multiple classes implementing the ITextGenerator interface. Changed the return type from IAsyncEnumerable<(string? Text, TokenUsage? TokenUsage)> to IAsyncEnumerable<TextContent>, where TextContent is a new class encapsulating text and token usage information.

Introduced the TextContent class in TextContent.cs to encapsulate generated text and optional token usage information. Modified the TokenUsage class to use DateTimeOffset instead of DateTime for the Timestamp property. Updated the MemoryAnswer class to rename the TokenUsages property to TokenUsage and handle empty strings more appropriately.

Updated GenerateTextAsync method implementations in various classes such as CustomModelTextGeneration, AnthropicTextGeneration, AzureOpenAITextGenerator, LlamaSharpTextGenerator, OnnxTextGenerator, OllamaTextGenerator, OpenAITextGenerator, SemanticKernelTextGenerator, and NoTextGenerator to yield instances of TextContent instead of tuples.

Updated corresponding test classes such as LlamaSharpTextGeneratorTest, OnnxTextGeneratorTest, AzureOpenAITextGeneratorTest, and OpenAITextGeneratorTest to reflect changes in the GenerateTextAsync method's return type. Refactored the AnswerGenerator class to use the new TextContent class and streamline the generation of answer tokens.

Performed minor code cleanups and formatting changes, including removing unused variables and improving code readability.
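
Based on that description, the reworked surface is roughly the following (a hedged sketch: the `TextGenerationOptions` stub and the exact parameter list are assumptions, and the real `ITextGenerator` in the repo has additional members):

```csharp
using System.Collections.Generic;
using System.Threading;
using Microsoft.KernelMemory.Models;

namespace Microsoft.KernelMemory.AI;

/// <summary>
/// A chunk of generated text plus optional token usage information,
/// yielded per streamed chunk instead of the earlier tuple.
/// </summary>
public class TextContent
{
    public string? Text { get; set; }
    public TokenUsage? TokenUsage { get; set; }
}

// Stub for illustration only: the real options type lives in the repo.
public class TextGenerationOptions { /* temperature, max tokens, etc. */ }

public interface ITextGenerator
{
    // Previously: IAsyncEnumerable<(string? Text, TokenUsage? TokenUsage)>.
    IAsyncEnumerable<TextContent> GenerateTextAsync(
        string prompt,
        TextGenerationOptions options,
        CancellationToken cancellationToken = default);
}
```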
marcominerva marked this pull request as ready for review on December 18, 2024 12:00
@marcominerva (Contributor, Author)

@dluc I have applied your suggestions and I think I have also solved the duplicate token issue.

dluc force-pushed the token_usage branch 2 times, most recently from 68888bd to 95f3b4d, on December 19, 2024 00:58
@dluc (Collaborator) commented Dec 19, 2024

I've done a bit of refactoring to reduce the number of changes. There's something odd with streaming: the entire MemoryAnswer object, including the nested objects, should be additive, but the token usage is not; it repeats the same data with each token.

I'll take a look at the streaming part as soon as I get some time, and I also want to review the changes to SearchClient a bit better. The main job is done though, just a matter of a few tweaks, thanks! :-)
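
Given the behavior described (frames repeating the full counts), a client would treat token usage as "latest wins" rather than appending it. A hypothetical consumer along those lines, reusing the types sketched earlier (names assumed, not the library's confirmed API):

```csharp
using System.Text;
using System.Threading.Tasks;
using Microsoft.KernelMemory.AI;
using Microsoft.KernelMemory.Models;

static class StreamingExample
{
    // Append each text delta; take the latest token-usage snapshot rather
    // than accumulating it, since frames repeat the complete counts.
    public static async Task<(string Text, TokenUsage? Usage)> CollectAsync(
        ITextGenerator generator, string prompt, TextGenerationOptions options)
    {
        var text = new StringBuilder();
        TokenUsage? lastUsage = null;

        await foreach (TextContent chunk in generator.GenerateTextAsync(prompt, options))
        {
            text.Append(chunk.Text);                   // additive: append each delta
            lastUsage = chunk.TokenUsage ?? lastUsage; // latest complete counts win
        }

        return (text.ToString(), lastUsage);
    }
}
```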

dluc added the triage label on Jan 8, 2025
@nurkmez2 commented Jan 9, 2025

Hi @dluc, @marcominerva

Thanks for the updates on this topic.

When might this feature become available?

Happy new year

@dluc (Collaborator) commented Jan 13, 2025

> Hi @dluc, @marcominerva
>
> Thanks for the updates on this topic.
>
> When might this feature become available?
>
> Happy new year

happy new year :-)

working on it, I'd like to merge this asap and release a new KM version with all the latest goodies - hopefully won't take long

dluc previously approved these changes on Jan 14, 2025
@dluc (Collaborator) commented Jan 14, 2025

@marcominerva I've made a few changes, mostly around how the token usage is streamed. Could you check if the code still works as you expected?

Other than that, I think the PR is ready to merge. Thanks!

@marcominerva (Contributor, Author)

@dluc I have just made a test using my environment and everything works correctly. Thank you for your support!

@nurkmez2
Hi @dluc
When would it be possible to make this feature available?
Thanks

dluc merged commit dd89a8e into microsoft:main on Jan 15, 2025
6 checks passed
marcominerva deleted the token_usage branch on January 15, 2025 15:13
@dluc (Collaborator) commented Jan 15, 2025

@nurkmez2 released! See version 0.96.

@nurkmez2
Hi @marcominerva, @dluc

Thanks for your help regarding the token usage.

Can you explain what you mean by the following fields?
Which values give the total tokens consumed by this request with the Azure model?

For example:

`Answer.TokenUsage` =

```json
{
  "serviceType": "Azure OpenAI",
  "modelType": "TextGeneration",
  "modelName": "gpt-4o",
  "tokenizerTokensIn": 6512,
  "tokenizerTokensOut": 454,
  "serviceTokensIn": 6519,
  "serviceTokensOut": 454,
  "serviceReasoningTokens": 0
}
```

Is the total consumed token count "serviceTokensIn" (6519) + "serviceTokensOut" (454) = 6973, or not?

Thanks
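
For illustration, a hedged sketch of that computation, assuming `MemoryAnswer.TokenUsage` is a collection with the properties shown in the JSON above (names inferred from the serialized form, not confirmed against the library's source):

```csharp
using System.Linq;
using Microsoft.KernelMemory.Models;

static class TokenTotals
{
    // serviceTokensIn/serviceTokensOut are the counts reported by the
    // service itself; the tokenizer* fields are local estimates and can
    // differ slightly (6512 vs 6519 in the example above).
    public static int TotalServiceTokens(MemoryAnswer answer) =>
        answer.TokenUsage.Sum(u => (u.ServiceTokensIn ?? 0) + (u.ServiceTokensOut ?? 0));
}

// For the sample payload: 6519 + 454 = 6973.
```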
