📊 Efficiently Track Token Usage in Streamed Chat Completions
Tracking token usage helps you optimize API costs and make your application more efficient. Here's a quick guide to retrieving token usage data from streamed responses using `stream_options={"include_usage": True}`.
🔍 Why Token Usage Matters
Each API request consumes tokens (units of text), and the number of tokens used impacts your API costs. By tracking token usage in real-time, you can stay within budget and optimize your requests.
🛠 Step 1: Set Up the Chat Completions Request with Stream Options
To enable token usage tracking in streamed responses, configure your request with both `stream=True` and `stream_options={"include_usage": True}`. Here's how:
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's 1+1? Answer in one word."}],
    temperature=0,
    stream=True,
    stream_options={"include_usage": True},  # Enable token usage tracking for streaming
)
```
With `stream=True`, the response arrives in chunks. The `"include_usage": True` setting adds token usage stats to the final chunk of the response.
🔄 Step 2: Process the Streamed Response to Access Token Usage
Once you set up the request, handle each chunk to access token usage data at the end. Here’s what each chunk will contain:
```python
for chunk in response:
    print(f"choices: {chunk.choices}\nusage: {chunk.usage}")
    print("****************")
```
- **Intermediate Chunks:** Show partial response content in the `choices` field, while `usage` is `None`.
- **Final Chunk:** Displays `usage` stats like `completion_tokens`, `prompt_tokens`, and `total_tokens`, with an empty `choices` field.
🧩 Example Output
```
choices: [Choice(delta=ChoiceDelta(content='', role='assistant'), finish_reason=None)]
usage: None
****************
choices: [Choice(delta=ChoiceDelta(content='Two'), finish_reason=None)]
usage: None
****************
choices: []
usage: CompletionUsage(completion_tokens=2, prompt_tokens=18, total_tokens=20)
****************
```
In this example:
- Intermediate Chunks contain parts of the response, with `usage` as `None`.
- The Final Chunk provides token stats for the full request.
💡 Key Points to Remember
- **Token Usage in Final Chunk:** Only the last chunk includes `usage` data.
- **Empty `choices` in Final Chunk:** The `choices` field is empty in the last chunk to signify the end of the response.
- **Optimize Costs:** Monitoring token usage allows you to track API costs and fine-tune requests for efficiency.
🎉 Wrapping Up
Using `stream_options={"include_usage": True}` gives you real-time token usage insights, helping you manage costs and improve performance. This option is a powerful way to keep your application efficient and within budget.