Together AI API Pricing and Rate Limits


Together AI provides access to several language models, with pricing and rate limits based on usage tiers. These rate limits, such as Requests Per Second (RPS), Tokens Per Second (TPS), and Tokens Per Minute (TPM), help ensure consistent and fair use of the platform.
Rate Limiting Overview
Rate limits are primarily controlled via the following metrics:
Requests Per Second (RPS): Maximum number of requests allowed per second.
Tokens Per Second (TPS): Maximum tokens processed per second.
Tokens Per Minute (TPM): Maximum tokens processed per minute.
HTTP Status Code 429 is used to indicate that rate limits have been exceeded.
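When a request hits a 429 response, the usual remedy is to wait and retry with exponential backoff. Here is a minimal sketch of that pattern; `send_request` is a stand-in for a real HTTP call to the Together AI API, not part of any SDK:

```python
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a request when the API answers HTTP 429 (rate limited).

    `send_request` is any callable returning (status_code, body);
    it stands in for a real HTTP call to the Together AI API.
    """
    for attempt in range(max_retries):
        status, body = send_request()
        if status != 429:
            return body
        # Exponential backoff: wait 1s, 2s, 4s, ... before retrying.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limit still exceeded after retries")
```

Production code would also honor a `Retry-After` header if the API returns one, and add jitter so many clients don't retry in lockstep.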
TPM (Tokens per Minute):
Definition: The maximum number of tokens (words or parts of words) that can be processed in one minute.
Example: If your model processes up to 2,000,000 tokens per minute (TPM), and each user session uses 5,000 tokens, you can handle 400 sessions per minute.
RPM (Requests per Minute):
Definition: The maximum number of API requests (or messages) that can be sent in one minute.
Example: If your API allows 5,000 requests per minute (RPM), and each session involves 2 requests (one question, one answer), you can handle 2,500 sessions per minute.
Both limits determine how many users your chatbot can handle within a minute based on message frequency (RPM) and message length (TPM).
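The capacity logic above can be expressed directly: a session count is bounded by both limits, and whichever is exhausted first is the one that binds. A small sketch, using the figures from the examples:

```python
def sessions_per_minute(tpm_limit, rpm_limit,
                        tokens_per_session, requests_per_session):
    """How many chatbot sessions fit in one minute under both limits.

    The binding constraint is whichever limit runs out first.
    """
    by_tokens = tpm_limit // tokens_per_session
    by_requests = rpm_limit // requests_per_session
    return min(by_tokens, by_requests)

# Figures from the examples above: 2,000,000 TPM, 5,000 RPM,
# 5,000 tokens and 2 requests per session.
print(sessions_per_minute(2_000_000, 5_000, 5_000, 2))  # → 400 (TPM binds)
```

With 5,000-token sessions the token limit binds (400 sessions), even though the request limit alone would allow 2,500.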
Pricing Tiers for Embedding Models
Embedding models are priced based on different tiers, each offering specific limits on Requests and Tokens per Minute:
| Tier | RPM (Requests per Minute) | TPM (Tokens per Minute) | Max Users per Minute |
| --- | --- | --- | --- |
| Tier 2 ($50) | 5,000 | 2,000,000 | 400 |
| Tier 3 ($100) | 5,000 | 10,000,000 | 2,000 |
| Tier 4 ($250) | 10,000 | 10,000,000 | 2,000 |
| Tier 5 ($1,000) | 10,000 | 20,000,000 | 4,000 |
These tiers allow for scaling based on user needs, ensuring flexibility in managing embedding models at different price points.
Meta-Llama 3.1-8B Instruct Turbo Model
The Meta-Llama 3.1-8B Instruct Turbo model is one of the language models available on Together AI. It is built on a transformer architecture, and its instruction-tuned versions incorporate:
Supervised Fine-Tuning (SFT)
Reinforcement Learning from Human Feedback (RLHF)
These features help align the model with human preferences for helpfulness and safety, ensuring a higher quality output.
Key Features:
Auto-regressive architecture: Allows for the prediction of the next token in a sequence, making it effective for tasks requiring sequential data processing.
Optimized for inference: With enhancements to ensure faster and more efficient model performance.
Playground Availability: Can be tested on the Together AI Playground for hands-on experience.
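Beyond the Playground, the model can be called over the API. The sketch below only builds the request body rather than sending it; the endpoint URL and field names follow the OpenAI-compatible chat format that Together AI exposes, but treat them as assumptions and verify against the API docs linked below:

```python
import json

# Hypothetical request for the Together AI chat completions endpoint
# (OpenAI-compatible format). Endpoint URL and field names are assumptions;
# check them against the official API documentation before use.
API_URL = "https://api.together.xyz/v1/chat/completions"

payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [
        {"role": "user", "content": "How many tokens are used in this chatbot?"}
    ],
    "max_tokens": 256,
}

# The body would be POSTed to API_URL with an Authorization: Bearer header.
print(json.dumps(payload, indent=2))
```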
Cost of Meta-Llama 3.1-8B Instruct Turbo
For users looking to deploy the Meta-Llama model (also listed on HuggingFace), the cost is calculated per million tokens processed.
- Price on Together AI: $0.18 per million tokens
This pricing is competitive compared to other high-performance models available on the market, making it suitable for various large-scale applications.
Accessing Together AI Documentation and API Endpoints
To access more information about the Together AI rate limits and API documentation, please refer to the following links:
Rate Limits Documentation: https://docs.together.ai/docs/rate-limits
Meta-Llama 3.1-8B Instruct Turbo API: https://api.together.xyz/models/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
Cost Estimation Example
Key Details:
10,000 sessions per month (1 session = one interaction by one student).
We need to estimate the average number of tokens consumed per session to calculate the total cost.
The price is $0.18 per 1 million tokens.
Assumption: Manual Token Counting (Approximation)
Here’s an example breakdown:
User input: "How many tokens are used in this chatbot?" (8 words) ≈ 10-12 tokens.
Chatbot response: "This chatbot uses tokens to process words. Each word is broken into chunks." (13 words) ≈ 20-25 tokens.
If the user and chatbot exchange 5 messages during a session:
Average user input: 12 tokens.
Average chatbot response: 25 tokens.
That works out to roughly 5 × (12 + 25) ≈ 185 tokens of raw chat text, but real sessions usually carry longer prompts, system instructions, and repeated conversation history, so let’s assume each session uses 5,000 tokens (for simplicity; adjust this based on your own data).
Step-by-Step Calculation:
Total tokens = 10,000 sessions × 5,000 tokens per session = 50,000,000 tokens.
Cost = 50,000,000 / 1,000,000 × $0.18 = $9.00.
Final Cost for 10,000 Sessions:
If each session uses 5,000 tokens and you run 10,000 sessions in a month, the total cost is $9.00.
Adjustments Based on Token Usage:
If the number of tokens per session changes, here’s how the cost changes:
1,000 tokens per session (10,000 sessions): $1.80
2,000 tokens per session (10,000 sessions): $3.60
10,000 tokens per session (10,000 sessions): $18.00
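The calculation above, including the per-session adjustments, reduces to a few lines of code (`monthly_cost` is just a name for this sketch, not an API):

```python
PRICE_PER_MILLION = 0.18  # $ per 1M tokens, the Together AI price above

def monthly_cost(sessions, tokens_per_session,
                 price_per_million=PRICE_PER_MILLION):
    """Total monthly cost for a given session count and session size."""
    total_tokens = sessions * tokens_per_session
    return total_tokens / 1_000_000 * price_per_million

# Reproduce the figures above for 10,000 sessions per month.
for tokens in (1_000, 2_000, 5_000, 10_000):
    print(f"{tokens:>6} tokens/session -> ${monthly_cost(10_000, tokens):.2f}")
```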
Cost Calculation Formula:
Cost = (Tokens Used / 1,000,000) × Price Per Million Tokens
Where:
Price Per Million Tokens = $0.18
Tokens Used = total tokens processed within a specific period
Scenario 1: Tier 2 ($50 paid)
Maximum Tokens per Minute (TPM): 2,000,000
Price per Million Tokens: $0.18
Cost at full utilization: 2 × $0.18 = $0.36 per minute
Scenario 2: Tier 3 ($100 paid)
Maximum Tokens per Minute (TPM): 10,000,000
Price per Million Tokens: $0.18
Cost at full utilization: 10 × $0.18 = $1.80 per minute
Scenario 3: Tier 4 ($250 paid)
Maximum Tokens per Minute (TPM): 10,000,000
Price per Million Tokens: $0.18
Cost at full utilization: 10 × $0.18 = $1.80 per minute
Scenario 4: Tier 5 ($1,000 paid)
Maximum Tokens per Minute (TPM): 20,000,000
Price per Million Tokens: $0.18
Cost at full utilization: 20 × $0.18 = $3.60 per minute
Summary of Costs
| Tier | Tokens per Minute (TPM) | Cost per Minute | Cost per Hour | Cost per Day |
| --- | --- | --- | --- | --- |
| Tier 2 | 2,000,000 | $0.36 | $21.60 | $518.40 |
| Tier 3 | 10,000,000 | $1.80 | $108.00 | $2,592.00 |
| Tier 4 | 10,000,000 | $1.80 | $108.00 | $2,592.00 |
| Tier 5 | 20,000,000 | $3.60 | $216.00 | $5,184.00 |
This table gives an overview of the worst-case costs for each tier under continuous full utilization over various time periods.
Written by

Muhammad Hassan