Together AI API Pricing and Rate Limits


Together AI provides access to several language models, with pricing and rate limits based on usage tiers. These rate limits, such as Requests Per Second (RPS), Tokens Per Second (TPS), and Tokens Per Minute (TPM), help ensure consistent and fair use of the platform.
Rate Limiting Overview
Rate limits are primarily controlled via the following metrics:
Requests Per Second (RPS): Maximum number of requests allowed per second.
Tokens Per Second (TPS): Maximum tokens processed per second.
Tokens Per Minute (TPM): Maximum tokens processed per minute.
HTTP Status Code 429 is used to indicate that rate limits have been exceeded.
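When a request hits a 429 response, the usual remedy is to wait and retry with exponential backoff. Here is a minimal sketch of that pattern; `send_request` is a stand-in for a real HTTP call to the Together AI API, not part of any SDK:

```python
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a request when the API answers HTTP 429 (rate limited).

    `send_request` is any callable returning (status_code, body);
    it stands in for a real HTTP call to the Together AI API.
    """
    for attempt in range(max_retries):
        status, body = send_request()
        if status != 429:
            return body
        # Exponential backoff: wait 1s, 2s, 4s, ... before retrying.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limit still exceeded after retries")
```

Production code would also honor a `Retry-After` header if the API returns one, and add jitter so many clients don't retry in lockstep.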
TPM (Tokens per Minute):
Definition: The maximum number of tokens (words or parts of words) that can be processed in one minute.
Example: If your model processes up to 2,000,000 tokens per minute (TPM), and each user session uses 5,000 tokens, you can handle 400 sessions per minute.
RPM (Requests per Minute):
Definition: The maximum number of API requests (or messages) that can be sent in one minute.
Example: If your API allows 5,000 requests per minute (RPM), and each session involves 2 requests (one question, one answer), you can handle 2,500 sessions per minute.
Both limits determine how many users your chatbot can handle within a minute based on message frequency (RPM) and message length (TPM).
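The capacity logic above can be expressed directly: a session count is bounded by both limits, and whichever is exhausted first is the one that binds. A small sketch, using the figures from the examples:

```python
def sessions_per_minute(tpm_limit, rpm_limit,
                        tokens_per_session, requests_per_session):
    """How many chatbot sessions fit in one minute under both limits.

    The binding constraint is whichever limit runs out first.
    """
    by_tokens = tpm_limit // tokens_per_session
    by_requests = rpm_limit // requests_per_session
    return min(by_tokens, by_requests)

# Figures from the examples above: 2,000,000 TPM, 5,000 RPM,
# 5,000 tokens and 2 requests per session.
print(sessions_per_minute(2_000_000, 5_000, 5_000, 2))  # → 400 (TPM binds)
```

With 5,000-token sessions the token limit binds (400 sessions), even though the request limit alone would allow 2,500.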
Pricing Tiers for Embedding Models
Embedding models are priced based on different tiers, each offering specific limits on Requests and Tokens per Minute:
| Tier | RPM (Requests per Minute) | TPM (Tokens per Minute) | Max Users per Minute |
| --- | --- | --- | --- |
| Tier 2 ($50) | 5,000 | 2,000,000 | 400 |
| Tier 3 ($100) | 5,000 | 10,000,000 | 2,000 |
| Tier 4 ($250) | 10,000 | 10,000,000 | 2,000 |
| Tier 5 ($1,000) | 10,000 | 20,000,000 | 4,000 |
These tiers allow for scaling based on user needs, ensuring flexibility in managing embedding models at different price points.
Meta-Llama 3.1-8B Instruct Turbo Model
The Meta-Llama 3.1-8B Instruct Turbo model is one of the language models available on Together AI. It is built on a transformer architecture, and its instruction-tuned versions incorporate:
Supervised Fine-Tuning (SFT)
Reinforcement Learning from Human Feedback (RLHF)
These features help align the model with human preferences for helpfulness and safety, ensuring a higher quality output.
Key Features:
Auto-regressive architecture: Allows for the prediction of the next token in a sequence, making it effective for tasks requiring sequential data processing.
Optimized for inference: With enhancements to ensure faster and more efficient model performance.
Playground Availability: Can be tested on the Together AI Playground for hands-on experience.
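Beyond the Playground, the model can be called over the API. The sketch below only builds the request body rather than sending it; the endpoint URL and field names follow the OpenAI-compatible chat format that Together AI exposes, but treat them as assumptions and verify against the API docs linked below:

```python
import json

# Hypothetical request for the Together AI chat completions endpoint
# (OpenAI-compatible format). Endpoint URL and field names are assumptions;
# check them against the official API documentation before use.
API_URL = "https://api.together.xyz/v1/chat/completions"

payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [
        {"role": "user", "content": "How many tokens are used in this chatbot?"}
    ],
    "max_tokens": 256,
}

# The body would be POSTed to API_URL with an Authorization: Bearer header.
print(json.dumps(payload, indent=2))
```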
Cost of Meta-Llama 3.1-8B Instruct Turbo
For users looking to deploy the Meta-Llama model (also listed on HuggingFace), the cost is calculated per million tokens processed.
- Price on Together AI: $0.18 per million tokens
This pricing is competitive compared to other high-performance models available on the market, making it suitable for various large-scale applications.
Accessing Together AI Documentation and API Endpoints
To access more information about the Together AI rate limits and API documentation, please refer to the following links:
Rate Limits Documentation: https://docs.together.ai/docs/rate-limits
Meta-Llama 3.1-8B Instruct Turbo API: https://api.together.xyz/models/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
Cost Estimation Example
Key Details:
10,000 sessions per month (1 session = one interaction by one student).
We need to estimate the average number of tokens consumed per session to calculate the total cost.
The price is $0.18 per 1 million tokens.
Assumption: Manual Token Counting (Approximation)
Here’s an example breakdown:
User input: "How many tokens are used in this chatbot?" (8 words) ≈ 10-12 tokens.
Chatbot response: "This chatbot uses tokens to process words. Each word is broken into chunks." (13 words) ≈ 20-25 tokens.
If the user and chatbot exchange 5 messages during a session:
Average user input: 12 tokens.
Average chatbot response: 25 tokens.
That works out to roughly 5 × (12 + 25) ≈ 185 tokens of raw chat text, but real sessions usually carry longer prompts, system instructions, and repeated conversation history, so let’s assume each session uses 5,000 tokens (for simplicity; adjust this based on your own data).
Step-by-Step Calculation:
Total tokens = 10,000 sessions × 5,000 tokens per session = 50,000,000 tokens.
Cost = 50,000,000 / 1,000,000 × $0.18 = $9.00.
Final Cost for 10,000 Sessions:
If each session uses 5,000 tokens and you run 10,000 sessions in a month, the total cost is $9.00.
Adjustments Based on Token Usage:
If the number of tokens per session changes, here’s how the cost changes:
1,000 tokens per session (10,000 sessions): $1.80
2,000 tokens per session (10,000 sessions): $3.60
10,000 tokens per session (10,000 sessions): $18.00
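The calculation above, including the per-session adjustments, reduces to a few lines of code (`monthly_cost` is just a name for this sketch, not an API):

```python
PRICE_PER_MILLION = 0.18  # $ per 1M tokens, the Together AI price above

def monthly_cost(sessions, tokens_per_session,
                 price_per_million=PRICE_PER_MILLION):
    """Total monthly cost for a given session count and session size."""
    total_tokens = sessions * tokens_per_session
    return total_tokens / 1_000_000 * price_per_million

# Reproduce the figures above for 10,000 sessions per month.
for tokens in (1_000, 2_000, 5_000, 10_000):
    print(f"{tokens:>6} tokens/session -> ${monthly_cost(10_000, tokens):.2f}")
```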
Cost Calculation Formula:
Cost = (Tokens Used / 1,000,000) × Price Per Million Tokens
Where:
Price Per Million Tokens = $0.18
Tokens Used = total tokens processed within a specific period
Scenario 1: Tier 2 ($50 paid)
Maximum Tokens per Minute (TPM): 2,000,000
Price per Million Tokens: $0.18
Cost at full utilization: 2 × $0.18 = $0.36 per minute
Scenario 2: Tier 3 ($100 paid)
Maximum Tokens per Minute (TPM): 10,000,000
Price per Million Tokens: $0.18
Cost at full utilization: 10 × $0.18 = $1.80 per minute
Scenario 3: Tier 4 ($250 paid)
Maximum Tokens per Minute (TPM): 10,000,000
Price per Million Tokens: $0.18
Cost at full utilization: 10 × $0.18 = $1.80 per minute
Scenario 4: Tier 5 ($1,000 paid)
Maximum Tokens per Minute (TPM): 20,000,000
Price per Million Tokens: $0.18
Cost at full utilization: 20 × $0.18 = $3.60 per minute
Summary of Costs
| Tier | Tokens per Minute (TPM) | Cost per Minute | Cost per Hour | Cost per Day |
| --- | --- | --- | --- | --- |
| Tier 2 | 2,000,000 | $0.36 | $21.60 | $518.40 |
| Tier 3 | 10,000,000 | $1.80 | $108.00 | $2,592.00 |
| Tier 4 | 10,000,000 | $1.80 | $108.00 | $2,592.00 |
| Tier 5 | 20,000,000 | $3.60 | $216.00 | $5,184.00 |
This table gives an overview of the worst-case costs for each tier under continuous full utilization over various time periods.
Written by

Muhammad Hassan