Choosing the right LLM API isn't just about capability—it's about finding the optimal balance of quality, speed, and cost for your specific use case. With OpenAI, Anthropic, Google, Meta, and Mistral all competing for market share, pricing has become increasingly competitive and complex.
This guide breaks down the current pricing landscape, compares models across different tiers, and helps you make informed decisions about which LLM to use for your application.
Quick Pricing Comparison Table
Here's the current pricing as of January 2026 for the most popular models:
Frontier Models (Highest Capability)
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| GPT-4 Turbo | OpenAI | $10.00 | $30.00 | 128K |
| GPT-4o | OpenAI | $5.00 | $15.00 | 128K |
| Claude 3 Opus | Anthropic | $15.00 | $75.00 | 200K |
| Gemini 1.5 Pro | Google | $3.50 | $10.50 | 1M |
| Llama 3.1 405B | Meta (via API) | $3.00 | $3.00 | 128K |
Mid-Tier Models (Best Value)
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| Claude 3 Sonnet | Anthropic | $3.00 | $15.00 | 200K |
| Claude 3.5 Sonnet | Anthropic | $3.00 | $15.00 | 200K |
| GPT-4o | OpenAI | $5.00 | $15.00 | 128K |
| Gemini 1.5 Flash | Google | $0.35 | $1.05 | 1M |
| Llama 3 70B | Meta (via API) | $0.70 | $0.90 | 8K-128K |
| Mistral Large | Mistral | $4.00 | $12.00 | 32K |
Budget Models (Cost-Optimized)
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| Claude 3 Haiku | Anthropic | $0.25 | $1.25 | 200K |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K |
| GPT-3.5 Turbo | OpenAI | $0.50 | $1.50 | 16K |
| Gemini 1.5 Flash | Google | $0.35 | $1.05 | 1M |
| Llama 3 8B | Meta (via API) | $0.10 | $0.10 | 8K |
| Mistral 7B | Mistral | $0.25 | $0.25 | 32K |
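To turn these per-million rates into per-request figures, a small helper is enough. The `PRICES` table and `estimate_cost` function below are illustrative (rates copied from the tables above), not a provider SDK:

```python
# (input, output) USD per 1M tokens, from the tables above
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-haiku": (0.25, 1.25),
    "claude-3-sonnet": (3.00, 15.00),
    "gpt-4o": (5.00, 15.00),
    "gpt-4-turbo": (10.00, 30.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```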
Cost Analysis by Use Case
Chatbot / Conversational AI
Typical usage: 800 input tokens, 400 output tokens per conversation (treating each conversation as a single turn; multi-turn chats resend history, so real costs run higher)
| Model | Cost per 1000 conversations | Monthly (100K conversations) |
|---|---|---|
| GPT-4o mini | $0.36 | $36 |
| Claude 3 Haiku | $0.70 | $70 |
| Claude 3 Sonnet | $8.40 | $840 |
| GPT-4o | $10.00 | $1,000 |
Recommendation: GPT-4o mini or Claude 3 Haiku for most chatbots. Upgrade to Sonnet/GPT-4o for complex conversations requiring nuance.
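As a sanity check, the chatbot figures above fall out of the `estimate_cost` helper sketched earlier:

```python
per_conv = estimate_cost("gpt-4o-mini", input_tokens=800, output_tokens=400)
print(f"${per_conv * 1000:.2f} per 1,000 conversations")    # $0.36
print(f"${per_conv * 100_000:.0f} per 100K conversations")  # $36
```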
Document Processing / Summarization
Typical usage: 10,000 input tokens (document), 500 output tokens (summary)
| Model | Cost per document | Monthly (10K documents) |
|---|---|---|
| Claude 3 Haiku | $0.0031 | $31 |
| GPT-4o mini | $0.0018 | $18 |
| Gemini 1.5 Flash | $0.0040 | $40 |
| Claude 3 Sonnet | $0.0375 | $375 |
Recommendation: GPT-4o mini for basic summarization. Claude 3 Haiku excels at longer documents with its 200K context.
Code Generation / Assistance
Typical usage: 2,000 input tokens (context + prompt), 1,000 output tokens (code)
| Model | Cost per request | Monthly (50K requests) |
|---|---|---|
| GPT-4o mini | $0.0009 | $45 |
| Claude 3 Sonnet | $0.021 | $1,050 |
| GPT-4o | $0.025 | $1,250 |
| GPT-4 Turbo | $0.050 | $2,500 |
Recommendation: GPT-4o or Claude 3.5 Sonnet for best code quality. GPT-4o mini works well for simpler tasks.
RAG / Enterprise Search
Typical usage: 5,000 input tokens (retrieved context), 300 output tokens (answer)
| Model | Cost per query | Monthly (500K queries) |
|---|---|---|
| Claude 3 Haiku | $0.0016 | $813 |
| GPT-4o mini | $0.0009 | $465 |
| Gemini 1.5 Flash | $0.0021 | $1,033 |
| Claude 3 Sonnet | $0.0195 | $9,750 |
Recommendation: GPT-4o mini for high-volume RAG. Claude 3 Haiku and Gemini 1.5 Flash cost slightly more; Haiku handles long retrieved contexts well, and Flash offers the largest window.
Platform Comparison: Direct API vs Cloud Providers
AWS Bedrock Pricing
AWS Bedrock adds convenience; pricing for first-party models typically matches direct API rates, but open models can carry a 10-40% premium:
| Model | Bedrock Input | Bedrock Output | vs Direct |
|---|---|---|---|
| Claude 3 Sonnet | $3.00 | $15.00 | Same |
| Claude 3 Haiku | $0.25 | $1.25 | Same |
| Llama 3 70B | $0.99 | $0.99 | +10-40% |
| Mistral Large | $4.00 | $12.00 | Same |
When to use Bedrock:
- You're already in the AWS ecosystem
- Need enterprise security/compliance features
- Want unified billing across models
- Require VPC endpoints and private connectivity
Azure OpenAI Pricing
Azure OpenAI matches OpenAI pricing but adds enterprise features:
| Model | Price | vs OpenAI Direct |
|---|---|---|
| GPT-4 Turbo | $10/$30 | Same |
| GPT-4o | $5/$15 | Same |
| GPT-3.5 Turbo | $0.50/$1.50 | Same |
When to use Azure:
- Enterprise compliance requirements (SOC 2, HIPAA)
- Need content filtering and safety guardrails
- Existing Microsoft/Azure infrastructure
- Regional data residency requirements
Google Vertex AI Pricing
Gemini models on Vertex AI include additional features:
| Model | Vertex Price | vs Direct |
|---|---|---|
| Gemini 1.5 Pro | $3.50/$10.50 | Same |
| Gemini 1.5 Flash | $0.35/$1.05 | Same |
When to use Vertex:
- Need grounding with Google Search
- Want enterprise security features
- Using other GCP services
Open Source Model Economics
Self-Hosting Costs
Running Llama 3 70B yourself requires significant infrastructure:
| Cloud Provider | GPU Instance | Hourly Cost | Tokens/second | Cost per 1M tokens |
|---|---|---|---|---|
| AWS | 2x A100 80GB | $6.50/hour | ~50 | ~$36 |
| GCP | 2x A100 80GB | $5.80/hour | ~50 | ~$32 |
| Lambda Labs | 2x A100 80GB | $2.40/hour | ~50 | ~$13 |
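The last column is just the hourly rate divided by hourly token throughput; a quick sketch of that arithmetic:

```python
def self_host_cost_per_1m(hourly_usd: float, tokens_per_sec: float) -> float:
    # Million tokens produced per hour at the given (single-stream) throughput
    m_tokens_per_hour = tokens_per_sec * 3600 / 1_000_000
    return hourly_usd / m_tokens_per_hour

print(round(self_host_cost_per_1m(6.50, 50)))  # ~36, matching the AWS row above
```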
At ~50 tokens/second per node, these costs far exceed the roughly $0.90 per 1M tokens that inference APIs charge for the same model; the gap closes only when heavy request batching multiplies throughput well beyond the single-stream figures above. Self-hosting only makes economic sense if:
- You process >10M tokens/hour consistently
- You have GPU engineering expertise
- You need complete data isolation
- You're running fine-tuned models
Inference API Providers
Third-party providers offer open models at competitive rates:
| Provider | Llama 3 70B | Llama 3 8B | Mistral 7B |
|---|---|---|---|
| Together AI | $0.90/$0.90 | $0.20/$0.20 | $0.20/$0.20 |
| Groq | $0.59/$0.79 | $0.05/$0.08 | $0.10/$0.10 |
| Anyscale | $1.00/$1.00 | $0.15/$0.15 | $0.15/$0.15 |
| Fireworks | $0.90/$0.90 | $0.20/$0.20 | $0.20/$0.20 |
Groq stands out for speed (500+ tokens/second) at competitive prices, making it ideal for real-time applications.
Model Selection Decision Tree
```
What's your primary requirement?
│
├─► Lowest Cost
│     ├─► Simple tasks → GPT-4o mini ($0.15/$0.60)
│     ├─► Need quality → Claude 3 Haiku ($0.25/$1.25)
│     └─► High volume → Llama 3 8B via Groq ($0.05/$0.08)
│
├─► Best Quality (cost secondary)
│     ├─► Coding → GPT-4 Turbo or Claude 3.5 Sonnet
│     ├─► Writing → Claude 3 Opus
│     ├─► Reasoning → GPT-4o or Claude 3 Opus
│     └─► Long documents → Claude 3 Opus (200K) or Gemini 1.5 (1M)
│
├─► Balanced (quality + cost)
│     ├─► General use → Claude 3 Sonnet ($3/$15)
│     ├─► Coding → GPT-4o ($5/$15)
│     └─► Fast responses → Gemini 1.5 Flash ($0.35/$1.05)
│
└─► Special Requirements
      ├─► Maximum speed → Groq + Llama/Mixtral
      ├─► Data privacy → Self-hosted Llama or Azure OpenAI
      ├─► 1M+ context → Gemini 1.5 Pro
      └─► Enterprise compliance → Azure/Bedrock/Vertex
```
Cost Optimization Strategies
1. Model Cascading
Route requests to appropriate models based on complexity:
```python
def select_model(query_complexity: str) -> str:
    """Route each request to the cheapest model that can handle it."""
    if query_complexity == "simple":
        return "gpt-4o-mini"       # $0.15/$0.60
    elif query_complexity == "moderate":
        return "claude-3-sonnet"   # $3/$15
    else:
        return "gpt-4-turbo"       # $10/$30
```
Potential savings: 40-70% compared to using frontier models for everything.
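The router needs something to supply `query_complexity`. Here is a deliberately naive heuristic sketch; the keywords and thresholds are illustrative, and a small classifier model is the more robust choice:

```python
def classify_complexity(query: str) -> str:
    # Crude proxies: query length plus reasoning-heavy keywords
    markers = ("why", "compare", "analyze", "prove", "design", "debug")
    words = query.lower().split()
    if len(words) < 20 and not any(m in words for m in markers):
        return "simple"
    return "moderate" if len(words) < 200 else "complex"

model = select_model(classify_complexity("Compare Postgres and MySQL for OLTP."))
```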
2. Caching Common Responses
Cache responses for repeated queries:
```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def cached_llm_call(prompt: str) -> str:
    # Repeated prompts return the stored response; the API is hit only on a miss
    return call_llm(prompt)  # call_llm: your provider wrapper
```
Potential savings: 20-50% depending on query repetition.
3. Prompt Engineering
Optimize prompts to reduce token usage:
| Approach | Before | After | Savings |
|---|---|---|---|
| Concise prompts | 500 tokens | 200 tokens | 60% |
| Structured output | 1000 tokens | 400 tokens | 60% |
| Few-shot → zero-shot | 2000 tokens | 300 tokens | 85% |
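To verify savings like these, count tokens before and after an edit. A sketch using OpenAI's tiktoken library (other providers ship their own counters):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer for GPT-4-era models

def token_count(text: str) -> int:
    return len(enc.encode(text))

verbose = "Please carefully read the following text and provide a detailed summary..."
concise = "Summarize:"
print(token_count(verbose), "->", token_count(concise))
```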
4. Batch Processing
Batch multiple items in single requests where possible:
```python
# Instead of 10 separate calls:
for item in items:
    result = llm.analyze(item)

# One batched call (llm here is an illustrative client object):
results = llm.analyze_batch(items)  # up to ~10x cost reduction
```
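What a batched call looks like under the hood: shared instructions go out once instead of once per item. A sketch, where the prompt format and `call_llm` wrapper are illustrative:

```python
def analyze_batch(items: list[str]) -> str:
    # One request: the instruction tokens are paid for once, not len(items) times
    numbered = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(items))
    prompt = "For each numbered item, return a one-line sentiment label.\n" + numbered
    return call_llm(prompt)  # call_llm: your provider wrapper
```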
Real-World Cost Examples
Startup SaaS (10K MAU)
- Chatbot: 50K conversations/month
- Document processing: 5K docs/month
- Code assistance: 10K requests/month
| Strategy | Model Choice | Monthly Cost |
|---|---|---|
| All GPT-4 | GPT-4 Turbo | $8,500 |
| All Claude Opus | Claude 3 Opus | $12,000 |
| Optimized Mix | Haiku + Sonnet + GPT-4o | $1,200 |
Savings with optimization: 86-90%
Enterprise (100K MAU)
- Customer support: 500K conversations/month
- Document analysis: 50K docs/month
- Internal tools: 100K requests/month
| Strategy | Model Choice | Monthly Cost |
|---|---|---|
| Premium | Claude 3 Sonnet everywhere | $45,000 |
| Optimized | Haiku + Sonnet cascade | $8,000 |
| Budget | GPT-4o mini + selective Sonnet | $4,500 |
Monitoring and Budgeting
Set Up Alerts
Most providers offer spending alerts; a simple in-app guard is sketched after this list:
- OpenAI: Usage limits in dashboard
- Anthropic: Monthly spend caps
- AWS Bedrock: CloudWatch alarms on usage
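Provider-side alerts are reactive; a minimal in-app guard can fail closed before overspending. The `send_alert` hook and threshold values below are illustrative:

```python
MONTHLY_BUDGET_USD = 500.0  # illustrative cap

def check_budget(current_spend_usd: float) -> None:
    # Warn at 80% of budget, refuse new calls at 100%
    if current_spend_usd >= MONTHLY_BUDGET_USD:
        raise RuntimeError("LLM budget exhausted; failing closed")
    if current_spend_usd >= 0.8 * MONTHLY_BUDGET_USD:
        send_alert("LLM spend at 80% of monthly budget")  # your alerting hook
```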
Track Per-Feature Costs
```python
def track_llm_cost(feature: str, input_tokens: int, output_tokens: int, model: str) -> None:
    # metrics is your metrics client (statsd/Datadog-style counters);
    # calculate_cost is sketched below
    cost = calculate_cost(input_tokens, output_tokens, model)
    metrics.increment(f"llm.cost.{feature}", cost)
    metrics.increment(f"llm.tokens.{feature}.input", input_tokens)
    metrics.increment(f"llm.tokens.{feature}.output", output_tokens)
```
Budget Allocation Framework
| Category | % of LLM Budget | Notes |
|---|---|---|
| User-facing features | 60% | Chatbots, assistants |
| Internal tools | 20% | Code review, analysis |
| Experimentation | 15% | New features, A/B tests |
| Buffer | 5% | Unexpected spikes |
Conclusion
LLM costs vary dramatically—up to 100x—between model tiers. The key to cost-effective AI applications is matching model capability to task requirements:
- Start cheap: Use GPT-4o mini or Claude 3 Haiku for prototyping
- Upgrade selectively: Only use frontier models where quality matters
- Monitor continuously: Track costs per feature, not just total spend
- Optimize iteratively: Prompt engineering can cut costs 50%+ without model changes
Use our LLM Token Counter to estimate costs before committing, and our AWS Bedrock Pricing Calculator for enterprise deployments.
The best model isn't always the most capable one—it's the one that delivers required quality at sustainable cost.