AWS Bedrock brings the best foundation models—Claude, Llama, Titan, Mistral, and more—under one managed service. But understanding Bedrock's pricing model is crucial for controlling costs, especially when choosing between on-demand and provisioned throughput options.
This guide breaks down Bedrock pricing, compares it to direct API access, and helps you choose the most cost-effective approach for your workload.
Bedrock Pricing Overview
AWS Bedrock offers two primary pricing models:
- On-Demand: Pay per token, no commitment
- Provisioned Throughput: Pay hourly for dedicated capacity
Additionally, you may incur costs for:
- Custom model training (fine-tuning)
- Model evaluation jobs
- Knowledge bases and retrieval
- Cross-region inference
On-Demand Pricing
On-demand pricing charges per token processed, with separate rates for input and output.
Anthropic Models (Claude)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
Note: Claude pricing on Bedrock matches direct Anthropic API pricing.
Meta Models (Llama)
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Llama 3 70B Instruct | $0.99 | $0.99 |
| Llama 3 8B Instruct | $0.22 | $0.22 |
| Llama 2 70B | $0.99 | $0.99 |
| Llama 2 13B | $0.35 | $0.35 |
Amazon Titan Models
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Titan Text Premier | $0.80 | $2.40 |
| Titan Text Express | $0.20 | $0.60 |
| Titan Text Lite | $0.15 | $0.20 |
| Titan Embeddings V2 | $0.02 | N/A |
Mistral Models
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Mistral Large | $4.00 | $12.00 |
| Mixtral 8x7B | $0.45 | $0.70 |
| Mistral 7B | $0.15 | $0.20 |
Other Providers
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Cohere Command R+ | $3.00 | $15.00 |
| Cohere Command R | $0.50 | $1.50 |
| AI21 Jamba 1.5 Large | $2.00 | $8.00 |
| AI21 Jamba 1.5 Mini | $0.20 | $0.40 |
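To apply these rates to your own workload, here is a minimal cost-estimator sketch using a few of the on-demand prices above (hardcoded for illustration; check the AWS pricing page for current figures):
# On-demand rates in USD per 1M tokens (input, output), from the tables above
PRICES = {
    "claude-3-haiku": (0.25, 1.25),
    "claude-3.5-sonnet": (3.00, 15.00),
    "llama-3-70b": (0.99, 0.99),
    "titan-text-express": (0.20, 0.60),
}

def monthly_cost(model, requests, input_tokens, output_tokens):
    """Estimate monthly on-demand cost for a uniform request profile."""
    in_price, out_price = PRICES[model]
    return (requests * input_tokens / 1_000_000 * in_price
            + requests * output_tokens / 1_000_000 * out_price)

# Matches Example 1 below: 100k Haiku conversations -> $70/month
print(monthly_cost("claude-3-haiku", 100_000, 800, 400))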
Provisioned Throughput
Provisioned Throughput guarantees dedicated model capacity for consistent, high-volume workloads.
How It Works
- Purchase Model Units (MUs) for specific models
- Pay hourly regardless of actual usage
- Get guaranteed throughput without throttling
- Commitment options: no commitment (highest hourly rate), 1 month, or 6 months
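Purchasing Model Units is a single control-plane API call. A minimal sketch (the name is illustrative; confirm the model ID for your region):
import boto3

bedrock = boto3.client("bedrock")

# Buy 1 Model Unit with a 1-month commitment; omit commitmentDuration
# for no-commitment capacity at a higher hourly rate
response = bedrock.create_provisioned_model_throughput(
    provisionedModelName="prod-sonnet-capacity",  # illustrative name
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    modelUnits=1,
    commitmentDuration="OneMonth",
)
print(response["provisionedModelArn"])  # use this ARN as modelId when invoking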
Pricing Structure
Provisioned Throughput pricing varies by:
- Model type
- Commitment term (1 month, 6 months, no commitment)
- Number of Model Units
Example: Claude 3 Sonnet Provisioned Throughput
| Commitment | Hourly per MU | Monthly Cost (1 MU) |
|---|---|---|
| No commitment | ~$35/hour | ~$25,200 |
| 1 month | ~$28/hour | ~$20,160 |
| 6 months | ~$22/hour | ~$15,840 |
Each Model Unit provides approximately:
- 30-50 requests per minute (varies by model)
- Consistent latency even under load
When Provisioned Makes Sense
Calculate your break-even point:
Monthly On-Demand Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
Monthly Provisioned Cost = Hourly Rate × 720 hours (the month length used in the table above)
Break-even utilization = Provisioned Cost / (On-Demand cost at 100% MU capacity)
General guidance:
- < 50% utilization → On-demand cheaper
- 50-70% utilization → Evaluate carefully
- > 70% utilization → Provisioned likely cheaper
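Here is a minimal sketch of that break-even check, using the Claude 3 Sonnet figures from this page (the on-demand value of a fully utilized MU is an assumption; size it from your own tokens-per-minute measurements):
def break_even_utilization(hourly_rate, on_demand_cost_at_full_mu):
    """Utilization above which provisioned beats on-demand (formula above)."""
    provisioned_monthly = hourly_rate * 720  # month length used in the table above
    return provisioned_monthly / on_demand_cost_at_full_mu

# Assumption: one fully utilized MU could serve ~$40,000/month of on-demand traffic
print(f"{break_even_utilization(28.0, 40_000):.0%}")  # ~50% -> evaluate carefully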
Cost Comparison Examples
Example 1: Customer Service Chatbot
Usage profile:
- 100,000 conversations/month
- 800 input tokens, 400 output tokens per conversation
- Model: Claude 3 Haiku
On-Demand:
Input: 100,000 × 800 = 80M tokens × $0.25/M = $20
Output: 100,000 × 400 = 40M tokens × $1.25/M = $50
Total: $70/month
Verdict: On-demand is clearly better for this volume.
Example 2: Document Processing Pipeline
Usage profile:
- 500,000 documents/month
- 10,000 input tokens, 500 output tokens per document
- Model: Claude 3 Sonnet
On-Demand:
Input: 500,000 × 10,000 = 5B tokens × $3/M = $15,000
Output: 500,000 × 500 = 250M tokens × $15/M = $3,750
Total: $18,750/month
Provisioned (1-month commitment):
Required capacity: ~1 MU (≈11 requests/minute average, within a single MU's 30-50 req/min)
Cost: 1 × $20,160 = ~$20,160/month
Verdict: On-demand slightly cheaper, but provisioned provides guaranteed capacity and predictable billing.
Example 3: High-Volume Code Assistant
Usage profile:
- 2,000,000 requests/month
- 2,000 input tokens, 1,000 output tokens per request
- Model: Claude 3.5 Sonnet
On-Demand:
Input: 2M × 2,000 = 4B tokens × $3/M = $12,000
Output: 2M × 1,000 = 2B tokens × $15/M = $30,000
Total: $42,000/month
Provisioned (6-month commitment):
Required capacity: ~1-2 MUs (≈46 requests/minute average)
Cost: ~1.5 × $15,840 ≈ $23,760/month
Verdict: Provisioned saves ~40% at this volume.
Bedrock vs. Direct API Access
Anthropic (Claude)
| Aspect | Bedrock | Direct API |
|---|---|---|
| Price | Same | Same |
| Billing | AWS bill | Separate vendor |
| Enterprise features | VPC, IAM, CloudWatch | Limited |
| Fine-tuning | Supported | API access |
Verdict: Bedrock adds no cost premium for Claude; choose based on operational preferences.
Meta (Llama)
| Provider | Llama 3 70B Price |
|---|---|
| AWS Bedrock | $0.99/$0.99 |
| Together AI | $0.90/$0.90 |
| Groq | $0.59/$0.79 |
| Anyscale | $1.00/$1.00 |
Verdict: Bedrock costs ~10-70% more than alternatives for Llama. Premium is for AWS integration.
Mistral
| Provider | Mistral Large Price |
|---|---|
| AWS Bedrock | $4.00/$12.00 |
| Mistral Direct | $4.00/$12.00 |
Verdict: Same pricing; choose based on operational needs.
Additional Bedrock Costs
Knowledge Bases
Bedrock Knowledge Bases for RAG applications incur:
| Component | Cost |
|---|---|
| Embedding (Titan) | $0.02 per 1M tokens |
| Vector storage | OpenSearch Serverless charges |
| Retrieval queries | Per-query embedding cost |
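Each retrieval call embeds the query text, so query volume drives the ongoing cost. A minimal sketch of a Knowledge Base query (the knowledge base ID is a placeholder):
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Each call pays the embedding cost for the query, then searches the vector store
response = agent_runtime.retrieve(
    knowledgeBaseId="KB123EXAMPLE",  # placeholder
    retrievalQuery={"text": "What is our refund policy?"},
)
for result in response["retrievalResults"]:
    print(result["content"]["text"])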
Model Evaluation
| Evaluation Type | Cost |
|---|---|
| Human evaluation | Per-task pricing |
| Automatic evaluation | Model inference costs |
Custom Models
Fine-tuning costs include:
| Phase | Cost |
|---|---|
| Training | Per-token training cost |
| Hosting | Provisioned Throughput required (custom models have no on-demand option) |
Optimization Strategies
1. Right-Size Your Model
Don't use Claude 3 Opus for tasks Haiku can handle:
| Task | Recommended Model | Cost Difference |
|---|---|---|
| Simple classification | Haiku | Baseline |
| General chat | Sonnet | 12x more |
| Complex reasoning | Opus | 60x more |
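One way to enforce right-sizing is a small router that maps task complexity to the cheapest adequate model. A sketch (the tier names are our own; the model IDs are current Bedrock identifiers):
# Route each task tier to the cheapest model that handles it
MODEL_BY_TIER = {
    "simple": "anthropic.claude-3-haiku-20240307-v1:0",
    "general": "anthropic.claude-3-sonnet-20240229-v1:0",
    "complex": "anthropic.claude-3-opus-20240229-v1:0",
}

def pick_model(tier):
    """Unknown tiers fall back to the cheapest model."""
    return MODEL_BY_TIER.get(tier, MODEL_BY_TIER["simple"])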
2. Implement Caching
Bedrock supports prompt caching for repeated system prompts:
import boto3, json

bedrock_runtime = boto3.client("bedrock-runtime")

# Cache the system prompt to reduce billed input tokens on repeat calls
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({  # body must be a JSON string, not a dict
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,  # required by the Anthropic messages format
        "system": [{"type": "text", "text": system_prompt,
                    "cache_control": {"type": "ephemeral"}}],
        "messages": messages,
    }),
)
Cached tokens are billed at reduced rates.
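To confirm caching is actually engaging, inspect the usage block of the response; the field names below follow Anthropic's response format and are worth verifying for your model version:
import json

# `response` is the invoke_model result from the block above
result = json.loads(response["body"].read())
usage = result.get("usage", {})
print("cache writes:", usage.get("cache_creation_input_tokens", 0))
print("cache reads:", usage.get("cache_read_input_tokens", 0))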
3. Use Batch Inference
For non-real-time workloads, batch inference can reduce costs:
import boto3

bedrock = boto3.client("bedrock")

# Submit a batch job; jobName and an IAM role with S3 access are required
response = bedrock.create_model_invocation_job(
    jobName="doc-batch-2024-06",  # any unique name
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # placeholder role
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://bucket/input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://bucket/output/"}},
)
Batch jobs typically cost 50% less than real-time inference.
4. Monitor with CloudWatch
Set up cost monitoring:
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on daily invocation volume as a rough proxy for Bedrock spending
cloudwatch.put_metric_alarm(
    AlarmName="BedrockSpendingAlert",
    Namespace="AWS/Bedrock",
    MetricName="Invocations",  # Bedrock's invocation-count metric
    Statistic="Sum",  # required for the alarm to evaluate
    Period=86400,  # daily
    EvaluationPeriods=1,
    Threshold=100000,
    ComparisonOperator="GreaterThanThreshold",
)
5. Leverage AWS Credits
Bedrock charges apply to your AWS bill, so:
- AWS Activate credits (startups) can cover Bedrock
- Enterprise agreements may include discounts
- Reserved capacity through AWS can reduce costs
Choosing Your Pricing Model
On-Demand If:
- Workload is variable or unpredictable
- Monthly spend under $5,000
- Still in development/testing phase
- Need flexibility to switch models
Provisioned Throughput If:
- Predictable, high-volume production workload
- Need guaranteed capacity (no throttling)
- Utilization consistently > 60%
- Running custom fine-tuned models
Hybrid Approach:
- Provisioned for baseline capacity
- On-demand for overflow/spikes
- Different models for different tiers
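A common implementation of the hybrid approach is to invoke the provisioned endpoint first and spill to on-demand when throttled. A sketch (the provisioned ARN is a placeholder):
import json

import boto3
from botocore.exceptions import ClientError

runtime = boto3.client("bedrock-runtime")

PROVISIONED_ARN = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123"  # placeholder
ON_DEMAND_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def invoke_with_overflow(body):
    """Use provisioned baseline capacity; fall back to on-demand on throttling."""
    for model_id in (PROVISIONED_ARN, ON_DEMAND_ID):
        try:
            response = runtime.invoke_model(modelId=model_id, body=json.dumps(body))
            return json.loads(response["body"].read())
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise
    raise RuntimeError("Both provisioned and on-demand capacity throttled")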
Migration Checklist
Moving to or optimizing Bedrock:
- Audit current LLM usage (tokens, requests, models; see the CloudWatch sketch after this list)
- Map requirements to available Bedrock models
- Calculate on-demand costs for typical month
- Estimate provisioned throughput requirements
- Compare Bedrock vs. direct API costs
- Set up CloudWatch monitoring and alerts
- Implement caching for repeated prompts
- Consider batch inference for async workloads
- Plan model cascade (cheap → expensive)
- Test latency and throughput requirements
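For the audit step, Bedrock publishes per-model token metrics to CloudWatch. A minimal sketch pulling 30 days of usage (the model ID is illustrative):
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

def monthly_tokens(metric, model_id):
    """Sum a Bedrock token metric ("InputTokenCount"/"OutputTokenCount") over 30 days."""
    now = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName=metric,
        Dimensions=[{"Name": "ModelId", "Value": model_id}],
        StartTime=now - timedelta(days=30),
        EndTime=now,
        Period=86400,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in stats["Datapoints"])

print(monthly_tokens("InputTokenCount", "anthropic.claude-3-haiku-20240307-v1:0"))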
Conclusion
AWS Bedrock simplifies access to multiple AI providers through a unified, enterprise-ready platform. While pricing matches or slightly exceeds direct API access, the value lies in AWS integration, security features, and operational simplicity.
Key takeaways:
- Claude pricing matches direct Anthropic - no Bedrock premium
- Llama is more expensive on Bedrock than alternatives like Groq
- Provisioned Throughput pays off at high, consistent volume (>60% utilization)
- Use model cascading - match model capability to task complexity
- Leverage AWS ecosystem - credits, monitoring, security
Use our AWS Bedrock Pricing Calculator to estimate costs for your specific workload and compare on-demand vs. provisioned options.