AWS Bedrock Pricing Guide: On-Demand vs Provisioned Throughput

Complete guide to AWS Bedrock pricing for Claude, Llama, Titan, and Mistral models. Compare on-demand vs provisioned throughput costs and learn when each makes sense.

By Inventive HQ Team

AWS Bedrock brings the best foundation models—Claude, Llama, Titan, Mistral, and more—under one managed service. But understanding Bedrock's pricing model is crucial for controlling costs, especially when choosing between on-demand and provisioned throughput options.

This guide breaks down Bedrock pricing, compares it to direct API access, and helps you choose the most cost-effective approach for your workload.

Bedrock Pricing Overview

AWS Bedrock offers two primary pricing models:

  1. On-Demand: Pay per token, no commitment
  2. Provisioned Throughput: Pay hourly for dedicated capacity

Additionally, you may incur costs for:

  • Custom model training (fine-tuning)
  • Model evaluation jobs
  • Knowledge bases and retrieval
  • Cross-region inference

On-Demand Pricing

On-demand pricing charges per token processed, with separate rates for input and output.

Anthropic Models (Claude)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |

Note: Claude pricing on Bedrock matches direct Anthropic API pricing.
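
These per-token rates convert to dollars with a straight multiplication. A minimal helper for estimating on-demand spend (prices from the Claude table above; the token counts are made up for illustration):

def on_demand_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """On-demand cost in dollars; rates are per 1M tokens."""
    return (input_tokens / 1e6) * input_per_m + (output_tokens / 1e6) * output_per_m

# Example: 50M input + 10M output tokens on Claude 3.5 Sonnet
print(on_demand_cost(50e6, 10e6, 3.00, 15.00))  # 300.0 -> $300/month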

Meta Models (Llama)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Llama 3 70B Instruct | $0.99 | $0.99 |
| Llama 3 8B Instruct | $0.22 | $0.22 |
| Llama 2 70B | $0.99 | $0.99 |
| Llama 2 13B | $0.35 | $0.35 |

Amazon Titan Models

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Titan Text Premier | $0.80 | $2.40 |
| Titan Text Express | $0.20 | $0.60 |
| Titan Text Lite | $0.15 | $0.20 |
| Titan Embeddings V2 | $0.02 | N/A |

Mistral Models

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Mistral Large | $4.00 | $12.00 |
| Mixtral 8x7B | $0.45 | $0.70 |
| Mistral 7B | $0.15 | $0.20 |

Other Providers

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Cohere Command R+ | $3.00 | $15.00 |
| Cohere Command R | $0.50 | $1.50 |
| AI21 Jamba 1.5 Large | $2.00 | $8.00 |
| AI21 Jamba 1.5 Mini | $0.20 | $0.40 |

Provisioned Throughput

Provisioned Throughput guarantees dedicated model capacity for consistent, high-volume workloads.

How It Works

  • Purchase Model Units (MUs) for specific models
  • Pay hourly regardless of actual usage
  • Get guaranteed throughput without throttling
  • Commitment options: none (highest hourly rate), 1 month, or 6 months

Pricing Structure

Provisioned Throughput pricing varies by:

  • Model type
  • Commitment term (1 month, 6 months, no commitment)
  • Number of Model Units

Example: Claude 3 Sonnet Provisioned Throughput

| Commitment | Hourly per MU | Monthly Cost (1 MU) |
|---|---|---|
| No commitment | ~$35/hour | ~$25,200 |
| 1 month | ~$28/hour | ~$20,160 |
| 6 months | ~$22/hour | ~$15,840 |

Each Model Unit provides approximately:

  • 30-50 requests per minute (varies by model)
  • Consistent latency even under load
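
A rough way to size capacity is to back into an MU count from your average request rate, as the sketch below does. The 40 requests/minute figure is an assumed midpoint of the range above; real sizing depends on tokens per request and traffic peaks, so treat this as a first approximation only.

import math

def estimate_mus(requests_per_month, req_per_min_per_mu=40):
    """Assumes even traffic and a mid-range MU throughput (illustrative)."""
    avg_req_per_min = requests_per_month / (30 * 24 * 60)
    return max(1, math.ceil(avg_req_per_min / req_per_min_per_mu))

print(estimate_mus(2_000_000))  # ~46 req/min average -> 2 MUs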

When Provisioned Makes Sense

Calculate your break-even point:

Monthly On-Demand Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
Monthly Provisioned Cost = Hourly Rate × 720 hours (30-day month)

Break-even Utilization = Monthly Provisioned Cost / (On-Demand Cost at 100% MU Capacity)

General guidance (see the sketch below):

  • < 50% utilization → On-demand cheaper
  • 50-70% utilization → Evaluate carefully
  • > 70% utilization → Provisioned likely cheaper
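
A minimal sketch of the break-even check, using the 1-month Claude 3 Sonnet rate from the table above; the on-demand-at-full-capacity figure is an assumption you would replace with your own measured throughput:

def breakeven_utilization(hourly_rate, od_cost_at_full_capacity, hours=720):
    """Fraction of an MU's capacity you must sustain before provisioned wins."""
    return (hourly_rate * hours) / od_cost_at_full_capacity

# $28/hour MU vs. an assumed $30,000/month of on-demand work at full capacity
print(breakeven_utilization(28, 30_000))  # ~0.67 -> need ~67% sustained utilization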

Cost Comparison Examples

Example 1: Customer Service Chatbot

Usage profile:

  • 100,000 conversations/month
  • 800 input tokens, 400 output tokens per conversation
  • Model: Claude 3 Haiku

On-Demand:

Input:  100,000 × 800 = 80M tokens × $0.25/M = $20
Output: 100,000 × 400 = 40M tokens × $1.25/M = $50
Total: $70/month

Verdict: On-demand is clearly better for this volume.

Example 2: Document Processing Pipeline

Usage profile:

  • 500,000 documents/month
  • 10,000 input tokens, 500 output tokens per document
  • Model: Claude 3 Sonnet

On-Demand:

Input:  500,000 × 10,000 = 5B tokens × $3/M = $15,000
Output: 500,000 × 500 = 250M tokens × $15/M = $3,750
Total: $18,750/month

Provisioned (1-month commitment):

Required capacity: ~1 MU (estimated; the large 10K-token inputs may require more)
Cost: 1 × $20,160 = ~$20,160/month

Verdict: On-demand slightly cheaper, but provisioned provides guaranteed capacity and predictable billing.

Example 3: High-Volume Code Assistant

Usage profile:

  • 2,000,000 requests/month
  • 2,000 input tokens, 1,000 output tokens per request
  • Model: Claude 3.5 Sonnet

On-Demand:

Input:  2M × 2,000 = 4B tokens × $3/M = $12,000
Output: 2M × 1,000 = 2B tokens × $15/M = $30,000
Total: $42,000/month

Provisioned (6-month commitment):

Required capacity: ~1.5-2 MUs
Cost: 1.5 × $15,840 = ~$23,760/month

Verdict: Provisioned saves ~40% at this volume.
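
As a sanity check, the Example 3 comparison reproduces in a few lines (all figures come from the numbers above):

# On-demand: 4B input + 2B output tokens at Claude 3.5 Sonnet rates
on_demand = 4_000 * 3.00 + 2_000 * 15.00  # token counts in millions of tokens
provisioned = 1.5 * 15_840                # 1.5 MUs at the 6-month monthly rate
print(on_demand, provisioned)             # 42000.0 23760.0
print(round(1 - provisioned / on_demand, 2))  # 0.43 -> ~40% savings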

Bedrock vs. Direct API Access

Anthropic (Claude)

| Aspect | Bedrock | Direct API |
|---|---|---|
| Price | Same | Same |
| Billing | AWS bill | Separate vendor |
| Enterprise features | VPC, IAM, CloudWatch | Limited |
| Fine-tuning | Supported | API access |

Verdict: Bedrock adds no cost premium for Claude; choose based on operational preferences.

Meta (Llama)

| Provider | Llama 3 70B (Input / Output per 1M tokens) |
|---|---|
| AWS Bedrock | $0.99 / $0.99 |
| Together AI | $0.90 / $0.90 |
| Groq | $0.59 / $0.79 |
| Anyscale | $1.00 / $1.00 |

Verdict: Bedrock costs ~10-70% more than most alternatives for Llama; the premium buys AWS integration.

Mistral

| Provider | Mistral Large (Input / Output per 1M tokens) |
|---|---|
| AWS Bedrock | $4.00 / $12.00 |
| Mistral Direct | $4.00 / $12.00 |

Verdict: Same pricing; choose based on operational needs.

Additional Bedrock Costs

Knowledge Bases

Bedrock Knowledge Bases for RAG applications incur:

| Component | Cost |
|---|---|
| Embedding (Titan) | $0.02 per 1M tokens |
| Vector storage | OpenSearch Serverless charges |
| Retrieval queries | Per-query embedding cost |
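
Each retrieval embeds the query and searches the vector store, so queries carry their own cost. A minimal sketch of a retrieval call, assuming an existing knowledge base (the ID is a placeholder):

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Each call incurs query-embedding cost plus OpenSearch Serverless charges
response = agent_runtime.retrieve(
    knowledgeBaseId="KB12345678",  # placeholder knowledge base ID
    retrievalQuery={"text": "What is our refund policy?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)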

Model Evaluation

| Evaluation Type | Cost |
|---|---|
| Human evaluation | Per-task pricing |
| Automatic evaluation | Model inference costs |

Custom Models

Fine-tuning costs include:

| Phase | Cost |
|---|---|
| Training | Per-token training cost |
| Hosting | Provisioned Throughput required |

Note: Custom models have no on-demand option; they require dedicated capacity.

Optimization Strategies

1. Right-Size Your Model

Don't use Claude 3 Opus for tasks Haiku can handle:

| Task | Recommended Model | Cost Difference |
|---|---|---|
| Simple classification | Haiku | Baseline |
| General chat | Sonnet | 12x more |
| Complex reasoning | Opus | 60x more |
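
One way to act on this table is a simple router that maps each task type to the cheapest adequate model. A sketch with assumed task labels (the model IDs are the current Bedrock identifiers for these models):

# Map task complexity to the cheapest adequate model
MODEL_FOR_TASK = {
    "classification": "anthropic.claude-3-haiku-20240307-v1:0",
    "chat": "anthropic.claude-3-sonnet-20240229-v1:0",
    "reasoning": "anthropic.claude-3-opus-20240229-v1:0",
}

def pick_model(task_type: str) -> str:
    """Fall back to the cheapest model for unknown task types."""
    return MODEL_FOR_TASK.get(task_type, MODEL_FOR_TASK["classification"])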

2. Implement Caching

Bedrock supports prompt caching for repeated system prompts:

# Cache the system prompt to reduce billed input tokens.
# invoke_model expects a JSON string body; the model ID and max_tokens are illustrative.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "system": [{"type": "text", "text": system_prompt,
                    "cache_control": {"type": "ephemeral"}}],
        "messages": messages,
    }),
)

Cached tokens are billed at reduced rates.

3. Use Batch Inference

For non-real-time workloads, batch inference can reduce costs:

# Submit a batch job (requires an IAM role that can read/write the S3 locations)
import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_model_invocation_job(
    jobName="haiku-batch-job",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # placeholder role
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://bucket/input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://bucket/output/"}},
)

Batch jobs typically cost 50% less than real-time inference.
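
Batch jobs read JSONL from S3, one request per line, with each record wrapping the model's native body format. A sketch of building one input record (field values are illustrative):

import json

# One JSONL record per request; "modelInput" follows the model's native schema
record = {
    "recordId": "doc-0001",
    "modelInput": {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Summarize this document..."}],
    },
}
with open("batch_input.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")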

4. Monitor with CloudWatch

Set up cost monitoring:

# CloudWatch alarm on Bedrock invocation volume (a proxy for spend)
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName='BedrockSpendingAlert',
    MetricName='Invocations',
    Namespace='AWS/Bedrock',
    Statistic='Sum',
    Threshold=100000,
    Period=86400,  # daily
    EvaluationPeriods=1,
    ComparisonOperator='GreaterThanThreshold'
)

5. Leverage AWS Credits

Bedrock charges apply to your AWS bill, so:

  • AWS Activate credits (startups) can cover Bedrock
  • Enterprise agreements may include discounts
  • Reserved capacity through AWS can reduce costs

Choosing Your Pricing Model

On-Demand If:

  • Workload is variable or unpredictable
  • Monthly spend under $5,000
  • Still in development/testing phase
  • Need flexibility to switch models

Provisioned Throughput If:

  • Predictable, high-volume production workload
  • Need guaranteed capacity (no throttling)
  • Utilization consistently > 60%
  • Running custom fine-tuned models

Hybrid Approach:

  • Provisioned for baseline capacity
  • On-demand for overflow/spikes
  • Different models for different tiers
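
A minimal sketch of the overflow pattern above: send traffic to the provisioned endpoint first and spill to on-demand when throttled (the ARN and model ID are placeholders):

import boto3
from botocore.exceptions import ClientError

runtime = boto3.client("bedrock-runtime")

PROVISIONED_ARN = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123"  # placeholder
ON_DEMAND_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def invoke_with_overflow(body: str):
    """Use dedicated capacity first; fall back to on-demand when throttled."""
    try:
        return runtime.invoke_model(modelId=PROVISIONED_ARN, body=body)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ThrottlingException":
            return runtime.invoke_model(modelId=ON_DEMAND_ID, body=body)
        raise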

Migration Checklist

Moving to or optimizing Bedrock:

  • Audit current LLM usage (tokens, requests, models)
  • Map requirements to available Bedrock models
  • Calculate on-demand costs for typical month
  • Estimate provisioned throughput requirements
  • Compare Bedrock vs. direct API costs
  • Set up CloudWatch monitoring and alerts
  • Implement caching for repeated prompts
  • Consider batch inference for async workloads
  • Plan model cascade (cheap → expensive)
  • Test latency and throughput requirements

Conclusion

AWS Bedrock simplifies access to multiple AI providers through a unified, enterprise-ready platform. While pricing matches or slightly exceeds direct API access, the value lies in AWS integration, security features, and operational simplicity.

Key takeaways:

  1. Claude pricing matches direct Anthropic - no Bedrock premium
  2. Llama is more expensive on Bedrock than alternatives like Groq
  3. Provisioned Throughput pays off at high, consistent volume (>60% utilization)
  4. Use model cascading - match model capability to task complexity
  5. Leverage AWS ecosystem - credits, monitoring, security

Use our AWS Bedrock Pricing Calculator to estimate costs for your specific workload and compare on-demand vs. provisioned options.

Frequently Asked Questions

What is AWS Bedrock?

AWS Bedrock is a fully managed service that provides access to foundation models (FMs) from Amazon, Anthropic (Claude), Meta (Llama), Mistral, Cohere, and AI21 Labs through a unified API. It integrates with AWS services like S3, Lambda, and IAM, offering enterprise security and compliance features without managing infrastructure.
