AWS Bedrock Pricing Guide: On-Demand vs Provisioned Throughput

Complete guide to AWS Bedrock pricing for Claude, Llama, Titan, and Mistral models. Compare on-demand vs provisioned throughput costs and learn when each makes sense.

By Inventive HQ Team

AWS Bedrock brings the best foundation models—Claude, Llama, Titan, Mistral, and more—under one managed service. But understanding Bedrock's pricing model is crucial for controlling costs, especially when choosing between on-demand and provisioned throughput options.

This guide breaks down Bedrock pricing, compares it to direct API access, and helps you choose the most cost-effective approach for your workload.

Bedrock Pricing Overview

AWS Bedrock offers two primary pricing models:

  1. On-Demand: Pay per token, no commitment
  2. Provisioned Throughput: Pay hourly for dedicated capacity

Additionally, you may incur costs for:

  • Custom model training (fine-tuning)
  • Model evaluation jobs
  • Knowledge bases and retrieval
  • Cross-region inference

On-Demand Pricing

On-demand pricing charges per token processed, with separate rates for input and output.

Anthropic Models (Claude)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |

Note: Claude pricing on Bedrock matches direct Anthropic API pricing.
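
These per-token rates convert to dollars with a straight multiplication. A minimal helper for estimating on-demand spend (prices from the Claude table above; the token counts are made up for illustration):

def on_demand_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """On-demand cost in dollars; rates are per 1M tokens."""
    return (input_tokens / 1e6) * input_per_m + (output_tokens / 1e6) * output_per_m

# Example: 50M input + 10M output tokens on Claude 3.5 Sonnet
print(on_demand_cost(50e6, 10e6, 3.00, 15.00))  # 300.0 -> $300/month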

Meta Models (Llama)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Llama 3 70B Instruct | $0.99 | $0.99 |
| Llama 3 8B Instruct | $0.22 | $0.22 |
| Llama 2 70B | $0.99 | $0.99 |
| Llama 2 13B | $0.35 | $0.35 |

Amazon Titan Models

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Titan Text Premier | $0.80 | $2.40 |
| Titan Text Express | $0.20 | $0.60 |
| Titan Text Lite | $0.15 | $0.20 |
| Titan Embeddings V2 | $0.02 | N/A |

Mistral Models

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Mistral Large | $4.00 | $12.00 |
| Mixtral 8x7B | $0.45 | $0.70 |
| Mistral 7B | $0.15 | $0.20 |

Other Providers

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Cohere Command R+ | $3.00 | $15.00 |
| Cohere Command R | $0.50 | $1.50 |
| AI21 Jamba 1.5 Large | $2.00 | $8.00 |
| AI21 Jamba 1.5 Mini | $0.20 | $0.40 |

Provisioned Throughput

Provisioned Throughput guarantees dedicated model capacity for consistent, high-volume workloads.

How It Works

  • Purchase Model Units (MUs) for specific models
  • Pay hourly regardless of actual usage
  • Get guaranteed throughput without throttling
  • Commitment options: none (highest hourly rate), 1 month, or 6 months

Pricing Structure

Provisioned Throughput pricing varies by:

  • Model type
  • Commitment term (1 month, 6 months, no commitment)
  • Number of Model Units

Example: Claude 3 Sonnet Provisioned Throughput

| Commitment | Hourly per MU | Monthly Cost (1 MU) |
|---|---|---|
| No commitment | ~$35/hour | ~$25,200 |
| 1 month | ~$28/hour | ~$20,160 |
| 6 months | ~$22/hour | ~$15,840 |

Each Model Unit provides approximately:

  • 30-50 requests per minute (varies by model)
  • Consistent latency even under load
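
A rough way to size capacity is to back into an MU count from your average request rate, as the sketch below does. The 40 requests/minute figure is an assumed midpoint of the range above; real sizing depends on tokens per request and traffic peaks, so treat this as a first approximation only.

import math

def estimate_mus(requests_per_month, req_per_min_per_mu=40):
    """Assumes even traffic and a mid-range MU throughput (illustrative)."""
    avg_req_per_min = requests_per_month / (30 * 24 * 60)
    return max(1, math.ceil(avg_req_per_min / req_per_min_per_mu))

print(estimate_mus(2_000_000))  # ~46 req/min average -> 2 MUs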

When Provisioned Makes Sense

Calculate your break-even point:

Monthly On-Demand Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
Monthly Provisioned Cost = Hourly Rate × 720 hours (30-day month)

Break-even Utilization = Monthly Provisioned Cost / (On-Demand Cost at 100% MU Capacity)

General guidance (see the sketch below):

  • < 50% utilization → On-demand cheaper
  • 50-70% utilization → Evaluate carefully
  • > 70% utilization → Provisioned likely cheaper
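
A minimal sketch of the break-even check, using the 1-month Claude 3 Sonnet rate from the table above; the on-demand-at-full-capacity figure is an assumption you would replace with your own measured throughput:

def breakeven_utilization(hourly_rate, od_cost_at_full_capacity, hours=720):
    """Fraction of an MU's capacity you must sustain before provisioned wins."""
    return (hourly_rate * hours) / od_cost_at_full_capacity

# $28/hour MU vs. an assumed $30,000/month of on-demand work at full capacity
print(breakeven_utilization(28, 30_000))  # ~0.67 -> need ~67% sustained utilization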

Cost Comparison Examples

Example 1: Customer Service Chatbot

Usage profile:

  • 100,000 conversations/month
  • 800 input tokens, 400 output tokens per conversation
  • Model: Claude 3 Haiku

On-Demand:

Input:  100,000 × 800 = 80M tokens × $0.25/M = $20
Output: 100,000 × 400 = 40M tokens × $1.25/M = $50
Total: $70/month

Verdict: On-demand is clearly better for this volume.

Example 2: Document Processing Pipeline

Usage profile:

  • 500,000 documents/month
  • 10,000 input tokens, 500 output tokens per document
  • Model: Claude 3 Sonnet

On-Demand:

Input:  500,000 × 10,000 = 5B tokens × $3/M = $15,000
Output: 500,000 × 500 = 250M tokens × $15/M = $3,750
Total: $18,750/month

Provisioned (1-month commitment):

Required capacity: ~1 MU (estimated; the large 10K-token inputs may require more)
Cost: 1 × $20,160 = ~$20,160/month

Verdict: On-demand slightly cheaper, but provisioned provides guaranteed capacity and predictable billing.

Example 3: High-Volume Code Assistant

Usage profile:

  • 2,000,000 requests/month
  • 2,000 input tokens, 1,000 output tokens per request
  • Model: Claude 3.5 Sonnet

On-Demand:

Input:  2M × 2,000 = 4B tokens × $3/M = $12,000
Output: 2M × 1,000 = 2B tokens × $15/M = $30,000
Total: $42,000/month

Provisioned (6-month commitment):

Required capacity: ~1.5-2 MUs
Cost: 1.5 × $15,840 = ~$23,760/month

Verdict: Provisioned saves ~40% at this volume.
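
As a sanity check, the Example 3 comparison reproduces in a few lines (all figures come from the numbers above):

# On-demand: 4B input + 2B output tokens at Claude 3.5 Sonnet rates
on_demand = 4_000 * 3.00 + 2_000 * 15.00  # token counts in millions of tokens
provisioned = 1.5 * 15_840                # 1.5 MUs at the 6-month monthly rate
print(on_demand, provisioned)             # 42000.0 23760.0
print(round(1 - provisioned / on_demand, 2))  # 0.43 -> ~40% savings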

Bedrock vs. Direct API Access

Anthropic (Claude)

| Aspect | Bedrock | Direct API |
|---|---|---|
| Price | Same | Same |
| Billing | AWS bill | Separate vendor |
| Enterprise features | VPC, IAM, CloudWatch | Limited |
| Fine-tuning | Supported | API access |

Verdict: Bedrock adds no cost premium for Claude; choose based on operational preferences.

Meta (Llama)

| Provider | Llama 3 70B (Input / Output per 1M tokens) |
|---|---|
| AWS Bedrock | $0.99 / $0.99 |
| Together AI | $0.90 / $0.90 |
| Groq | $0.59 / $0.79 |
| Anyscale | $1.00 / $1.00 |

Verdict: Bedrock costs ~10-70% more than most alternatives for Llama; the premium buys AWS integration.

Mistral

| Provider | Mistral Large (Input / Output per 1M tokens) |
|---|---|
| AWS Bedrock | $4.00 / $12.00 |
| Mistral Direct | $4.00 / $12.00 |

Verdict: Same pricing; choose based on operational needs.

Additional Bedrock Costs

Knowledge Bases

Bedrock Knowledge Bases for RAG applications incur:

| Component | Cost |
|---|---|
| Embedding (Titan) | $0.02 per 1M tokens |
| Vector storage | OpenSearch Serverless charges |
| Retrieval queries | Per-query embedding cost |
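
Each retrieval embeds the query and searches the vector store, so queries carry their own cost. A minimal sketch of a retrieval call, assuming an existing knowledge base (the ID is a placeholder):

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Each call incurs query-embedding cost plus OpenSearch Serverless charges
response = agent_runtime.retrieve(
    knowledgeBaseId="KB12345678",  # placeholder knowledge base ID
    retrievalQuery={"text": "What is our refund policy?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)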

Model Evaluation

| Evaluation Type | Cost |
|---|---|
| Human evaluation | Per-task pricing |
| Automatic evaluation | Model inference costs |

Custom Models

Fine-tuning costs include:

| Phase | Cost |
|---|---|
| Training | Per-token training cost |
| Hosting | Provisioned Throughput required |

Note: Custom models have no on-demand option; they require dedicated capacity.

Optimization Strategies

1. Right-Size Your Model

Don't use Claude 3 Opus for tasks Haiku can handle:

| Task | Recommended Model | Cost Difference |
|---|---|---|
| Simple classification | Haiku | Baseline |
| General chat | Sonnet | 12x more |
| Complex reasoning | Opus | 60x more |
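
One way to act on this table is a simple router that maps each task type to the cheapest adequate model. A sketch with assumed task labels (the model IDs are the current Bedrock identifiers for these models):

# Map task complexity to the cheapest adequate model
MODEL_FOR_TASK = {
    "classification": "anthropic.claude-3-haiku-20240307-v1:0",
    "chat": "anthropic.claude-3-sonnet-20240229-v1:0",
    "reasoning": "anthropic.claude-3-opus-20240229-v1:0",
}

def pick_model(task_type: str) -> str:
    """Fall back to the cheapest model for unknown task types."""
    return MODEL_FOR_TASK.get(task_type, MODEL_FOR_TASK["classification"])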

2. Implement Caching

Bedrock supports prompt caching for repeated system prompts:

# Cache the system prompt to reduce billed input tokens.
# invoke_model expects a JSON string body; the model ID and max_tokens are illustrative.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "system": [{"type": "text", "text": system_prompt,
                    "cache_control": {"type": "ephemeral"}}],
        "messages": messages,
    }),
)

Cached tokens are billed at reduced rates.

3. Use Batch Inference

For non-real-time workloads, batch inference can reduce costs:

# Submit a batch job (requires an IAM role that can read/write the S3 locations)
import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_model_invocation_job(
    jobName="haiku-batch-job",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # placeholder role
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://bucket/input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://bucket/output/"}},
)

Batch jobs typically cost 50% less than real-time inference.
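
Batch jobs read JSONL from S3, one request per line, with each record wrapping the model's native body format. A sketch of building one input record (field values are illustrative):

import json

# One JSONL record per request; "modelInput" follows the model's native schema
record = {
    "recordId": "doc-0001",
    "modelInput": {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Summarize this document..."}],
    },
}
with open("batch_input.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")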

4. Monitor with CloudWatch

Set up cost monitoring:

# CloudWatch alarm on Bedrock invocation volume (a proxy for spend)
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName='BedrockSpendingAlert',
    MetricName='Invocations',
    Namespace='AWS/Bedrock',
    Statistic='Sum',
    Threshold=100000,
    Period=86400,  # daily
    EvaluationPeriods=1,
    ComparisonOperator='GreaterThanThreshold'
)

5. Leverage AWS Credits

Bedrock charges apply to your AWS bill, so:

  • AWS Activate credits (startups) can cover Bedrock
  • Enterprise agreements may include discounts
  • Reserved capacity through AWS can reduce costs

Choosing Your Pricing Model

On-Demand If:

  • Workload is variable or unpredictable
  • Monthly spend under $5,000
  • Still in development/testing phase
  • Need flexibility to switch models

Provisioned Throughput If:

  • Predictable, high-volume production workload
  • Need guaranteed capacity (no throttling)
  • Utilization consistently > 60%
  • Running custom fine-tuned models

Hybrid Approach:

  • Provisioned for baseline capacity
  • On-demand for overflow/spikes
  • Different models for different tiers
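
A minimal sketch of the overflow pattern above: send traffic to the provisioned endpoint first and spill to on-demand when throttled (the ARN and model ID are placeholders):

import boto3
from botocore.exceptions import ClientError

runtime = boto3.client("bedrock-runtime")

PROVISIONED_ARN = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123"  # placeholder
ON_DEMAND_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def invoke_with_overflow(body: str):
    """Use dedicated capacity first; fall back to on-demand when throttled."""
    try:
        return runtime.invoke_model(modelId=PROVISIONED_ARN, body=body)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ThrottlingException":
            return runtime.invoke_model(modelId=ON_DEMAND_ID, body=body)
        raise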

Migration Checklist

Moving to or optimizing Bedrock:

  • Audit current LLM usage (tokens, requests, models)
  • Map requirements to available Bedrock models
  • Calculate on-demand costs for typical month
  • Estimate provisioned throughput requirements
  • Compare Bedrock vs. direct API costs
  • Set up CloudWatch monitoring and alerts
  • Implement caching for repeated prompts
  • Consider batch inference for async workloads
  • Plan model cascade (cheap → expensive)
  • Test latency and throughput requirements

Conclusion

AWS Bedrock simplifies access to multiple AI providers through a unified, enterprise-ready platform. While pricing matches or slightly exceeds direct API access, the value lies in AWS integration, security features, and operational simplicity.

Key takeaways:

  1. Claude pricing matches direct Anthropic - no Bedrock premium
  2. Llama is more expensive on Bedrock than alternatives like Groq
  3. Provisioned Throughput pays off at high, consistent volume (>60% utilization)
  4. Use model cascading - match model capability to task complexity
  5. Leverage AWS ecosystem - credits, monitoring, security

Use our AWS Bedrock Pricing Calculator to estimate costs for your specific workload and compare on-demand vs. provisioned options.

Frequently Asked Questions

What is AWS Bedrock?

AWS Bedrock is a fully managed service that provides access to foundation models (FMs) from Amazon, Anthropic (Claude), Meta (Llama), Mistral, Cohere, and AI21 Labs through a unified API. It integrates with AWS services like S3, Lambda, and IAM, offering enterprise security and compliance features without managing infrastructure.
