Amazon Bedrock is AWS's fully managed service for building generative AI applications. Instead of training your own models or managing ML infrastructure, Bedrock gives you API access to leading foundation models from Anthropic, Meta, Mistral, Cohere, and Amazon.
This guide covers what Bedrock offers, how pricing works, and how to get started building AI-powered applications.
## What Is AWS Bedrock?
Amazon Bedrock is a serverless service that provides:
- Foundation model access - Claude, Llama, Mistral, and more via API
- Fine-tuning capabilities - Customize models with your data
- RAG support - Connect models to your knowledge bases
- Agents - Build autonomous AI workflows
- Guardrails - Control model outputs for safety
Think of Bedrock as a "model marketplace with infrastructure"—you choose which AI models to use, while AWS handles scaling, security, and availability.
## Why Use Bedrock Instead of Direct APIs?
You could use Anthropic's API directly, so why go through AWS?
| Factor | Direct API | AWS Bedrock |
|---|---|---|
| Billing | Separate vendor | Consolidated AWS billing |
| Data residency | Varies by provider | Stays in your AWS region |
| VPC integration | Requires configuration | PrivateLink available |
| IAM integration | API keys only | Native IAM policies |
| Multiple models | Multiple accounts/APIs | Single API, many models |
| Compliance | Varies | AWS compliance (HIPAA, SOC2, etc.) |
| Model switching | Code changes | Configuration changes |
Key benefit: If you're already on AWS and need enterprise controls (VPC, IAM, compliance), Bedrock simplifies integration significantly.
## Available Foundation Models
Bedrock offers models from multiple providers:
### Text Generation Models
| Provider | Model | Context Window | Best For |
|---|---|---|---|
| Anthropic | Claude 3.5 Sonnet | 200K tokens | Complex reasoning, coding |
| Anthropic | Claude 3 Haiku | 200K tokens | Fast, cost-effective tasks |
| Meta | Llama 3.1 70B | 128K tokens | Open-weight alternative |
| Mistral | Mistral Large | 128K tokens | Multilingual, coding |
| Amazon | Titan Text | 8K tokens | Basic text tasks |
| Cohere | Command R+ | 128K tokens | RAG applications |
### Image Generation Models
| Provider | Model | Use Case |
|---|---|---|
| Stability AI | SDXL 1.0 | High-quality image generation |
| Amazon | Titan Image Generator | Text-to-image, editing |
### Embedding Models
| Provider | Model | Dimensions |
|---|---|---|
| Amazon | Titan Embeddings V2 | 1024 |
| Cohere | Embed | 1024 |
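Embedding models are invoked through the same runtime API as text models. A minimal sketch with Titan Embeddings V2 (the request and response field names follow Titan's `inputText`/`embedding` format):

```python
import json

import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

# Embed a piece of text with Titan Embeddings V2
response = bedrock.invoke_model(
    modelId='amazon.titan-embed-text-v2:0',
    body=json.dumps({'inputText': 'What is AWS Bedrock?'})
)

embedding = json.loads(response['body'].read())['embedding']
print(len(embedding))  # 1024 dimensions by default
```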
## Bedrock Pricing Explained
Bedrock uses token-based pricing—you pay per input and output token processed. Pricing varies significantly by model.
### Text Model Pricing (per 1,000 tokens)
| Model | Input Price | Output Price |
|---|---|---|
| Claude 3.5 Sonnet | $0.003 | $0.015 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
| Llama 3.1 70B | $0.00099 | $0.00099 |
| Mistral Large | $0.004 | $0.012 |
| Titan Text Express | $0.0002 | $0.0006 |
### Example Cost Calculation
Processing a customer support request (500 input tokens, 200 output tokens) with Claude 3.5 Sonnet:

```
Input:  500 / 1000 × $0.003 = $0.0015
Output: 200 / 1000 × $0.015 = $0.003
Total per request: $0.0045

1,000 requests/day = $4.50/day = ~$135/month
```
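This arithmetic is easy to wrap in a small helper when comparing models; a minimal sketch using the per-1,000-token prices from the table above (verify against current AWS pricing):

```python
# Per-1,000-token (input, output) prices from the table above
PRICES = {
    'claude-3-5-sonnet': (0.003, 0.015),
    'claude-3-haiku': (0.00025, 0.00125),
    'llama-3-1-70b': (0.00099, 0.00099),
}

def request_cost(model, input_tokens, output_tokens):
    """Estimated cost in dollars for a single request."""
    input_price, output_price = PRICES[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price

# The support-request example: 500 input tokens, 200 output tokens
cost = request_cost('claude-3-5-sonnet', 500, 200)
print(f"${cost:.4f}/request, ~${cost * 1000 * 30:.0f}/month at 1,000 requests/day")
```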
### Provisioned Throughput
For predictable, high-volume workloads, Bedrock offers Provisioned Throughput:
- Commit to model units for 1 or 6 months
- Get guaranteed capacity
- Potentially lower per-token costs at scale
- Prices vary by model and commitment term
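Provisioned capacity is purchased through the `bedrock` control-plane client; a minimal sketch with placeholder names (model availability and commitment terms vary by region):

```python
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

# Name and model ID below are placeholders
response = bedrock.create_provisioned_model_throughput(
    provisionedModelName='support-bot-capacity',
    modelId='anthropic.claude-3-haiku-20240307-v1:0',
    modelUnits=1,
    commitmentDuration='SixMonths'  # or 'OneMonth'; omit for no-commitment hourly
)

print(response['provisionedModelArn'])
```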
## Getting Started: Your First Bedrock Call
### Prerequisites
- AWS account with Bedrock access enabled
- Request model access in the Bedrock console (some models require approval)
- IAM permissions for `bedrock:InvokeModel`
### Python Example with Boto3

```python
import boto3
import json

# Create a Bedrock runtime client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

# Prepare the request for Claude
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "Explain what AWS Bedrock is in 2 sentences."
        }
    ]
})

# Invoke the model
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    contentType="application/json",
    accept="application/json",
    body=body
)

# Parse the response
response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
```
### Using the Converse API (Recommended)
The Converse API provides a unified interface across all models:

```python
import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is AWS Bedrock?"}]
        }
    ],
    inferenceConfig={
        "maxTokens": 1024,
        "temperature": 0.7
    }
)

print(response['output']['message']['content'][0]['text'])
```
Benefit: Switch models by changing `modelId` without changing code structure.
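For example, the same request can target a different provider's model just by swapping the identifier (a sketch; the Llama model ID is illustrative and availability varies by region):

```python
import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

# Same converse() call shape for every provider; only modelId changes
for model_id in [
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "meta.llama3-1-70b-instruct-v1:0",  # example ID; check your region's model list
]:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": "What is AWS Bedrock?"}]}],
        inferenceConfig={"maxTokens": 256}
    )
    print(model_id, '->', response['output']['message']['content'][0]['text'][:100])
```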
## Bedrock Knowledge Bases (RAG)
Knowledge Bases let you connect foundation models to your own data using Retrieval-Augmented Generation (RAG).
### How It Works

```
┌──────────────────────────────────────────────────────────────┐
│                        User Question                         │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                    Bedrock Knowledge Base                    │
│  1. Convert question to embedding                            │
│  2. Search vector database for relevant chunks               │
│  3. Pass chunks + question to foundation model               │
│  4. Return grounded answer                                   │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                    Answer with Citations                     │
└──────────────────────────────────────────────────────────────┘
```
### Supported Data Sources
- Amazon S3 (PDF, TXT, MD, HTML, DOC, CSV)
- Web crawlers
- Confluence
- Salesforce
- SharePoint
### Creating a Knowledge Base

```python
import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Create knowledge base
response = bedrock_agent.create_knowledge_base(
    name='company-docs-kb',
    roleArn='arn:aws:iam::123456789:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:123456789:collection/abc123',
            'vectorIndexName': 'bedrock-kb-index',
            'fieldMapping': {
                'vectorField': 'embedding',
                'textField': 'text',
                'metadataField': 'metadata'
            }
        }
    }
)
```
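After the knowledge base has ingested a data source, you can query it through the `bedrock-agent-runtime` client. A minimal sketch; the knowledge base ID and question are placeholders:

```python
import boto3

agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# Knowledge base ID below is a placeholder from your own setup
response = agent_runtime.retrieve_and_generate(
    input={'text': 'What is our refund policy?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB12345678',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)

print(response['output']['text'])      # grounded answer
print(response.get('citations', []))   # chunks the answer was grounded in
```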
## Bedrock Agents
Agents enable foundation models to take actions by connecting to external tools and APIs.
### Example: Customer Service Agent
An agent can:
- Look up customer orders in your database
- Check inventory status via API
- Create support tickets
- Send confirmation emails
### Agent Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         User Input                          │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                        Bedrock Agent                        │
│  ┌───────────────────────────────────────────────────────┐  │
│  │               Foundation Model (Claude)               │  │
│  │          Reasons about what actions to take           │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ Action Group │  │ Action Group │  │  Knowledge   │       │
│  │   (Lambda)   │  │ (API Schema) │  │     Base     │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                       Final Response                        │
└─────────────────────────────────────────────────────────────┘
```
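At runtime, a prepared agent is called through the `bedrock-agent-runtime` client and streams its answer back. A minimal sketch; the agent and alias IDs are placeholders:

```python
import uuid

import boto3

agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# Agent and alias IDs below are placeholders from your own deployment
response = agent_runtime.invoke_agent(
    agentId='AGENT123456',
    agentAliasId='ALIAS123456',
    sessionId=str(uuid.uuid4()),  # reuse the same ID to continue a conversation
    inputText='Where is order 1042?'
)

# The completion arrives as a stream of chunk events
for event in response['completion']:
    if 'chunk' in event:
        print(event['chunk']['bytes'].decode('utf-8'), end='')
```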
## Guardrails: Control Model Outputs
Guardrails let you define policies to filter harmful content and protect sensitive data.
### Guardrail Capabilities
| Feature | Description |
|---|---|
| Content filters | Block hate, violence, sexual content, etc. |
| Denied topics | Prevent discussion of specific topics |
| Word filters | Block specific words or phrases |
| PII detection | Mask or block personal information |
| Contextual grounding | Reduce hallucinations with source verification |
### Creating a Guardrail

```python
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

response = bedrock.create_guardrail(
    name='customer-support-guardrail',
    description='Guardrail for customer support chatbot',
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
        ]
    },
    sensitiveInformationPolicyConfig={
        'piiEntitiesConfig': [
            {'type': 'CREDIT_DEBIT_CARD_NUMBER', 'action': 'BLOCK'},
            {'type': 'US_SOCIAL_SECURITY_NUMBER', 'action': 'BLOCK'},
        ]
    },
    blockedInputMessaging='I cannot process this request.',
    blockedOutputsMessaging='I cannot provide this information.'
)
```
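The guardrail is then enforced at inference time by passing its identifier and version, for example with the Converse API (the ID below is a placeholder; `DRAFT` targets the working draft version):

```python
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock_runtime.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[{"role": "user", "content": [{"text": "What is your refund policy?"}]}],
    guardrailConfig={
        'guardrailIdentifier': 'your-guardrail-id',  # placeholder
        'guardrailVersion': 'DRAFT'
    }
)

print(response['output']['message']['content'][0]['text'])
```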
## Model Fine-Tuning
Bedrock supports fine-tuning select models with your own data to improve performance on specific tasks.
### Supported Models for Fine-Tuning
- Amazon Titan Text
- Cohere Command
- Meta Llama 2 (select variants)
### Fine-Tuning Process
1. Prepare training data - JSONL format with prompt/completion pairs
2. Upload to S3 - Training data in your bucket
3. Create fine-tuning job - Specify base model and hyperparameters
4. Deploy custom model - Use via Provisioned Throughput
### Training Data Format

```json
{"prompt": "Summarize this support ticket:", "completion": "Customer reports login issue..."}
{"prompt": "Summarize this support ticket:", "completion": "User cannot reset password..."}
```
## Best Practices
### 1. Start with Smaller Models
Use Claude 3 Haiku or Titan Text for development and testing. Move to larger models only when needed for production quality.
### 2. Implement Caching
Cache common responses to reduce costs:
```python
import hashlib

import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

def get_cached_or_call(prompt, cache, model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    # Key the cache on a hash of the prompt text
    cache_key = hashlib.md5(prompt.encode()).hexdigest()
    if cache_key in cache:
        return cache[cache_key]
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}]
    )
    cache[cache_key] = response
    return response
```
### 3. Use Streaming for Long Responses

```python
# Stream tokens as they are generated instead of waiting for the full reply
response = bedrock.converse_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Write a long story"}]}]
)

for event in response['stream']:
    if 'contentBlockDelta' in event:
        print(event['contentBlockDelta']['delta']['text'], end='', flush=True)
```
### 4. Monitor Costs
Set up CloudWatch alarms for Bedrock metrics:
- `Invocations` - Track usage volume
- `InvocationLatency` - Monitor response times
- `InputTokenCount` / `OutputTokenCount` - Watch token consumption
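As a sketch, a boto3 alarm on output token volume might look like this (the alarm name, model ID, and threshold are example values):

```python
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Alarm name, model ID, and threshold are example values; tune to your budget
cloudwatch.put_metric_alarm(
    AlarmName='bedrock-daily-output-tokens',
    Namespace='AWS/Bedrock',
    MetricName='OutputTokenCount',
    Dimensions=[{'Name': 'ModelId', 'Value': 'anthropic.claude-3-sonnet-20240229-v1:0'}],
    Statistic='Sum',
    Period=86400,               # one-day window
    EvaluationPeriods=1,
    Threshold=5000000,          # alert past 5M output tokens/day
    ComparisonOperator='GreaterThanThreshold'
)
```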
### 5. Use VPC Endpoints for Security

```bash
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-123456 \
  --service-name com.amazonaws.us-east-1.bedrock-runtime \
  --vpc-endpoint-type Interface
```
## Bedrock vs OpenAI vs Direct Anthropic
| Factor | AWS Bedrock | OpenAI API | Anthropic Direct |
|---|---|---|---|
| Model variety | Multiple providers | OpenAI only | Claude only |
| AWS integration | Native | Manual | Manual |
| Enterprise compliance | Strong | Developing | Developing |
| Pricing | Comparable | Comparable | Often cheaper |
| Fine-tuning | Limited models | GPT-3.5/4 | Not available |
| Setup complexity | AWS knowledge needed | Simple API key | Simple API key |
## Getting Started Checklist
- Enable Bedrock in your AWS account
- Request model access for the models you need
- Set up IAM permissions for your users/services
- Start with Converse API for model-agnostic code
- Test with Haiku/Titan before using expensive models
- Add Guardrails for production deployments
- Monitor costs with CloudWatch and billing alerts
AWS Bedrock lowers the barrier to building production AI applications while providing enterprise-grade security and compliance. Start experimenting with the playground in the AWS console, then move to the API for production workloads.
