The Terraform Apply Moment of Truth
You've been working on infrastructure changes for hours. Your Terraform plan shows 23 resources will be modified. You type terraform apply and your finger hovers over Enter. A voice in your head asks: "What could possibly go wrong?"
This moment—the gap between terraform plan and terraform apply—is where production incidents are born. A misunderstood change. A missed dependency. An unexpected replacement. And suddenly, your production database is gone, your VPC routing is broken, or your S3 bucket is public.
Understanding blast radius, risk assessment, and how to read Terraform plans thoroughly is essential for any team managing infrastructure as code. Let's learn how to make that moment before hitting Enter a confident one, not a terrifying one.
What is Blast Radius?
Blast radius is the potential scope of impact from a Terraform change. It answers the question: "If this change goes wrong, what else breaks?"
Direct vs Indirect Impact
Direct impact: Resources explicitly changed in the plan
# Changing this database instance
resource "aws_db_instance" "main" {
instance_class = "db.t3.medium" # Was: db.t3.small
}
Indirect impact: Resources affected by dependencies
# These resources depend on the database
resource "aws_lambda_function" "api" {
  environment {
    variables = {
      DB_HOST = aws_db_instance.main.endpoint # ← Dependency
    }
  }
}
resource "aws_ecs_task_definition" "app" {
  container_definitions = jsonencode([{
    environment = [
      { name = "DB_HOST", value = aws_db_instance.main.endpoint } # ← Dependency
    ]
  }])
}
# If database endpoint changes during instance class change,
# Lambda and ECS need redeployment
Blast radius: 1 direct change → 2 indirect impacts
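Counting indirect impacts by hand doesn't scale. The fan-out can be estimated with a breadth-first search over a reverse dependency map — a minimal sketch with a hand-built map for the example above (in practice you would derive the map from `terraform graph` output):

```python
from collections import deque

# Reverse dependency map: resource -> resources that depend on it.
# Hypothetical, hand-built for the database example above.
dependents = {
    "aws_db_instance.main": ["aws_lambda_function.api",
                             "aws_ecs_task_definition.app"],
}

def blast_radius(changed):
    """BFS over reverse dependencies to collect all indirectly affected resources."""
    seen, queue = set(), deque([changed])
    while queue:
        for dep in dependents.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(sorted(blast_radius("aws_db_instance.main")))
# ['aws_ecs_task_definition.app', 'aws_lambda_function.api']
```

The transitive closure matters: if something depended on the Lambda function, the BFS would pick it up too, even though it is two hops away from the database.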
Low vs High Blast Radius
Low blast radius example:
Change: Update Lambda function memory from 512MB to 1024MB
Direct impact:
- 1 Lambda function configuration
Indirect impact:
- None (memory is updated in place; existing callers are unaffected)
Risk level: Low
Safe to apply: Yes (with testing)
High blast radius example:
Change: Replace VPC CIDR block from 10.0.0.0/16 to 10.1.0.0/16
Direct impact:
- 1 VPC destroyed and recreated
Indirect impact:
- 5 subnets destroyed and recreated
- 12 security groups destroyed and recreated
- 23 EC2 instances lose connectivity
- 4 RDS databases become unreachable
- 2 load balancers need reconfiguration
- 8 ECS tasks fail health checks
Risk level: Critical
Safe to apply: No (requires migration plan)
Understanding Terraform Actions
Terraform plan output shows different action types, each with different risk profiles:
Create (+)
Risk level: Low to Medium
Description: Adding new resources
+ aws_s3_bucket.logs
bucket = "my-app-logs-2025"
acl = "private"
Risks:
- Resource naming conflicts
- Quota limits exceeded
- Unintended public exposure (if misconfigured)
Review checklist:
☐ Does resource name follow naming conventions?
☐ Are security settings appropriate (private, encrypted)?
☐ Will this exceed any quotas or limits?
☐ Are tags present for cost tracking?
Update (~)
Risk level: Low to High (depends on what's changing)
Description: Modifying existing resources in-place
~ aws_instance.web
instance_type = "t3.small" -> "t3.medium"
Risks:
- Service interruption during update
- Unexpected behavior with new configuration
- Cost increase
Review checklist:
☐ Is change non-disruptive (or acceptable disruption)?
☐ Can change be rolled back easily?
☐ Are dependent resources compatible with change?
☐ Is change tested in non-production first?
Replace (-/+)
Risk level: High to Critical
Description: Destroying then recreating resource (forced replacement)
-/+ aws_db_instance.main (forces replacement)
instance_class = "db.t3.small" -> "db.t3.medium"
Why replacement occurs:
- The changed attribute cannot be modified in place
Why it's risky:
- Resource ID or ARN will change
- Downstream dependencies may break
Common forced replacements:
| Resource Type | Attribute | Why Replacement Needed |
|---|---|---|
| aws_instance | ami | Different image = new instance |
| aws_db_instance | engine | Can't change database engine in-place |
| aws_vpc | cidr_block | VPC CIDR is immutable |
| aws_subnet | availability_zone | Subnet AZ is immutable |
| aws_s3_bucket | bucket | Bucket names are immutable |
| aws_iam_role | name | Role ARN includes name |
Review checklist:
☐ Why is replacement required? (Check (forces replacement) reason)
☐ What depends on this resource's ID/ARN?
☐ Is there data that will be lost? (EBS volumes, database data)
☐ Can we create new resource before destroying old? (create_before_destroy)
☐ Do we have backups?
☐ Is rollback plan documented?
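For resources where a replacement is never acceptable, a lifecycle guard can turn a surprise replacement into a hard plan error. A minimal sketch:

```hcl
resource "aws_db_instance" "main" {
  # ...

  lifecycle {
    # Any plan that would destroy (and therefore replace) this resource
    # fails with an error instead of proceeding
    prevent_destroy = true
  }
}
```

The trade-off: intentional replacements then require temporarily removing the guard, which is exactly the kind of friction you want on critical resources.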
Delete (-)
Risk level: Critical
Description: Removing resources permanently
- aws_s3_bucket.old_data
bucket = "deprecated-bucket"
Risks:
- Permanent data loss
- Breaking dependent resources
- Difficult to recover
Review checklist:
☐ Is data backed up?
☐ Are there dependencies in other Terraform states/workspaces?
☐ Are there manual resources depending on this?
☐ Is deletion intentional (not accidental removal from code)?
☐ Should resource be imported elsewhere before deletion?
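When the goal is to stop managing a resource without destroying it, Terraform 1.7+ offers a `removed` block as a declarative alternative to `terraform state rm` — a sketch:

```hcl
removed {
  from = aws_s3_bucket.old_data

  lifecycle {
    # Drop the bucket from state but leave the real bucket untouched
    destroy = false
  }
}
```

The plan then shows the resource being forgotten rather than deleted, which makes the review unambiguous.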
Risk Scoring Framework
Not all changes are equal. Use this framework to assess risk:
Resource Type Risk
Critical risk (score: 10):
- VPCs and networking (VPC, subnets, route tables)
- Databases (RDS, DynamoDB, DocumentDB)
- State storage (S3 buckets with state files)
- IAM roles and policies (privilege escalation risk)
- Security groups (network exposure)
High risk (score: 7-9):
- Compute instances (EC2, ECS, Lambda)
- Load balancers (ALB, NLB)
- DNS records (Route53)
- Certificates (ACM)
- Encryption keys (KMS)
Medium risk (score: 4-6):
- Monitoring and logs (CloudWatch, S3 logs)
- Caching (ElastiCache, CloudFront)
- Queues (SQS, SNS)
- API Gateway configurations
Low risk (score: 1-3):
- Tags
- CloudWatch dashboards
- Parameter Store values
- S3 bucket policies (non-critical buckets)
Action Type Risk
Delete: ×3 multiplier
Replace: ×2 multiplier
Update: ×1 multiplier
Create: ×0.5 multiplier
Environment Risk
Production: ×2 multiplier
Staging: ×1 multiplier
Development: ×0.5 multiplier
Blast Radius Risk
Affects 20+ resources: +10 points
Affects 10-19 resources: +5 points
Affects 5-9 resources: +3 points
Affects 1-4 resources: +1 point
Overall Risk Score
Risk Score = (Resource Type Score × Action Multiplier × Environment Multiplier) + Blast Radius Points
Example calculation:
Change: Replace production RDS instance (forces replacement)
Resource Type: Database (Critical = 10)
Action: Replace (×2)
Environment: Production (×2)
Blast Radius: 15 dependent resources (+5)
Risk Score = (10 × 2 × 2) + 5 = 45
Risk Level: CRITICAL (score > 40)
→ Requires: Architecture review, runbook, off-hours deployment, rollback plan
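The formula translates directly into code. A minimal sketch — the category labels and dictionary layout are illustrative, but the numbers are the framework's own:

```python
# Scores and multipliers from the framework above (subset of categories)
RESOURCE_RISK = {"database": 10, "networking": 10, "compute": 8, "tags": 1}
ACTION_MULT = {"delete": 3, "replace": 2, "update": 1, "create": 0.5}
ENV_MULT = {"production": 2, "staging": 1, "development": 0.5}

def blast_radius_points(affected):
    """+10 for 20+, +5 for 10-19, +3 for 5-9, +1 for 1-4 affected resources."""
    if affected >= 20:
        return 10
    if affected >= 10:
        return 5
    if affected >= 5:
        return 3
    return 1 if affected >= 1 else 0

def risk_score(resource_type, action, environment, affected):
    base = RESOURCE_RISK[resource_type] * ACTION_MULT[action] * ENV_MULT[environment]
    return base + blast_radius_points(affected)

# The worked example: production RDS replacement with 15 dependents
print(risk_score("database", "replace", "production", 15))  # 45
```

Keeping the tables in data rather than in if-chains makes it easy to tune the weights as your team calibrates what "critical" means for your stack.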
Security Risk Detection
Terraform plans can introduce security vulnerabilities. Here are common patterns to detect:
1. Public Network Exposure
Critical: Opening SSH/RDP to 0.0.0.0/0
resource "aws_security_group_rule" "ssh" {
type = "ingress"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # ⚠️ CRITICAL: SSH open to entire internet
}
Fix:
resource "aws_security_group_rule" "ssh" {
type = "ingress"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["10.0.0.0/8"] # ✅ Only internal VPN
}
2. Public S3 Buckets
Critical: Making S3 bucket publicly readable
resource "aws_s3_bucket_acl" "public" {
bucket = aws_s3_bucket.data.id
acl = "public-read" # ⚠️ CRITICAL: All objects publicly readable
}
Fix:
resource "aws_s3_bucket_acl" "private" {
bucket = aws_s3_bucket.data.id
acl = "private" # ✅ Only authorized IAM principals
}
resource "aws_s3_bucket_public_access_block" "block" {
bucket = aws_s3_bucket.data.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
3. Publicly Accessible Databases
Critical: Making RDS instance publicly accessible
resource "aws_db_instance" "main" {
publicly_accessible = true # ⚠️ CRITICAL: Database exposed to internet
# ...
}
Fix:
resource "aws_db_instance" "main" {
publicly_accessible = false # ✅ Only accessible from VPC
# ...
}
4. Disabling Encryption
High: Removing encryption from existing resources
resource "aws_s3_bucket" "data" {
# ⚠️ HIGH: Removing server_side_encryption_configuration
# Previously encrypted, now unencrypted
}
Fix:
resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
bucket = aws_s3_bucket.data.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256" # ✅ Maintain encryption
}
}
}
5. IAM Privilege Escalation
Critical: Granting excessive IAM permissions
resource "aws_iam_policy" "admin" {
policy = jsonencode({
Statement = [{
Effect = "Allow"
Action = "*" # ⚠️ CRITICAL: Full admin access
Resource = "*"
}]
})
}
Fix:
resource "aws_iam_policy" "limited" {
policy = jsonencode({
Statement = [{
Effect = "Allow"
Action = [
"s3:GetObject", # ✅ Specific permissions only
"s3:PutObject"
]
Resource = "arn:aws:s3:::specific-bucket/*"
}]
})
}
Real-World Dangerous Terraform Changes
Let's examine actual dangerous changes and how to handle them:
Scenario 1: VPC CIDR Change
Plan output:
-/+ aws_vpc.main (forces replacement)
cidr_block = "10.0.0.0/16" -> "10.1.0.0/16"
-/+ aws_subnet.private_a (forces replacement)
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24" -> "10.1.1.0/24"
# ... 23 more resources being replaced
Blast radius: 25 resources
Risk level: CRITICAL
Why dangerous:
- All subnets, route tables, security groups destroyed and recreated
- All EC2 instances lose connectivity
- New IP addresses break hardcoded references
- Zero-downtime migration impossible with simple apply
Safe approach:
- Create parallel VPC:
resource "aws_vpc" "new" {
cidr_block = "10.1.0.0/16"
}
resource "aws_vpc_peering_connection" "migration" {
vpc_id = aws_vpc.main.id
peer_vpc_id = aws_vpc.new.id
}
- Migrate workloads incrementally
- Update DNS/load balancers to point to new VPC
- Decommission old VPC after validation
Scenario 2: RDS Instance Replacement
Plan output:
-/+ aws_db_instance.main (forces replacement)
instance_class = "db.t3.small" -> "db.t3.medium"
# Engine version upgrade also forces replacement
Blast radius: 12 application servers depend on this database
Risk level: CRITICAL
Why dangerous:
- Database destroyed before new one created (data loss!)
- New endpoint hostname breaks applications
- Downtime during recreation
Safe approach:
resource "aws_db_instance" "main" {
instance_class = "db.t3.medium"
# Key setting: Create new before destroying old
lifecycle {
create_before_destroy = true
}
# Also critical: Final snapshot before destruction
# (use a static identifier; timestamp() here would change on every plan)
final_snapshot_identifier = "main-before-upgrade"
skip_final_snapshot = false
}
Alternative: Blue-Green Deployment:
# Create new database instance
resource "aws_db_instance" "main_new" {
identifier = "main-new"
instance_class = "db.t3.medium"
# Copy from snapshot of old instance
snapshot_identifier = aws_db_snapshot.main_final.id
}
# Update application to point to new instance
resource "aws_ssm_parameter" "db_endpoint" {
name = "/app/db/endpoint"
value = aws_db_instance.main_new.endpoint # Cutover
}
# After validation, destroy old instance
# resource "aws_db_instance" "main" { ... } # Remove from code
Scenario 3: Security Group Deletion
Plan output:
- aws_security_group.api
# Warning: 15 EC2 instances reference this security group
Blast radius: 15 instances lose security group
Risk level: HIGH to CRITICAL
Why dangerous:
- Instances may become unreachable
- AWS doesn't allow deleting security groups still in use
- Apply will fail partway through
Safe approach:
Step 1: Identify dependencies
aws ec2 describe-instances \
--filters "Name=instance.group-id,Values=sg-12345678" \
--query 'Reservations[].Instances[].InstanceId'
Step 2: Create replacement security group
resource "aws_security_group" "api_v2" {
name = "api-v2"
# ... same rules as old security group
}
Step 3: Update instance references
resource "aws_instance" "api" {
vpc_security_group_ids = [
aws_security_group.api_v2.id # New security group
]
}
Step 4: Apply changes (instances switch to new SG)
Step 5: Delete old security group
# Remove from code:
# resource "aws_security_group" "api" { ... }
Scenario 4: Load Balancer Target Group Changes
Plan output:
-/+ aws_lb_target_group.api (forces replacement)
port = 80 -> 8080
Blast radius: 10 instances, 1 load balancer listener
Risk level: HIGH
Why dangerous:
- Target group ARN changes
- Listener rules break (reference old ARN)
- Traffic stops flowing to backends
- Zero-downtime not possible with simple apply
Safe approach:
# Create new target group
resource "aws_lb_target_group" "api_v2" {
name = "api-v2"
port = 8080
protocol = "HTTP"
vpc_id = aws_vpc.main.id
lifecycle {
create_before_destroy = true
}
}
# Register instances to new target group
resource "aws_lb_target_group_attachment" "api" {
for_each = toset(var.instance_ids)
target_group_arn = aws_lb_target_group.api_v2.arn # New TG
target_id = each.value
port = 8080
}
# Update listener to use new target group
resource "aws_lb_listener_rule" "api" {
listener_arn = aws_lb_listener.main.arn
priority = 100
action {
type = "forward"
target_group_arn = aws_lb_target_group.api_v2.arn # Cutover
}
condition {
path_pattern {
values = ["/api/*"]
}
}
}
# After validation, remove old target group
# resource "aws_lb_target_group" "api" { ... } # Delete
Terraform Plan Review Checklist
Before running terraform apply, systematically review:
1. Understand Every Change
☐ Do I understand why each resource is changing?
☐ Are there unexpected changes? (indicates drift or unintended edits)
☐ Do I see any forced replacements? (Look for (forces replacement))
2. Check for Dangerous Patterns
☐ Any deletions? Are they intentional?
☐ Any forced replacements? What depends on them?
☐ Any 0.0.0.0/0 in security groups?
☐ Any public S3 bucket ACLs?
☐ Any publicly_accessible databases?
☐ Any IAM permission escalations?
☐ Any encryption being removed?
3. Assess Blast Radius
☐ How many resources directly affected?
☐ What depends on changed resources? (Check terraform state list)
☐ Are there dependencies outside this Terraform state?
☐ What breaks if this change fails halfway through?
4. Validate Testing
☐ Was this change tested in non-production first?
☐ Do we have automated tests covering this change?
☐ Has peer review been completed?
☐ Is change documented in pull request?
5. Prepare for Failure
☐ Do we have backups of data that could be lost?
☐ Can we rollback if something goes wrong?
☐ Is rollback plan documented?
☐ Do we have monitoring/alerting for the changed resources?
☐ Is there an incident response runbook?
6. Timing and Communication
☐ Is this a good time to make this change? (Off-peak hours?)
☐ Have stakeholders been notified?
☐ Is there a maintenance window if needed?
☐ Are on-call engineers aware and ready?
Using Terraform JSON Plans for Analysis
Human-readable plan output is great for review, but JSON plans enable automated analysis:
Generate JSON Plan
# Create plan file
terraform plan -out=tfplan
# Convert to JSON
terraform show -json tfplan > plan.json
JSON Plan Structure
{
"format_version": "1.1",
"terraform_version": "1.6.0",
"planned_values": { ... },
"resource_changes": [
{
"address": "aws_instance.web",
"mode": "managed",
"type": "aws_instance",
"name": "web",
"provider_name": "registry.terraform.io/hashicorp/aws",
"change": {
"actions": ["update"],
"before": { "instance_type": "t3.small" },
"after": { "instance_type": "t3.medium" },
"after_unknown": {},
"before_sensitive": {},
"after_sensitive": {},
"replace_paths": []
}
}
],
"configuration": { ... },
"prior_state": { ... }
}
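A quick first pass over this structure is to pull out just the forced replacements, together with the attribute paths that cause them. A short sketch against the fields shown above (the inline plan fragment is hypothetical):

```python
import json

def forced_replacements(plan):
    """Return (address, replace_paths) for every planned replacement."""
    out = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        # A replacement is reported as both "create" and "delete" actions
        if set(change.get("actions", [])) >= {"create", "delete"}:
            out.append((rc["address"], change.get("replace_paths", [])))
    return out

# In practice: plan = json.load(open("plan.json"))
plan = {
    "resource_changes": [
        {"address": "aws_vpc.main",
         "change": {"actions": ["delete", "create"],
                    "replace_paths": [["cidr_block"]]}},
        {"address": "aws_instance.web",
         "change": {"actions": ["update"], "replace_paths": []}},
    ]
}
print(forced_replacements(plan))  # [('aws_vpc.main', [['cidr_block']])]
```

`replace_paths` answers the review-checklist question "why is replacement required?" without scrolling through the human-readable diff.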
Automated Risk Analysis Script
import json

def analyze_terraform_plan(plan_file):
    with open(plan_file) as f:
        plan = json.load(f)

    risk_score = 0
    critical_issues = []

    for change in plan.get('resource_changes', []):
        resource = change['address']
        actions = change['change']['actions']

        # Detect dangerous actions (a replace appears as create + delete,
        # so check for it first to avoid double-counting as a delete)
        if 'create' in actions and 'delete' in actions:
            risk_score += 8
            critical_issues.append(f"⚠️ REPLACE: {resource}")
        elif 'delete' in actions:
            risk_score += 10
            critical_issues.append(f"⚠️ DELETE: {resource}")

        # Detect security issues ('after' is null for deletions)
        after = change['change'].get('after') or {}

        if change['type'] == 'aws_security_group_rule':
            if after.get('cidr_blocks') == ['0.0.0.0/0']:
                if after.get('from_port') in (22, 3389):
                    risk_score += 15
                    critical_issues.append(
                        f"🚨 CRITICAL: {resource} opens SSH/RDP to 0.0.0.0/0"
                    )

        if change['type'] == 'aws_s3_bucket_acl':
            if 'public' in (after.get('acl') or ''):
                risk_score += 12
                critical_issues.append(
                    f"🚨 CRITICAL: {resource} makes S3 bucket public"
                )

    # Generate report
    print(f"\n{'=' * 60}")
    print("Terraform Plan Risk Analysis")
    print('=' * 60)
    print(f"Risk Score: {risk_score}")

    if risk_score > 40:
        print("Risk Level: CRITICAL ⛔")
    elif risk_score > 20:
        print("Risk Level: HIGH ⚠️")
    elif risk_score > 10:
        print("Risk Level: MEDIUM ⚡")
    else:
        print("Risk Level: LOW ✅")

    if critical_issues:
        print("\nCritical Issues:")
        for issue in critical_issues:
            print(f"  {issue}")

    return risk_score

# Usage
risk_score = analyze_terraform_plan('plan.json')

# Persist the score for downstream CI steps
with open('risk_score.txt', 'w') as f:
    f.write(str(risk_score))

if risk_score > 40:
    print("\n🚨 BLOCKED: Change requires architecture review")
    raise SystemExit(1)
CI/CD Integration
# GitHub Actions example
name: Terraform Plan Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        run: |
          terraform plan -out=tfplan
          terraform show -json tfplan > plan.json

      - name: Analyze Plan
        run: |
          set -o pipefail
          python scripts/analyze_plan.py plan.json | tee analysis_report.txt

      - name: Comment PR
        if: always()
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const analysis = fs.readFileSync('analysis_report.txt', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan Analysis\n\n${analysis}`
            });

      - name: Block High-Risk Changes
        run: |
          RISK_SCORE=$(cat risk_score.txt)
          if [ "$RISK_SCORE" -gt 40 ]; then
            echo "❌ Risk score too high: $RISK_SCORE"
            echo "Requires manual architecture review"
            exit 1
          fi
Conclusion
Terraform is powerful, but with great power comes great responsibility. Understanding blast radius, recognizing dangerous changes, and systematically reviewing plans before applying them is essential for maintaining reliable infrastructure.
Key principles:
- Blast radius = Direct changes + Indirect dependencies: Always consider downstream impact
- Forced replacements are high-risk: Resources with (forces replacement) require extra scrutiny
- Deletions are critical-risk: Verify backups and dependencies before deleting resources
- Security patterns are detectable: Look for 0.0.0.0/0, public buckets, privilege escalation
- JSON plans enable automation: Integrate risk analysis into CI/CD pipelines
- Test in non-production first: Never apply untested changes directly to production
- Have a rollback plan: Document how to revert if something goes wrong
The moment before terraform apply should be confident, not terrifying. By understanding risk, analyzing blast radius, and following systematic review processes, you can make infrastructure changes safely and predictably.
Need help analyzing your Terraform plans? Try our Terraform Plan Explainer to automatically detect security issues, calculate blast radius, assess risk scores, and get specific recommendations before applying changes.