
Understanding Terraform Plan Blast Radius and Risk Assessment

Learn how to assess Terraform plan blast radius for safer infrastructure changes.

By InventiveHQ Team

The Terraform Apply Moment of Truth

You've been working on infrastructure changes for hours. Your Terraform plan shows 23 resources will be modified. You type terraform apply and your finger hovers over Enter. A voice in your head asks: "What could possibly go wrong?"

This moment—the gap between terraform plan and terraform apply—is where production incidents are born. A misunderstood change. A missed dependency. An unexpected replacement. And suddenly, your production database is gone, your VPC routing is broken, or your S3 bucket is public.

Understanding blast radius, risk assessment, and how to read Terraform plans thoroughly is essential for any team managing infrastructure as code. Let's learn how to make that moment before hitting Enter a confident one, not a terrifying one.

What is Blast Radius?

Blast radius is the potential scope of impact from a Terraform change. It answers the question: "If this change goes wrong, what else breaks?"

Direct vs Indirect Impact

Direct impact: Resources explicitly changed in the plan

# Changing this database instance
resource "aws_db_instance" "main" {
  instance_class = "db.t3.medium"  # Was: db.t3.small
}

Indirect impact: Resources affected by dependencies

# These resources depend on the database
resource "aws_lambda_function" "api" {
  environment {
    variables = {
      DB_HOST = aws_db_instance.main.endpoint  # ← Dependency
    }
  }
}

resource "aws_ecs_task_definition" "app" {
  container_definitions = jsonencode([{
    name = "app"
    environment = [{
      name  = "DB_HOST"
      value = aws_db_instance.main.endpoint  # ← Dependency
    }]
  }])
}

# If the database endpoint changes during the instance class change,
# the Lambda and the ECS task need redeployment

Blast radius: 1 direct change → 2 indirect impacts
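That fan-out can be computed mechanically by walking the dependency graph outward from the changed resource. A minimal sketch (the adjacency map here is hand-built for illustration; in practice you would derive it from `terraform graph` output or the plan JSON):

```python
from collections import deque

def blast_radius(graph, changed):
    """Breadth-first walk from a directly changed resource to every
    resource that transitively depends on it."""
    affected, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

# Hypothetical reverse-dependency map: resource -> resources depending on it
graph = {
    "aws_db_instance.main": [
        "aws_lambda_function.api",
        "aws_ecs_task_definition.app",
    ],
}

print(sorted(blast_radius(graph, "aws_db_instance.main")))
# → ['aws_ecs_task_definition.app', 'aws_lambda_function.api']
```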

Low vs High Blast Radius

Low blast radius example:

Change: Update Lambda function memory from 512MB to 1024MB

Direct impact:
- 1 Lambda function configuration

Indirect impact:
- None (the configuration is updated in place; existing callers are unaffected)

Risk level: Low
Safe to apply: Yes (with testing)

High blast radius example:

Change: Replace VPC CIDR block from 10.0.0.0/16 to 10.1.0.0/16

Direct impact:
- 1 VPC destroyed and recreated

Indirect impact:
- 5 subnets destroyed and recreated
- 12 security groups destroyed and recreated
- 23 EC2 instances lose connectivity
- 4 RDS databases become unreachable
- 2 load balancers need reconfiguration
- 8 ECS tasks fail health checks

Risk level: Critical
Safe to apply: No (requires migration plan)

Understanding Terraform Actions

Terraform plan output shows different action types, each with different risk profiles:

Create (+)

Risk level: Low to Medium

Description: Adding new resources

+ aws_s3_bucket.logs
    bucket = "my-app-logs-2025"
    acl    = "private"

Risks:

  • Resource naming conflicts
  • Quota limits exceeded
  • Unintended public exposure (if misconfigured)

Review checklist:

☐ Does resource name follow naming conventions?
☐ Are security settings appropriate (private, encrypted)?
☐ Will this exceed any quotas or limits?
☐ Are tags present for cost tracking?

Update (~)

Risk level: Low to High (depends on what's changing)

Description: Modifying existing resources in-place

~ aws_instance.web
    instance_type = "t3.small" -> "t3.medium"

Risks:

  • Service interruption during update
  • Unexpected behavior with new configuration
  • Cost increase

Review checklist:

☐ Is change non-disruptive (or acceptable disruption)?
☐ Can change be rolled back easily?
☐ Are dependent resources compatible with change?
☐ Is change tested in non-production first?

Replace (-/+)

Risk level: High to Critical

Description: Destroying then recreating resource (forced replacement)

-/+ aws_db_instance.main (forces replacement)
      instance_class = "db.t3.small" -> "db.t3.medium"

Why replacement occurs:

  • Attribute requires resource recreation
  • Resource ID or ARN will change
  • Downstream dependencies may break

Common forced replacements:

Resource Type     Attribute          Why Replacement Needed
aws_instance      ami                Different image = new instance
aws_db_instance   engine             Can't change database engine in-place
aws_vpc           cidr_block         VPC CIDR is immutable
aws_subnet        availability_zone  Subnet AZ is immutable
aws_s3_bucket     bucket             Bucket names are immutable
aws_iam_role      name               Role ARN includes name

Review checklist:

☐ Why is replacement required? (Check (forces replacement) reason)
☐ What depends on this resource's ID/ARN?
☐ Is there data that will be lost? (EBS volumes, database data)
☐ Can we create new resource before destroying old? (create_before_destroy)
☐ Do we have backups?
☐ Is rollback plan documented?
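The machine-readable plan records exactly which attribute forces each replacement in a `replace_paths` field, so this part of the checklist can be automated. A minimal sketch (the `plan` dict is a trimmed, hand-built stand-in for real `terraform show -json tfplan` output):

```python
def forced_replacements(plan):
    """Return (address, attribute) pairs for every planned replacement."""
    findings = []
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        # A replacement appears as both "delete" and "create" actions
        if "create" in actions and "delete" in actions:
            for path in change["change"].get("replace_paths", []):
                findings.append((change["address"], ".".join(map(str, path))))
    return findings

# Trimmed stand-in for `terraform show -json tfplan` output
plan = {
    "resource_changes": [{
        "address": "aws_db_instance.main",
        "change": {
            "actions": ["delete", "create"],
            "replace_paths": [["engine"]],
        },
    }]
}

print(forced_replacements(plan))  # [('aws_db_instance.main', 'engine')]
```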

Delete (-)

Risk level: Critical

Description: Removing resources permanently

- aws_s3_bucket.old_data
    bucket = "deprecated-bucket"

Risks:

  • Permanent data loss
  • Breaking dependent resources
  • Difficult to recover

Review checklist:

☐ Is data backed up?
☐ Are there dependencies in other Terraform states/workspaces?
☐ Are there manual resources depending on this?
☐ Is deletion intentional (not accidental removal from code)?
☐ Should resource be imported elsewhere before deletion?

Risk Scoring Framework

Not all changes are equal. Use this framework to assess risk:

Resource Type Risk

Critical risk (score: 10):

- VPCs and networking (VPC, subnets, route tables)
- Databases (RDS, DynamoDB, DocumentDB)
- State storage (S3 buckets with state files)
- IAM roles and policies (privilege escalation risk)
- Security groups (network exposure)

High risk (score: 7-9):

- Compute instances (EC2, ECS, Lambda)
- Load balancers (ALB, NLB)
- DNS records (Route53)
- Certificates (ACM)
- Encryption keys (KMS)

Medium risk (score: 4-6):

- Monitoring and logs (CloudWatch, S3 logs)
- Caching (ElastiCache, CloudFront)
- Queues (SQS, SNS)
- API Gateway configurations

Low risk (score: 1-3):

- Tags
- CloudWatch dashboards
- Parameter Store values
- S3 bucket policies (non-critical buckets)

Action Type Risk

Delete: ×3 multiplier
Replace: ×2 multiplier
Update: ×1 multiplier
Create: ×0.5 multiplier

Environment Risk

Production: ×2 multiplier
Staging: ×1 multiplier
Development: ×0.5 multiplier

Blast Radius Risk

Affects 20+ resources: +10 points
Affects 10-19 resources: +5 points
Affects 5-9 resources: +3 points
Affects 1-4 resources: +1 point

Overall Risk Score

Risk Score = (Resource Type Score × Action Multiplier × Environment Multiplier) + Blast Radius Points

Example calculation:

Change: Replace production RDS instance (forces replacement)

Resource Type: Database (Critical = 10)
Action: Replace (×2)
Environment: Production (×2)
Blast Radius: 15 dependent resources (+5)

Risk Score = (10 × 2 × 2) + 5 = 45

Risk Level: CRITICAL (score > 40)
→ Requires: Architecture review, runbook, off-hours deployment, rollback plan
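The whole framework condenses into a small helper. A sketch using the scores and multipliers listed above (the resource-class labels are a hypothetical simplification of the per-service tiers):

```python
# Simplified resource-class scores from the framework above (hypothetical labels)
RESOURCE_SCORES = {"database": 10, "networking": 10, "compute": 8, "tags": 1}
ACTION_MULT = {"delete": 3, "replace": 2, "update": 1, "create": 0.5}
ENV_MULT = {"production": 2, "staging": 1, "development": 0.5}

def blast_radius_points(affected):
    """Points for the number of indirectly affected resources."""
    if affected >= 20: return 10
    if affected >= 10: return 5
    if affected >= 5:  return 3
    return 1

def risk_score(resource_class, action, environment, affected_resources):
    base = (RESOURCE_SCORES[resource_class]
            * ACTION_MULT[action]
            * ENV_MULT[environment])
    return base + blast_radius_points(affected_resources)

# The worked example: replacing a production database with 15 dependents
print(risk_score("database", "replace", "production", 15))  # → 45
```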

Security Risk Detection

Terraform plans can introduce security vulnerabilities. Here are common patterns to detect:

1. Public Network Exposure

Critical: Opening SSH/RDP to 0.0.0.0/0

resource "aws_security_group_rule" "ssh" {
  type        = "ingress"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]  # ⚠️ CRITICAL: SSH open to entire internet
}

Fix:

resource "aws_security_group_rule" "ssh" {
  type        = "ingress"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["10.0.0.0/8"]  # ✅ Only internal VPN
}

2. Public S3 Buckets

Critical: Making S3 bucket publicly readable

resource "aws_s3_bucket_acl" "public" {
  bucket = aws_s3_bucket.data.id
  acl    = "public-read"  # ⚠️ CRITICAL: All objects publicly readable
}

Fix:

resource "aws_s3_bucket_acl" "private" {
  bucket = aws_s3_bucket.data.id
  acl    = "private"  # ✅ Only authorized IAM principals
}

resource "aws_s3_bucket_public_access_block" "block" {
  bucket = aws_s3_bucket.data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

3. Publicly Accessible Databases

Critical: Making RDS instance publicly accessible

resource "aws_db_instance" "main" {
  publicly_accessible = true  # ⚠️ CRITICAL: Database exposed to internet
  # ...
}

Fix:

resource "aws_db_instance" "main" {
  publicly_accessible = false  # ✅ Only accessible from VPC
  # ...
}

4. Disabling Encryption

High: Removing encryption from existing resources

resource "aws_s3_bucket" "data" {
  # ⚠️ HIGH: Removing server_side_encryption_configuration
  # Previously encrypted, now unencrypted
}

Fix:

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"  # ✅ Maintain encryption
    }
  }
}

5. IAM Privilege Escalation

Critical: Granting excessive IAM permissions

resource "aws_iam_policy" "admin" {
  policy = jsonencode({
    Statement = [{
      Effect   = "Allow"
      Action   = "*"              # ⚠️ CRITICAL: Full admin access
      Resource = "*"
    }]
  })
}

Fix:

resource "aws_iam_policy" "limited" {
  policy = jsonencode({
    Statement = [{
      Effect = "Allow"
      Action = [
        "s3:GetObject",           # ✅ Specific permissions only
        "s3:PutObject"
      ]
      Resource = "arn:aws:s3:::specific-bucket/*"
    }]
  })
}

Real-World Dangerous Terraform Changes

Let's examine actual dangerous changes and how to handle them:

Scenario 1: VPC CIDR Change

Plan output:

-/+ aws_vpc.main (forces replacement)
      cidr_block = "10.0.0.0/16" -> "10.1.0.0/16"

-/+ aws_subnet.private_a (forces replacement)
      vpc_id = aws_vpc.main.id
      cidr_block = "10.0.1.0/24" -> "10.1.1.0/24"

# ... 23 more resources being replaced

Blast radius: 25 resources

Risk level: CRITICAL

Why dangerous:

  • All subnets, route tables, security groups destroyed and recreated
  • All EC2 instances lose connectivity
  • New IP addresses break hardcoded references
  • Zero-downtime migration impossible with simple apply

Safe approach:

  1. Create a parallel VPC:

resource "aws_vpc" "new" {
  cidr_block = "10.1.0.0/16"
}

resource "aws_vpc_peering_connection" "migration" {
  vpc_id      = aws_vpc.main.id
  peer_vpc_id = aws_vpc.new.id
}

  2. Migrate workloads incrementally
  3. Update DNS/load balancers to point to the new VPC
  4. Decommission the old VPC after validation

Scenario 2: RDS Instance Replacement

Plan output:

-/+ aws_db_instance.main (forces replacement)
      instance_class = "db.t3.small" -> "db.t3.medium"
      # Engine version upgrade also forces replacement

Blast radius: 12 application servers depend on this database

Risk level: CRITICAL

Why dangerous:

  • Database destroyed before new one created (data loss!)
  • New endpoint hostname breaks applications
  • Downtime during recreation

Safe approach:

resource "aws_db_instance" "main" {
  instance_class = "db.t3.medium"

  # Key setting: create the new instance before destroying the old one.
  # Note: this only works if the identifier differs (omit `identifier`
  # or change it), since two instances cannot share the same name.
  lifecycle {
    create_before_destroy = true
  }

  # Also critical: take a final snapshot before destruction.
  # Use a static name; functions like timestamp() change on every plan
  # and cause perpetual diffs.
  final_snapshot_identifier = "main-before-upgrade"
  skip_final_snapshot       = false
}

Alternative: Blue-Green Deployment:

# Create new database instance
resource "aws_db_instance" "main_new" {
  identifier     = "main-new"
  instance_class = "db.t3.medium"
  # Copy from snapshot of old instance
  snapshot_identifier = aws_db_snapshot.main_final.id
}

# Update application to point to new instance
resource "aws_ssm_parameter" "db_endpoint" {
  name  = "/app/db/endpoint"
  value = aws_db_instance.main_new.endpoint  # Cutover
}

# After validation, destroy old instance
# resource "aws_db_instance" "main" { ... }  # Remove from code

Scenario 3: Security Group Deletion

Plan output:

- aws_security_group.api
    # Warning: 15 EC2 instances reference this security group

Blast radius: 15 instances lose security group

Risk level: HIGH to CRITICAL

Why dangerous:

  • Instances may become unreachable
  • AWS doesn't allow deleting security groups still in use
  • Apply will fail partway through

Safe approach:

Step 1: Identify dependencies

aws ec2 describe-instances \
  --filters "Name=instance.group-id,Values=sg-12345678" \
  --query 'Reservations[].Instances[].InstanceId'

Step 2: Create replacement security group

resource "aws_security_group" "api_v2" {
  name   = "api-v2"
  # ... same rules as old security group
}

Step 3: Update instance references

resource "aws_instance" "api" {
  vpc_security_group_ids = [
    aws_security_group.api_v2.id  # New security group
  ]
}

Step 4: Apply changes (instances switch to new SG)

Step 5: Delete old security group

# Remove from code:
# resource "aws_security_group" "api" { ... }

Scenario 4: Load Balancer Target Group Changes

Plan output:

-/+ aws_lb_target_group.api (forces replacement)
      port = 80 -> 8080

Blast radius: 10 instances, 1 load balancer listener

Risk level: HIGH

Why dangerous:

  • Target group ARN changes
  • Listener rules break (reference old ARN)
  • Traffic stops flowing to backends
  • Zero-downtime not possible with simple apply

Safe approach:

# Create new target group
resource "aws_lb_target_group" "api_v2" {
  name     = "api-v2"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  lifecycle {
    create_before_destroy = true
  }
}

# Register instances to new target group
resource "aws_lb_target_group_attachment" "api" {
  for_each         = toset(var.instance_ids)
  target_group_arn = aws_lb_target_group.api_v2.arn  # New TG
  target_id        = each.value
  port             = 8080
}

# Update listener to use new target group
resource "aws_lb_listener_rule" "api" {
  listener_arn = aws_lb_listener.main.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api_v2.arn  # Cutover
  }

  condition {
    path_pattern {
      values = ["/api/*"]
    }
  }
}

# After validation, remove old target group
# resource "aws_lb_target_group" "api" { ... }  # Delete

Terraform Plan Review Checklist

Before running terraform apply, systematically review:

1. Understand Every Change

☐ Do I understand why each resource is changing?
☐ Are there unexpected changes? (indicates drift or unintended edits)
☐ Do I see any forced replacements? (Look for (forces replacement))

2. Check for Dangerous Patterns

☐ Any deletions? Are they intentional?
☐ Any forced replacements? What depends on them?
☐ Any 0.0.0.0/0 in security groups?
☐ Any public S3 bucket ACLs?
☐ Any publicly_accessible databases?
☐ Any IAM permission escalations?
☐ Any encryption being removed?

3. Assess Blast Radius

☐ How many resources directly affected?
☐ What depends on changed resources? (Check terraform state list)
☐ Are there dependencies outside this Terraform state?
☐ What breaks if this change fails halfway through?
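For the dependency questions above, the state JSON produced by terraform show -json records each resource's depends_on entries, so dependents of a changed address can be listed programmatically. A minimal sketch that only walks the root module (child modules would need recursion; the state dict is a trimmed, hand-built stand-in for real output):

```python
def dependents_of(state, target):
    """Root-module resources whose depends_on lists include target."""
    resources = state["values"]["root_module"].get("resources", [])
    return [r["address"] for r in resources
            if target in r.get("depends_on", [])]

# Trimmed stand-in for `terraform show -json` state output
state = {
    "values": {"root_module": {"resources": [
        {"address": "aws_db_instance.main", "depends_on": []},
        {"address": "aws_lambda_function.api",
         "depends_on": ["aws_db_instance.main"]},
    ]}}
}

print(dependents_of(state, "aws_db_instance.main"))
# → ['aws_lambda_function.api']
```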

4. Validate Testing

☐ Was this change tested in non-production first?
☐ Do we have automated tests covering this change?
☐ Has peer review been completed?
☐ Is change documented in pull request?

5. Prepare for Failure

☐ Do we have backups of data that could be lost?
☐ Can we rollback if something goes wrong?
☐ Is rollback plan documented?
☐ Do we have monitoring/alerting for the changed resources?
☐ Is there an incident response runbook?

6. Timing and Communication

☐ Is this a good time to make this change? (Off-peak hours?)
☐ Have stakeholders been notified?
☐ Is there a maintenance window if needed?
☐ Are on-call engineers aware and ready?

Using Terraform JSON Plans for Analysis

Human-readable plan output is great for review, but JSON plans enable automated analysis:

Generate JSON Plan

# Create plan file
terraform plan -out=tfplan

# Convert to JSON
terraform show -json tfplan > plan.json

JSON Plan Structure

{
  "format_version": "1.1",
  "terraform_version": "1.6.0",
  "planned_values": { ... },
  "resource_changes": [
    {
      "address": "aws_instance.web",
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "provider_name": "registry.terraform.io/hashicorp/aws",
      "change": {
        "actions": ["update"],
        "before": { "instance_type": "t3.small" },
        "after": { "instance_type": "t3.medium" },
        "after_unknown": {},
        "before_sensitive": {},
        "after_sensitive": {},
        "replace_paths": []
      }
    }
  ],
  "configuration": { ... },
  "prior_state": { ... }
}

Automated Risk Analysis Script

import json

def analyze_terraform_plan(plan_file):
    with open(plan_file) as f:
        plan = json.load(f)

    risk_score = 0
    critical_issues = []

    for change in plan['resource_changes']:
        resource = change['address']
        actions = change['change']['actions']

        # Detect dangerous actions. Check replacements first: a replace
        # shows up as both "create" and "delete", so testing for delete
        # alone would double-count it.
        if 'create' in actions and 'delete' in actions:
            risk_score += 8
            critical_issues.append(f"⚠️  REPLACE: {resource}")
        elif 'delete' in actions:
            risk_score += 10
            critical_issues.append(f"⚠️  DELETE: {resource}")

        # Detect security issues ("after" is null for deletions)
        after = change['change'].get('after') or {}

        if change['type'] == 'aws_security_group_rule':
            if after.get('cidr_blocks') == ['0.0.0.0/0']:
                if after.get('from_port') in [22, 3389]:
                    risk_score += 15
                    critical_issues.append(
                        f"🚨 CRITICAL: {resource} opens SSH/RDP to 0.0.0.0/0"
                    )

        if change['type'] == 'aws_s3_bucket_acl':
            if 'public' in (after.get('acl') or ''):
                risk_score += 12
                critical_issues.append(
                    f"🚨 CRITICAL: {resource} makes S3 bucket public"
                )

    # Generate report
    print(f"\n{'='*60}")
    print(f"Terraform Plan Risk Analysis")
    print(f"{'='*60}")
    print(f"Risk Score: {risk_score}")

    if risk_score > 40:
        print(f"Risk Level: CRITICAL ⛔")
    elif risk_score > 20:
        print(f"Risk Level: HIGH ⚠️")
    elif risk_score > 10:
        print(f"Risk Level: MEDIUM ⚡")
    else:
        print(f"Risk Level: LOW ✅")

    if critical_issues:
        print(f"\nCritical Issues:")
        for issue in critical_issues:
            print(f"  {issue}")

    return risk_score

# Usage
risk_score = analyze_terraform_plan('plan.json')
if risk_score > 40:
    print("\n🚨 BLOCKED: Change requires architecture review")
    exit(1)

CI/CD Integration

# GitHub Actions example
name: Terraform Plan Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        run: |
          terraform plan -out=tfplan
          terraform show -json tfplan > plan.json

      - name: Analyze Plan
        # Assumes the analysis script also writes analysis_report.txt and
        # risk_score.txt, which the steps below consume
        run: python scripts/analyze_plan.py plan.json

      - name: Comment PR
        if: always()
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const analysis = fs.readFileSync('analysis_report.txt', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan Analysis\n\n${analysis}`
            });

      - name: Block High-Risk Changes
        run: |
          RISK_SCORE=$(cat risk_score.txt)
          if [ "$RISK_SCORE" -gt 40 ]; then
            echo "❌ Risk score too high: $RISK_SCORE"
            echo "Requires manual architecture review"
            exit 1
          fi

Conclusion

Terraform is powerful, but with great power comes great responsibility. Understanding blast radius, recognizing dangerous changes, and systematically reviewing plans before applying them is essential for maintaining reliable infrastructure.

Key principles:

  • Blast radius = Direct changes + Indirect dependencies: Always consider downstream impact
  • Forced replacements are high-risk: Resources with (forces replacement) require extra scrutiny
  • Deletions are critical-risk: Verify backups and dependencies before deleting resources
  • Security patterns are detectable: Look for 0.0.0.0/0, public buckets, privilege escalation
  • JSON plans enable automation: Integrate risk analysis into CI/CD pipelines
  • Test in non-production first: Never apply untested changes directly to production
  • Have a rollback plan: Document how to revert if something goes wrong

The moment before terraform apply should be confident, not terrifying. By understanding risk, analyzing blast radius, and following systematic review processes, you can make infrastructure changes safely and predictably.

Need help analyzing your Terraform plans? Try our Terraform Plan Explainer to automatically detect security issues, calculate blast radius, assess risk scores, and get specific recommendations before applying changes.
