AWS Macie Data Discovery and Protection Guide

Complete guide to discovering and protecting sensitive data with AWS Macie including S3 scanning, PII detection, custom identifiers, and alerts.

10 min read | Updated 2026-01-14

AWS Macie is a data security service that uses machine learning and pattern matching to discover and protect sensitive data stored in Amazon S3. This guide covers enabling Macie, configuring data discovery jobs, creating custom identifiers, and managing findings.

This article is part of our comprehensive Cloud Security Tips for 2026 guide covering essential practices for protecting your cloud environment.

What Macie Detects

| Category | Examples |
| --- | --- |
| Personal Identifiers | Names, addresses, phone numbers, dates of birth |
| Government IDs | SSN, passport, driver's license, tax IDs |
| Financial Data | Credit cards, bank accounts, financial statements |
| Credentials | AWS keys, passwords, API tokens, private keys |
| Healthcare | PHI, medical record numbers, health insurance IDs |
| Custom | Organization-specific patterns you define |

Enable AWS Macie

Using AWS Console

  1. Open the Macie Console
  2. Click Get started
  3. Review the service-linked role permissions
  4. Click Enable Macie

Using AWS CLI

# Enable Macie
aws macie2 enable-macie

# Verify Macie is enabled
aws macie2 get-macie-session

# Check bucket inventory status
aws macie2 describe-buckets \
  --query 'buckets[*].[bucketName,classifiableSizeInBytes]' \
  --output table

Multi-Account Setup

For AWS Organizations, designate a delegated Macie administrator account:

# From management account: Enable delegated admin
aws macie2 enable-organization-admin-account \
  --admin-account-id 111122223333

# From delegated admin: Enable for member accounts
aws macie2 create-member \
  --account '{
    "accountId": "444455556666",
    "email": "<member-account-email>"
  }'

# Auto-enable for new organization accounts
aws macie2 update-organization-configuration \
  --auto-enable

# List member accounts
aws macie2 list-members

Analyze S3 Bucket Security

Macie automatically analyzes S3 bucket security posture:

# Get bucket inventory
aws macie2 describe-buckets \
  --query 'buckets[*].[bucketName,publicAccess.effectivePermission,sharedAccess]' \
  --output table

# Filter for public buckets
aws macie2 describe-buckets \
  --criteria '{
    "publicAccess.effectivePermission": {
      "eq": ["PUBLIC"]
    }
  }'

# Get bucket statistics
aws macie2 get-bucket-statistics \
  --account-id 123456789012

Bucket Security Findings

| Finding Type | Description |
| --- | --- |
| Policy:IAMUser/S3BucketPublic | Bucket has public access via policy |
| Policy:IAMUser/S3BucketSharedExternally | Bucket shared with external accounts |
| Policy:IAMUser/S3BucketReplicatedExternally | Bucket replicates to external account |
| Policy:IAMUser/S3BlockPublicAccessDisabled | Public access block not enabled |

Create Sensitive Data Discovery Job

One-Time Scan

# Create discovery job for specific buckets
aws macie2 create-classification-job \
  --name "PII-Discovery-Production" \
  --description "Scan production buckets for PII" \
  --job-type ONE_TIME \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": ["prod-data-bucket", "prod-reports-bucket"]
    }]
  }' \
  --managed-data-identifier-selector ALL \
  --tags Environment=Production

# Check job status
aws macie2 describe-classification-job \
  --job-id abc123def456
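
Hand-editing the JSON passed to --s3-job-definition gets error-prone as the bucket list grows. The following is a minimal sketch of one way to generate it, assuming a single account and bucket names that need no JSON escaping (S3 naming rules do not allow quotes). The function name job_definition is hypothetical and not part of the AWS CLI.

```shell
#!/bin/sh
# Hypothetical helper (not an AWS CLI feature): print an s3JobDefinition
# document for one account and any number of buckets.
# usage: job_definition ACCOUNT_ID BUCKET [BUCKET...]
job_definition() {
  account="$1"; shift
  buckets=""
  for b in "$@"; do
    # Append each bucket as a quoted JSON string, comma-separated.
    buckets="${buckets:+$buckets, }\"$b\""
  done
  printf '{"bucketDefinitions": [{"accountId": "%s", "buckets": [%s]}]}\n' \
    "$account" "$buckets"
}

# Same account and buckets as the one-time scan above.
job_definition 123456789012 prod-data-bucket prod-reports-bucket
```

The output can then be passed to the job command with command substitution, for example --s3-job-definition "$(job_definition 123456789012 prod-data-bucket)".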

Scheduled Scan

# Create weekly scheduled job
aws macie2 create-classification-job \
  --name "Weekly-PII-Scan" \
  --description "Weekly scan for sensitive data" \
  --job-type SCHEDULED \
  --schedule-frequency '{
    "weeklySchedule": {
      "dayOfWeek": "SUNDAY"
    }
  }' \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": ["customer-data-bucket"]
    }],
    "scoping": {
      "includes": {
        "and": [{
          "simpleScopeTerm": {
            "comparator": "STARTS_WITH",
            "key": "OBJECT_KEY",
            "values": ["reports/", "exports/"]
          }
        }]
      }
    }
  }' \
  --managed-data-identifier-selector ALL

Scan with Sampling

# Sample 10% of objects (cost optimization)
aws macie2 create-classification-job \
  --name "Sampled-Scan" \
  --job-type ONE_TIME \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": ["large-data-lake"]
    }],
    "scoping": {
      "includes": {
        "and": [{
          "simpleScopeTerm": {
            "comparator": "GT",
            "key": "OBJECT_SIZE",
            "values": ["0"]
          }
        }]
      }
    }
  }' \
  --sampling-percentage 10 \
  --managed-data-identifier-selector ALL

Create Custom Data Identifiers

Create custom identifiers for organization-specific data:

# Create custom identifier for employee IDs
aws macie2 create-custom-data-identifier \
  --name "EmployeeID" \
  --description "Internal employee ID format: EMP-XXXXX" \
  --regex "EMP-[0-9]{5}" \
  --keywords '["employee", "emp id", "staff"]' \
  --maximum-match-distance 50 \
  --tags Type=Internal

# Create identifier for internal project codes
aws macie2 create-custom-data-identifier \
  --name "ProjectCode" \
  --description "Internal project code format" \
  --regex "PROJ-[A-Z]{3}-[0-9]{4}" \
  --keywords '["project", "initiative", "program"]'

# List custom identifiers
aws macie2 list-custom-data-identifiers \
  --query 'items[*].[name,id]' \
  --output table
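
Before registering a pattern, it can help to check it against sample text locally. The sketch below uses grep -E for that, assuming the simple patterns shown above; Macie's regex engine supports a different feature set than POSIX extended regular expressions, so treat this as a quick sanity check and not as proof of how Macie will match. The function name count_matches and the sample text are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of a sample that match a candidate pattern.
# usage: count_matches PATTERN SAMPLE_TEXT
count_matches() {
  pattern="$1"
  sample="$2"
  # grep exits non-zero when nothing matches; the printed count is still
  # the result we want, so ignore the exit status.
  printf '%s\n' "$sample" | grep -E -c -- "$pattern" || true
}

# Illustrative sample text: the second line has only four digits.
sample='employee EMP-12345 joined
staff badge EMP-1234 is too short
project PROJ-ABC-2024 kickoff'

count_matches 'EMP-[0-9]{5}' "$sample"            # employee ID pattern
count_matches 'PROJ-[A-Z]{3}-[0-9]{4}' "$sample"  # project code pattern
```

Each call prints the number of matching lines. The AWS CLI also provides aws macie2 test-custom-data-identifier, which evaluates a regex against sample text using Macie itself and is the more reliable check once credentials are available.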

Use Custom Identifiers in Jobs

# Create job with custom and managed identifiers
aws macie2 create-classification-job \
  --name "Complete-Scan" \
  --job-type ONE_TIME \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": ["internal-docs"]
    }]
  }' \
  --custom-data-identifier-ids "id1" "id2" \
  --managed-data-identifier-selector ALL

Manage Findings

View Findings

# List all findings
aws macie2 list-findings \
  --sort-criteria '{
    "attributeName": "severity.score",
    "orderBy": "DESC"
  }'

# Get finding details
aws macie2 get-findings \
  --finding-ids "finding-id-1" "finding-id-2"

# Filter for high severity findings
aws macie2 list-findings \
  --finding-criteria '{
    "criterion": {
      "severity.description": {
        "eq": ["High"]
      }
    }
  }'

# Filter by finding type
aws macie2 list-findings \
  --finding-criteria '{
    "criterion": {
      "category": {
        "eq": ["CLASSIFICATION"]
      },
      "classificationDetails.result.sensitiveData.detections.type": {
        "eq": ["CREDIT_CARD_NUMBER"]
      }
    }
  }'
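
The criteria documents above all share one shape: a field name mapped to an "eq" list of values. As a rough sketch, assuming only single-field "eq" conditions are needed, a small function can generate them. The name finding_criteria is hypothetical, and values are inserted without JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: print a findings criteria document with one "eq" condition.
# usage: finding_criteria FIELD VALUE [VALUE...]
finding_criteria() {
  field="$1"; shift
  values=""
  for v in "$@"; do
    # Append each value as a quoted JSON string, comma-separated.
    values="${values:+$values, }\"$v\""
  done
  printf '{"criterion": {"%s": {"eq": [%s]}}}\n' "$field" "$values"
}

finding_criteria severity.description High
finding_criteria resourcesAffected.s3Bucket.name test-data-bucket hr-documents
```

The result can be passed directly, for example aws macie2 list-findings --finding-criteria "$(finding_criteria severity.description High)". Criteria that combine several fields still need to be written out by hand.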

Archive Findings

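# The macie2 CLI has no per-finding archive or unarchive command.
# Findings are archived automatically when they match a suppression
# rule (a findings filter with --action ARCHIVE).

# List findings that suppression rules have archived
aws macie2 list-findings \
  --finding-criteria '{
    "criterion": {
      "archived": {
        "eq": ["true"]
      }
    }
  }'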

Create Suppression Rules

# Suppress findings for test data bucket
aws macie2 create-findings-filter \
  --name "Suppress-Test-Bucket" \
  --description "Suppress findings from test data bucket" \
  --action ARCHIVE \
  --finding-criteria '{
    "criterion": {
      "resourcesAffected.s3Bucket.name": {
        "eq": ["test-data-bucket"]
      }
    }
  }'

# Suppress specific data type findings
aws macie2 create-findings-filter \
  --name "Suppress-Employee-IDs" \
  --description "Expected employee IDs in HR bucket" \
  --action ARCHIVE \
  --finding-criteria '{
    "criterion": {
      "resourcesAffected.s3Bucket.name": {
        "eq": ["hr-documents"]
      },
      "classificationDetails.result.customDataIdentifiers.detections.name": {
        "eq": ["EmployeeID"]
      }
    }
  }'

# List suppression rules
aws macie2 list-findings-filters

Set Up Notifications

EventBridge Integration

# Create SNS topic
aws sns create-topic --name macie-alerts

# Create EventBridge rule for high severity findings
aws events put-rule \
  --name "MacieHighSeverity" \
  --event-pattern '{
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "detail": {
      "severity": {
        "description": ["High"]
      }
    }
  }'

# Add SNS target
aws events put-targets \
  --rule MacieHighSeverity \
  --targets Id=1,Arn=arn:aws:sns:us-east-1:123456789012:macie-alerts

# Configure SNS access
aws sns set-topic-attributes \
  --topic-arn arn:aws:sns:us-east-1:123456789012:macie-alerts \
  --attribute-name Policy \
  --attribute-value '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "events.amazonaws.com"},
      "Action": "sns:Publish",
      "Resource": "arn:aws:sns:us-east-1:123456789012:macie-alerts"
    }]
  }'
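
The topic ARN appears three times in the commands above, so a typo in any one copy breaks delivery without an obvious error. One way to avoid that, sketched here under the assumption that the topic lives in the standard "aws" partition, is to build the ARN once and reuse it. The function name topic_arn is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build an SNS topic ARN for the standard "aws" partition.
# usage: topic_arn REGION ACCOUNT_ID TOPIC_NAME
topic_arn() {
  printf 'arn:aws:sns:%s:%s:%s\n' "$1" "$2" "$3"
}

# Same Region, account, and topic name as the commands above.
TOPIC_ARN="$(topic_arn us-east-1 123456789012 macie-alerts)"
echo "$TOPIC_ARN"
```

"$TOPIC_ARN" can then replace the literal ARN in the put-targets and set-topic-attributes calls.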

Security Hub Integration

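Macie publishes findings to Security Hub automatically when both services are enabled. Which categories are published (policy findings, sensitive data findings, or both) is controlled by the findings publication configuration: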

# Verify findings in Security Hub
aws securityhub get-findings \
  --filters '{
    "ProductName": [{"Value": "Macie", "Comparison": "EQUALS"}]
  }'

Export Findings

# Configure which finding categories are published to Security Hub
aws macie2 put-findings-publication-configuration \
  --security-hub-configuration '{
    "publishClassificationFindings": true,
    "publishPolicyFindings": true
  }'

# Get sensitive data finding statistics grouped by severity
aws macie2 get-finding-statistics \
  --group-by severity.description \
  --finding-criteria '{
    "criterion": {
      "category": {"eq": ["CLASSIFICATION"]}
    }
  }'

# Get aggregated findings by bucket
aws macie2 get-finding-statistics \
  --group-by resourcesAffected.s3Bucket.name \
  --sort-criteria '{
    "attributeName": "count",
    "orderBy": "DESC"
  }'

Cost Optimization

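# Exclude buckets from automated sensitive data discovery
# (find the scope ID with: aws macie2 list-classification-scopes)
aws macie2 update-classification-scope \
  --id "classification-scope-id" \
  --s3 '{
    "excludes": {
      "bucketNames": ["logs-bucket", "cloudtrail-bucket"],
      "operation": "ADD"
    }
  }'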

# Use sampling for large buckets
# Set sampling percentage when creating jobs

# Review job costs
aws macie2 get-usage-statistics \
  --filter-by '[{"comparator":"EQ","key":"accountId","values":["123456789012"]}]' \
  --time-range MONTH_TO_DATE

# Get usage totals by type
aws macie2 get-usage-totals

Best Practices

| Practice | Recommendation |
| --- | --- |
| Coverage | Enable for all accounts via Organizations |
| Scheduling | Run weekly scans on critical data buckets |
| Custom Identifiers | Create identifiers for organization-specific data |
| Exclusions | Exclude log and audit buckets to reduce costs |
| Sampling | Use 10-20% sampling for large data lakes |
| Alerts | Configure notifications for high severity findings |
| Remediation | Establish SLAs for finding remediation by severity |

Remediation Actions

When sensitive data is discovered:

  1. Verify finding - Review the actual data detected
  2. Assess risk - Determine exposure level and compliance impact
  3. Remediate:
    • Delete unnecessary sensitive data
    • Encrypt unencrypted sensitive data
    • Restrict bucket access
    • Move to appropriate storage tier
  4. Document - Record findings and actions taken
  5. Prevent - Update policies to prevent recurrence

Frequently Asked Questions

How much does AWS Macie cost?

The two main components of Macie pricing are bucket evaluation and sensitive data discovery. S3 bucket evaluation costs $0.10 per bucket per month for metadata analysis. Sensitive data discovery costs $1.00 per GB for the first 50,000 GB, decreasing at higher volumes. There is a 30-day free trial for new accounts. Because discovery is billed by the volume of data inspected, the way to reduce cost is to scan less: exclude known non-sensitive buckets, use sampling for large buckets, and scope jobs to the prefixes that matter. Rates can vary by Region, so use the AWS Pricing Calculator with your S3 inventory for accurate estimates.
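
The arithmetic behind these figures can be sketched as a small shell function. It uses only the two rates quoted in this answer ($0.10 per bucket, $1.00 per GB within the first 50,000 GB) and assumes that sampling reduces the inspected volume roughly in proportion to the sampling percentage, which is an approximation. The function name estimate_macie_cost is hypothetical, and current rates should be confirmed on the AWS pricing page.

```shell
#!/bin/sh
# Hypothetical helper: rough monthly estimate from the rates quoted in this guide.
# usage: estimate_macie_cost BUCKETS GB_IN_SCOPE [SAMPLING_PERCENT]
estimate_macie_cost() {
  awk -v buckets="$1" -v gb="$2" -v pct="${3:-100}" 'BEGIN {
    # Assumption: inspected volume scales linearly with the sampling percentage.
    scanned = gb * pct / 100
    # Higher-volume tiers are not modeled here.
    if (scanned > 50000) { print "over first pricing tier"; exit 1 }
    printf "%.2f\n", buckets * 0.10 + scanned * 1.00
  }'
}

estimate_macie_cost 20 500       # 20 buckets, 500 GB, full scan
estimate_macie_cost 20 500 10    # same data with 10% sampling
```

Comparing the two calls shows why sampling and scoping matter far more than bucket count: at these rates the per-GB inspection charge dominates the total.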
