
GCP Data Loss Prevention (DLP) API Guide

Discover and protect sensitive data with Google Cloud DLP API. Learn to configure InfoTypes, run inspection jobs, implement de-identification, and integrate with BigQuery.

11 min read · Updated 2026-01-14

Data Loss Prevention (DLP) is critical for protecting sensitive information across your cloud environment. Google Cloud DLP API provides automated discovery, classification, and protection of sensitive data in Cloud Storage, BigQuery, Datastore, and custom applications.

This guide covers configuring InfoTypes, running inspection jobs, implementing de-identification techniques, and integrating DLP with your data pipeline. For comprehensive cloud security practices, see our 30 Cloud Security Tips for 2026 guide.

Prerequisites

  • DLP Administrator role for managing DLP resources
  • DLP Jobs Editor role for creating inspection jobs
  • Cloud DLP API enabled
  • gcloud CLI installed

Enable Cloud DLP API

# Enable the DLP API
gcloud services enable dlp.googleapis.com

# Verify it's enabled
gcloud services list --enabled | grep dlp

Step 1: Understand InfoTypes

InfoTypes define the categories of sensitive data DLP can detect:

Built-in InfoTypes

# List all built-in InfoTypes (the DLP API is called via REST or client libraries)
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dlp.googleapis.com/v2/infoTypes"

# Common InfoTypes:
# PERSON_NAME - Names
# EMAIL_ADDRESS - Email addresses
# PHONE_NUMBER - Phone numbers
# CREDIT_CARD_NUMBER - Credit card numbers
# US_SOCIAL_SECURITY_NUMBER - SSN
# US_PASSPORT - Passport numbers
# DATE_OF_BIRTH - Birth dates
# IP_ADDRESS - IPv4/IPv6 addresses
# GCP_CREDENTIALS - API keys, service account keys
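
If you prefer the client library over raw REST calls, the same catalog is available programmatically. A minimal sketch using the google-cloud-dlp Python package (the same dlp_v2 client used later in this guide):

# List built-in InfoTypes with the Python client
from google.cloud import dlp_v2

def list_info_types():
    dlp = dlp_v2.DlpServiceClient()
    response = dlp.list_info_types()
    for info_type in response.info_types:
        print(f"{info_type.name}: {info_type.display_name}")

list_info_types()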

Create Custom InfoType

# Create custom InfoType for employee IDs (e.g., EMP-12345)
cat > custom-infotype.json << EOF
{
  "customInfoTypes": [
    {
      "infoType": { "name": "EMPLOYEE_ID" },
      "regex": { "pattern": "EMP-[0-9]{5}" },
      "likelihood": "LIKELY"
    }
  ]
}
EOF

# Create dictionary-based InfoType for internal project codes
cat > dictionary-infotype.json << EOF
{
  "customInfoTypes": [
    {
      "infoType": { "name": "PROJECT_CODE" },
      "dictionary": {
        "wordList": {
          "words": ["PROJ-ALPHA", "PROJ-BETA", "PROJ-GAMMA", "PROJ-DELTA"]
        }
      }
    }
  ]
}
EOF
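
These customInfoTypes blocks are fragments of an inspect configuration. A short sketch of how the EMPLOYEE_ID regex plugs into an inspection request with the Python client (the project ID and sample text are placeholders):

# Inspect text using the custom EMPLOYEE_ID regex InfoType defined above
from google.cloud import dlp_v2

def inspect_with_custom_infotype(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    inspect_config = {
        "custom_info_types": [
            {
                "info_type": {"name": "EMPLOYEE_ID"},
                "regex": {"pattern": "EMP-[0-9]{5}"},
                "likelihood": dlp_v2.Likelihood.LIKELY,
            }
        ],
        "include_quote": True,
    }
    response = dlp.inspect_content(
        request={
            "parent": f"projects/{project_id}",
            "inspect_config": inspect_config,
            "item": {"value": text},
        }
    )
    for finding in response.result.findings:
        print(finding.info_type.name, finding.quote)

inspect_with_custom_infotype("my-project", "Badge EMP-10042 was issued today.")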

Step 2: Inspect Content for Sensitive Data

Inspect Text Content

# Inspect a text string (REST: content:inspect)
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect" \
    -d '{"item": {"value": "Call me at 555-123-4567 or email jane@example.com"},
         "inspectConfig": {"infoTypes": [{"name": "PHONE_NUMBER"}, {"name": "EMAIL_ADDRESS"}],
                           "minLikelihood": "POSSIBLE"}}'

# Include quotes of the matched text in the findings
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect" \
    -d '{"item": {"value": "SSN: 123-45-6789, Card: 4111-1111-1111-1111"},
         "inspectConfig": {"infoTypes": [{"name": "US_SOCIAL_SECURITY_NUMBER"}, {"name": "CREDIT_CARD_NUMBER"}],
                           "includeQuote": true}}'

Inspect Files via API

# Python example - inspect text content
from google.cloud import dlp_v2

def inspect_text(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"

    inspect_config = {
        "info_types": [
            {"name": "EMAIL_ADDRESS"},
            {"name": "PHONE_NUMBER"},
            {"name": "CREDIT_CARD_NUMBER"},
            {"name": "US_SOCIAL_SECURITY_NUMBER"},
        ],
        "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
        "include_quote": True,
    }

    item = {"value": text}
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    for finding in response.result.findings:
        print(f"Info type: {finding.info_type.name}")
        print(f"Likelihood: {finding.likelihood.name}")
        print(f"Quote: {finding.quote}")
        print("---")

inspect_text("my-project", "Contact: jane.doe@example.com, SSN: 123-45-6789")

Step 3: Create Inspection Jobs for Storage

Scan Cloud Storage Bucket

# Create inspection job template
cat > storage-inspect-template.json << EOF
{
  "displayName": "PII Detection Template",
  "description": "Scans for common PII patterns",
  "inspectConfig": {
    "infoTypes": [
      {"name": "PERSON_NAME"},
      {"name": "EMAIL_ADDRESS"},
      {"name": "PHONE_NUMBER"},
      {"name": "CREDIT_CARD_NUMBER"},
      {"name": "US_SOCIAL_SECURITY_NUMBER"}
    ],
    "minLikelihood": "LIKELY",
    "limits": {
      "maxFindingsPerRequest": 1000
    }
  }
}
EOF

# Create the template
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"inspectTemplate\": $(cat storage-inspect-template.json)}" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/inspectTemplates"

# Create an inspection job for the GCS bucket
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs" \
    -d '{"inspectJob": {
          "storageConfig": {"cloudStorageOptions": {"fileSet": {"url": "gs://my-bucket/*"}}},
          "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE_ID",
          "actions": [{"saveFindings": {"outputConfig": {"table": {"projectId": "PROJECT_ID", "datasetId": "dlp_results", "tableId": "findings"}}}}]
        }}'
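
The same job can be created from the Python client, which is often easier to embed in a pipeline. A minimal sketch, assuming the bucket, results dataset, and template name shown above:

# Create a Cloud Storage inspection job with the Python client
from google.cloud import dlp_v2

def create_gcs_inspection_job(project_id, bucket_url, template_name):
    dlp = dlp_v2.DlpServiceClient()
    inspect_job = {
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": bucket_url}}
        },
        "inspect_template_name": template_name,
        "actions": [
            {
                "save_findings": {
                    "output_config": {
                        "table": {
                            "project_id": project_id,
                            "dataset_id": "dlp_results",
                            "table_id": "findings",
                        }
                    }
                }
            }
        ],
    }
    job = dlp.create_dlp_job(
        request={"parent": f"projects/{project_id}", "inspect_job": inspect_job}
    )
    print(f"Created job: {job.name}")
    return job

create_gcs_inspection_job(
    "my-project",
    "gs://my-bucket/*",
    "projects/my-project/inspectTemplates/PII_TEMPLATE_ID",
)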

Scan BigQuery Table

# Create a BigQuery inspection job
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs" \
    -d '{
      "inspectJob": {
        "storageConfig": {
          "bigQueryOptions": {
            "tableReference": {
              "projectId": "PROJECT_ID",
              "datasetId": "my_dataset",
              "tableId": "customer_data"
            },
            "identifyingFields": [
              {"name": "customer_id"}
            ]
          }
        },
        "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE_ID",
        "actions": [{
          "saveFindings": {
            "outputConfig": {
              "table": {
                "projectId": "PROJECT_ID",
                "datasetId": "dlp_results",
                "tableId": "bq_findings"
              }
            }
          }
        }]
      }
    }'

Monitor Job Status

# List DLP jobs
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs"

# Get job details
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID"

# Cancel a running job
curl -s -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID:cancel"
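
For programmatic monitoring, a short sketch that lists jobs and their states with the Python client (the project ID is a placeholder):

# List recent DLP jobs and their states with the Python client
from google.cloud import dlp_v2

def list_dlp_jobs(project_id):
    dlp = dlp_v2.DlpServiceClient()
    for job in dlp.list_dlp_jobs(request={"parent": f"projects/{project_id}"}):
        print(f"{job.name}: {job.state.name}")

list_dlp_jobs("my-project")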

Step 4: Implement De-identification

Masking Transformation

# Mask sensitive data with asterisks
from google.cloud import dlp_v2

def deidentify_with_mask(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"

    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "character_mask_config": {
                            "masking_character": "*",
                            "number_to_mask": 0,  # mask all
                        }
                    }
                }
            ]
        }
    }

    inspect_config = {
        "info_types": [
            {"name": "EMAIL_ADDRESS"},
            {"name": "PHONE_NUMBER"},
        ]
    }

    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": {"value": text},
        }
    )

    return response.item.value

# Example: "Email: jane@example.com" -> "Email: ****************"
result = deidentify_with_mask("my-project", "Email: jane@example.com")
print(result)

Redaction

# Replace detected values with a fixed [REDACTED] placeholder
def deidentify_with_redact(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"

    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
                    "primitive_transformation": {
                        "replace_config": {
                            "new_value": {"string_value": "[REDACTED]"}
                        }
                    }
                }
            ]
        }
    }

    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )

    return response.item.value
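
A quick usage example with the test card number used earlier in this guide:

# The card number is replaced; the surrounding text is untouched
result = deidentify_with_redact("my-project", "Card on file: 4111-1111-1111-1111")
print(result)  # Card on file: [REDACTED]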

Tokenization (Pseudonymization)

# Replace with reversible tokens using crypto key
def deidentify_with_fpe(project_id, text, key_name, wrapped_key):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"

    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}],
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": {
                            "crypto_key": {
                                "kms_wrapped": {
                                    "wrapped_key": "BASE64_WRAPPED_KEY",
                                    "crypto_key_name": key_name,
                                }
                            },
                            "common_alphabet": "NUMERIC",
                        }
                    }
                }
            ]
        }
    }

    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )

    return response.item.value
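
The wrapped_key parameter must be a data encryption key that has been wrapped (encrypted) by the Cloud KMS key named in key_name. A minimal sketch of producing one with the google-cloud-kms client; the key ring and key names are assumptions, and in practice you would wrap the key once and store the ciphertext securely rather than regenerating it per call:

# Wrap a freshly generated 256-bit data key with Cloud KMS (sketch)
import os
from google.cloud import kms

def create_wrapped_key(kms_key_name):
    client = kms.KeyManagementServiceClient()
    raw_key = os.urandom(32)  # 256-bit data encryption key
    response = client.encrypt(request={"name": kms_key_name, "plaintext": raw_key})
    return response.ciphertext  # bytes suitable for wrapped_key

# Assumed key ring and key names for illustration
kms_key = "projects/my-project/locations/global/keyRings/dlp-ring/cryptoKeys/dlp-key"
wrapped = create_wrapped_key(kms_key)
print(deidentify_with_fpe("my-project", "My SSN is 372819127", kms_key, wrapped))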

Step 5: Create De-identification Templates

# Create reusable de-identification template
cat > deidentify-template.json << EOF
{
  "displayName": "Standard PII De-identification",
  "description": "Masks PII for analytics use",
  "deidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [
        {
          "infoTypes": [{"name": "EMAIL_ADDRESS"}],
          "primitiveTransformation": {
            "replaceWithInfoTypeConfig": {}
          }
        },
        {
          "infoTypes": [{"name": "PHONE_NUMBER"}],
          "primitiveTransformation": {
            "characterMaskConfig": {
              "maskingCharacter": "X",
              "numberToMask": 6
            }
          }
        },
        {
          "infoTypes": [{"name": "CREDIT_CARD_NUMBER"}],
          "primitiveTransformation": {
            "replaceConfig": {
              "newValue": {"stringValue": "[CARD REDACTED]"}
            }
          }
        }
      ]
    }
  }
}
EOF

# Create the template
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"deidentifyTemplate\": $(cat deidentify-template.json)}" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/deidentifyTemplates"
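
Once the template exists, requests can reference it by name instead of repeating the full configuration. A short sketch using the Python client's template-name parameters (the template paths are placeholders):

# De-identify text using stored inspection and de-identification templates
from google.cloud import dlp_v2

def deidentify_with_template(project_id, text, deidentify_template_name,
                             inspect_template_name):
    dlp = dlp_v2.DlpServiceClient()
    response = dlp.deidentify_content(
        request={
            "parent": f"projects/{project_id}",
            "deidentify_template_name": deidentify_template_name,
            "inspect_template_name": inspect_template_name,
            "item": {"value": text},
        }
    )
    return response.item.value

print(deidentify_with_template(
    "my-project",
    "Email jane@example.com or call 555-123-4567",
    "projects/my-project/deidentifyTemplates/DEIDENTIFY_TEMPLATE",
    "projects/my-project/inspectTemplates/PII_TEMPLATE",
))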

Step 6: Integrate with BigQuery

Create De-identified Copies in Cloud Storage

# The de-identify action writes de-identified copies of Cloud Storage files
# to an output bucket and records transformation details in BigQuery
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs" \
    -d '{
      "inspectJob": {
        "storageConfig": {
          "cloudStorageOptions": {"fileSet": {"url": "gs://my-bucket/source/*"}}
        },
        "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE",
        "actions": [{
          "deidentify": {
            "transformationConfig": {
              "deidentifyTemplate": "projects/PROJECT_ID/deidentifyTemplates/DEIDENTIFY_TEMPLATE"
            },
            "cloudStorageOutput": "gs://bucket/deidentified/",
            "transformationDetailsStorageConfig": {
              "table": {
                "projectId": "PROJECT_ID",
                "datasetId": "dlp_results",
                "tableId": "transformation_details"
              }
            }
          }
        }]
      }
    }'

Schedule Recurring Scans

# Create job trigger for scheduled scanning
cat > job-trigger.json << EOF
{
  "displayName": "Weekly PII Scan",
  "description": "Scans customer data weekly for PII",
  "triggers": [
    {
      "schedule": {
        "recurrencePeriodDuration": "604800s"
      }
    }
  ],
  "inspectJob": {
    "storageConfig": {
      "bigQueryOptions": {
        "tableReference": {
          "projectId": "PROJECT_ID",
          "datasetId": "my_dataset",
          "tableId": "customer_data"
        }
      }
    },
    "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE",
    "actions": [
      {
        "saveFindings": {
          "outputConfig": {
            "table": {
              "projectId": "PROJECT_ID",
              "datasetId": "dlp_results",
              "tableId": "weekly_findings"
            }
          }
        }
      }
    ]
  },
  "status": "HEALTHY"
}
EOF

# Create the trigger
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"jobTrigger\": $(cat job-trigger.json)}" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/jobTriggers"
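
Triggers fire on their schedule, but for testing you may want to start a run immediately. A minimal sketch using the Python client's activate operation (the trigger ID is a placeholder):

# Run a job trigger immediately instead of waiting for its schedule
from google.cloud import dlp_v2

def run_trigger_now(project_id, trigger_id):
    dlp = dlp_v2.DlpServiceClient()
    job = dlp.activate_job_trigger(
        request={"name": f"projects/{project_id}/jobTriggers/{trigger_id}"}
    )
    print(f"Started job: {job.name}")

run_trigger_now("my-project", "TRIGGER_ID")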

Step 7: Analyze DLP Findings

-- Query DLP findings in BigQuery
SELECT
  info_type.name AS info_type,
  likelihood,
  COUNT(*) AS count,
  COUNT(DISTINCT resource_name) AS affected_resources
FROM `PROJECT_ID.dlp_results.findings`
WHERE DATE(create_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY info_type.name, likelihood
ORDER BY count DESC;

-- Find most sensitive resources
SELECT
  resource_name,
  COUNT(*) AS finding_count,
  STRING_AGG(DISTINCT info_type.name) AS info_types
FROM `PROJECT_ID.dlp_results.findings`
WHERE likelihood IN ('LIKELY', 'VERY_LIKELY')
GROUP BY resource_name
ORDER BY finding_count DESC
LIMIT 20;
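
To feed these findings into reports or dashboards, the same kind of query can be run from the BigQuery Python client. A brief sketch, assuming the google-cloud-bigquery package and the dlp_results dataset used above:

# Summarize DLP findings by InfoType from the BigQuery results table
from google.cloud import bigquery

def summarize_findings(project_id):
    client = bigquery.Client(project=project_id)
    query = f"""
        SELECT info_type.name AS info_type, likelihood, COUNT(*) AS count
        FROM `{project_id}.dlp_results.findings`
        GROUP BY info_type.name, likelihood
        ORDER BY count DESC
    """
    for row in client.query(query).result():
        print(f"{row.info_type} ({row.likelihood}): {row.count}")

summarize_findings("my-project")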

Best Practices

  • Use inspection templates - Standardize detection across scans
  • Start with sampling - Test on data subsets before full scans (see the sampling sketch after this list)
  • Configure appropriate likelihood - LIKELY or VERY_LIKELY reduces false positives
  • Implement scheduled scans - Regular monitoring catches new sensitive data
  • Store findings in BigQuery - Enable analysis and reporting
  • Use de-identification for analytics - Preserve data utility while protecting privacy
  • Combine with VPC Service Controls - Prevent data exfiltration
  • Set up alerts - Notify on high-severity findings
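
The sampling tip above is configured through the storage options of an inspection job. A hedged sketch of both Cloud Storage and BigQuery sampling settings; the limits shown are illustrative, not recommendations:

# Limit an inspection job to a sample of the data before running full scans
from google.cloud import dlp_v2

def build_sampled_storage_configs():
    # Cloud Storage: scan at most ~1 MB from the start of each file,
    # and only 10% of the files matched by the fileSet URL
    gcs_sample = {
        "cloud_storage_options": {
            "file_set": {"url": "gs://my-bucket/*"},
            "bytes_limit_per_file": 1048576,
            "files_limit_percent": 10,
            "sample_method": dlp_v2.CloudStorageOptions.SampleMethod.TOP,
        }
    }
    # BigQuery: scan a random sample of 1000 rows
    bq_sample = {
        "big_query_options": {
            "table_reference": {
                "project_id": "my-project",
                "dataset_id": "my_dataset",
                "table_id": "customer_data",
            },
            "rows_limit": 1000,
            "sample_method": dlp_v2.BigQueryOptions.SampleMethod.RANDOM_START,
        }
    }
    return gcs_sample, bq_sample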

Need help implementing data loss prevention? Contact InventiveHQ for expert guidance on sensitive data protection and compliance.

Frequently Asked Questions

What types of sensitive data can Cloud DLP detect?

Cloud DLP can detect over 150 built-in InfoTypes including personally identifiable information (SSN, passport numbers, driver's licenses), financial data (credit card numbers, bank accounts, IBANs), healthcare data (medical record numbers, DEA numbers), credentials (API keys, passwords, private keys), and custom patterns you define with regular expressions or dictionaries. It supports detection in text, images, and structured data.
