
GCP Data Loss Prevention (DLP) API Guide

Discover and protect sensitive data with Google Cloud DLP API. Learn to configure InfoTypes, run inspection jobs, implement de-identification, and integrate with BigQuery.

11 min read · Updated 2026-01-14

Data Loss Prevention (DLP) is critical for protecting sensitive information across your cloud environment. Google Cloud DLP API provides automated discovery, classification, and protection of sensitive data in Cloud Storage, BigQuery, Datastore, and custom applications.

This guide covers configuring InfoTypes, running inspection jobs, implementing de-identification techniques, and integrating DLP with your data pipeline. For comprehensive cloud security practices, see our 30 Cloud Security Tips for 2026 guide.

Prerequisites

  • DLP Administrator role for managing DLP resources
  • DLP Jobs Editor role for creating inspection jobs
  • Cloud DLP API enabled
  • gcloud CLI installed

Enable Cloud DLP API

# Enable the DLP API
gcloud services enable dlp.googleapis.com

# Verify it's enabled
gcloud services list --enabled | grep dlp

Step 1: Understand InfoTypes

InfoTypes define the categories of sensitive data DLP can detect:

Built-in InfoTypes

# List all built-in InfoTypes (the DLP API is called via REST or client libraries)
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dlp.googleapis.com/v2/infoTypes"

# Common InfoTypes:
# PERSON_NAME - Names
# EMAIL_ADDRESS - Email addresses
# PHONE_NUMBER - Phone numbers
# CREDIT_CARD_NUMBER - Credit card numbers
# US_SOCIAL_SECURITY_NUMBER - SSN
# US_PASSPORT - Passport numbers
# DATE_OF_BIRTH - Birth dates
# IP_ADDRESS - IPv4/IPv6 addresses
# GCP_CREDENTIALS - API keys, service account keys
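
If you prefer the client library over raw REST calls, the same catalog is available programmatically. A minimal sketch using the google-cloud-dlp Python package (the same dlp_v2 client used later in this guide):

# List built-in InfoTypes with the Python client
from google.cloud import dlp_v2

def list_info_types():
    dlp = dlp_v2.DlpServiceClient()
    response = dlp.list_info_types()
    for info_type in response.info_types:
        print(f"{info_type.name}: {info_type.display_name}")

list_info_types()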

Create Custom InfoType

# Create custom InfoType for employee IDs (e.g., EMP-12345)
cat > custom-infotype.json << EOF
{
  "customInfoTypes": [
    {
      "infoType": { "name": "EMPLOYEE_ID" },
      "regex": { "pattern": "EMP-[0-9]{5}" },
      "likelihood": "LIKELY"
    }
  ]
}
EOF

# Create dictionary-based InfoType for internal project codes
cat > dictionary-infotype.json << EOF
{
  "customInfoTypes": [
    {
      "infoType": { "name": "PROJECT_CODE" },
      "dictionary": {
        "wordList": {
          "words": ["PROJ-ALPHA", "PROJ-BETA", "PROJ-GAMMA", "PROJ-DELTA"]
        }
      }
    }
  ]
}
EOF
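
These customInfoTypes blocks are fragments of an inspect configuration. A short sketch of how the EMPLOYEE_ID regex plugs into an inspection request with the Python client (the project ID and sample text are placeholders):

# Inspect text using the custom EMPLOYEE_ID regex InfoType defined above
from google.cloud import dlp_v2

def inspect_with_custom_infotype(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    inspect_config = {
        "custom_info_types": [
            {
                "info_type": {"name": "EMPLOYEE_ID"},
                "regex": {"pattern": "EMP-[0-9]{5}"},
                "likelihood": dlp_v2.Likelihood.LIKELY,
            }
        ],
        "include_quote": True,
    }
    response = dlp.inspect_content(
        request={
            "parent": f"projects/{project_id}",
            "inspect_config": inspect_config,
            "item": {"value": text},
        }
    )
    for finding in response.result.findings:
        print(finding.info_type.name, finding.quote)

inspect_with_custom_infotype("my-project", "Badge EMP-10042 was issued today.")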

Step 2: Inspect Content for Sensitive Data

Inspect Text Content

# Inspect a text string (REST: content:inspect)
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect" \
    -d '{"item": {"value": "Call me at 555-123-4567 or email jane@example.com"},
         "inspectConfig": {"infoTypes": [{"name": "PHONE_NUMBER"}, {"name": "EMAIL_ADDRESS"}],
                           "minLikelihood": "POSSIBLE"}}'

# Include quotes of the matched text in the findings
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect" \
    -d '{"item": {"value": "SSN: 123-45-6789, Card: 4111-1111-1111-1111"},
         "inspectConfig": {"infoTypes": [{"name": "US_SOCIAL_SECURITY_NUMBER"}, {"name": "CREDIT_CARD_NUMBER"}],
                           "includeQuote": true}}'

Inspect Files via API

# Python example - inspect text content
from google.cloud import dlp_v2

def inspect_text(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"

    inspect_config = {
        "info_types": [
            {"name": "EMAIL_ADDRESS"},
            {"name": "PHONE_NUMBER"},
            {"name": "CREDIT_CARD_NUMBER"},
            {"name": "US_SOCIAL_SECURITY_NUMBER"},
        ],
        "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
        "include_quote": True,
    }

    item = {"value": text}
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    for finding in response.result.findings:
        print(f"Info type: {finding.info_type.name}")
        print(f"Likelihood: {finding.likelihood.name}")
        print(f"Quote: {finding.quote}")
        print("---")

inspect_text("my-project", "Contact: jane.doe@example.com, SSN: 123-45-6789")

Step 3: Create Inspection Jobs for Storage

Scan Cloud Storage Bucket

# Create inspection job template
cat > storage-inspect-template.json << EOF
{
  "displayName": "PII Detection Template",
  "description": "Scans for common PII patterns",
  "inspectConfig": {
    "infoTypes": [
      {"name": "PERSON_NAME"},
      {"name": "EMAIL_ADDRESS"},
      {"name": "PHONE_NUMBER"},
      {"name": "CREDIT_CARD_NUMBER"},
      {"name": "US_SOCIAL_SECURITY_NUMBER"}
    ],
    "minLikelihood": "LIKELY",
    "limits": {
      "maxFindingsPerRequest": 1000
    }
  }
}
EOF

# Create the template
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"inspectTemplate\": $(cat storage-inspect-template.json)}" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/inspectTemplates"

# Create an inspection job for the GCS bucket
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs" \
    -d '{"inspectJob": {
          "storageConfig": {"cloudStorageOptions": {"fileSet": {"url": "gs://my-bucket/*"}}},
          "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE_ID",
          "actions": [{"saveFindings": {"outputConfig": {"table": {"projectId": "PROJECT_ID", "datasetId": "dlp_results", "tableId": "findings"}}}}]
        }}'
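
The same job can be created from the Python client, which is often easier to embed in a pipeline. A minimal sketch, assuming the bucket, results dataset, and template name shown above:

# Create a Cloud Storage inspection job with the Python client
from google.cloud import dlp_v2

def create_gcs_inspection_job(project_id, bucket_url, template_name):
    dlp = dlp_v2.DlpServiceClient()
    inspect_job = {
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": bucket_url}}
        },
        "inspect_template_name": template_name,
        "actions": [
            {
                "save_findings": {
                    "output_config": {
                        "table": {
                            "project_id": project_id,
                            "dataset_id": "dlp_results",
                            "table_id": "findings",
                        }
                    }
                }
            }
        ],
    }
    job = dlp.create_dlp_job(
        request={"parent": f"projects/{project_id}", "inspect_job": inspect_job}
    )
    print(f"Created job: {job.name}")
    return job

create_gcs_inspection_job(
    "my-project",
    "gs://my-bucket/*",
    "projects/my-project/inspectTemplates/PII_TEMPLATE_ID",
)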

Scan BigQuery Table

# Create a BigQuery inspection job
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs" \
    -d '{
      "inspectJob": {
        "storageConfig": {
          "bigQueryOptions": {
            "tableReference": {
              "projectId": "PROJECT_ID",
              "datasetId": "my_dataset",
              "tableId": "customer_data"
            },
            "identifyingFields": [
              {"name": "customer_id"}
            ]
          }
        },
        "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE_ID",
        "actions": [{
          "saveFindings": {
            "outputConfig": {
              "table": {
                "projectId": "PROJECT_ID",
                "datasetId": "dlp_results",
                "tableId": "bq_findings"
              }
            }
          }
        }]
      }
    }'

Monitor Job Status

# List DLP jobs
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs"

# Get job details
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID"

# Cancel a running job
curl -s -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID:cancel"
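
For programmatic monitoring, a short sketch that lists jobs and their states with the Python client (the project ID is a placeholder):

# List recent DLP jobs and their states with the Python client
from google.cloud import dlp_v2

def list_dlp_jobs(project_id):
    dlp = dlp_v2.DlpServiceClient()
    for job in dlp.list_dlp_jobs(request={"parent": f"projects/{project_id}"}):
        print(f"{job.name}: {job.state.name}")

list_dlp_jobs("my-project")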

Step 4: Implement De-identification

Masking Transformation

# Mask sensitive data with asterisks
from google.cloud import dlp_v2

def deidentify_with_mask(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"

    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "character_mask_config": {
                            "masking_character": "*",
                            "number_to_mask": 0,  # mask all
                        }
                    }
                }
            ]
        }
    }

    inspect_config = {
        "info_types": [
            {"name": "EMAIL_ADDRESS"},
            {"name": "PHONE_NUMBER"},
        ]
    }

    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": {"value": text},
        }
    )

    return response.item.value

# Example: "Email: jane@example.com" -> "Email: ****************"
result = deidentify_with_mask("my-project", "Email: jane@example.com")
print(result)

Redaction

# Replace detected values with a fixed [REDACTED] placeholder
def deidentify_with_redact(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"

    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
                    "primitive_transformation": {
                        "replace_config": {
                            "new_value": {"string_value": "[REDACTED]"}
                        }
                    }
                }
            ]
        }
    }

    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )

    return response.item.value
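
A quick usage example with the test card number used earlier in this guide:

# The card number is replaced; the surrounding text is untouched
result = deidentify_with_redact("my-project", "Card on file: 4111-1111-1111-1111")
print(result)  # Card on file: [REDACTED]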

Tokenization (Pseudonymization)

# Replace with reversible tokens using crypto key
def deidentify_with_fpe(project_id, text, key_name, wrapped_key):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"

    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}],
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": {
                            "crypto_key": {
                                "kms_wrapped": {
                                    "wrapped_key": "BASE64_WRAPPED_KEY",
                                    "crypto_key_name": key_name,
                                }
                            },
                            "common_alphabet": "NUMERIC",
                        }
                    }
                }
            ]
        }
    }

    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )

    return response.item.value
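
The wrapped_key parameter must be a data encryption key that has been wrapped (encrypted) by the Cloud KMS key named in key_name. A minimal sketch of producing one with the google-cloud-kms client; the key ring and key names are assumptions, and in practice you would wrap the key once and store the ciphertext securely rather than regenerating it per call:

# Wrap a freshly generated 256-bit data key with Cloud KMS (sketch)
import os
from google.cloud import kms

def create_wrapped_key(kms_key_name):
    client = kms.KeyManagementServiceClient()
    raw_key = os.urandom(32)  # 256-bit data encryption key
    response = client.encrypt(request={"name": kms_key_name, "plaintext": raw_key})
    return response.ciphertext  # bytes suitable for wrapped_key

# Assumed key ring and key names for illustration
kms_key = "projects/my-project/locations/global/keyRings/dlp-ring/cryptoKeys/dlp-key"
wrapped = create_wrapped_key(kms_key)
print(deidentify_with_fpe("my-project", "My SSN is 372819127", kms_key, wrapped))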

Step 5: Create De-identification Templates

# Create reusable de-identification template
cat > deidentify-template.json << EOF
{
  "displayName": "Standard PII De-identification",
  "description": "Masks PII for analytics use",
  "deidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [
        {
          "infoTypes": [{"name": "EMAIL_ADDRESS"}],
          "primitiveTransformation": {
            "replaceWithInfoTypeConfig": {}
          }
        },
        {
          "infoTypes": [{"name": "PHONE_NUMBER"}],
          "primitiveTransformation": {
            "characterMaskConfig": {
              "maskingCharacter": "X",
              "numberToMask": 6
            }
          }
        },
        {
          "infoTypes": [{"name": "CREDIT_CARD_NUMBER"}],
          "primitiveTransformation": {
            "replaceConfig": {
              "newValue": {"stringValue": "[CARD REDACTED]"}
            }
          }
        }
      ]
    }
  }
}
EOF

# Create the template
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"deidentifyTemplate\": $(cat deidentify-template.json)}" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/deidentifyTemplates"
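
Once the template exists, requests can reference it by name instead of repeating the full configuration. A short sketch using the Python client's template-name parameters (the template paths are placeholders):

# De-identify text using stored inspection and de-identification templates
from google.cloud import dlp_v2

def deidentify_with_template(project_id, text, deidentify_template_name,
                             inspect_template_name):
    dlp = dlp_v2.DlpServiceClient()
    response = dlp.deidentify_content(
        request={
            "parent": f"projects/{project_id}",
            "deidentify_template_name": deidentify_template_name,
            "inspect_template_name": inspect_template_name,
            "item": {"value": text},
        }
    )
    return response.item.value

print(deidentify_with_template(
    "my-project",
    "Email jane@example.com or call 555-123-4567",
    "projects/my-project/deidentifyTemplates/DEIDENTIFY_TEMPLATE",
    "projects/my-project/inspectTemplates/PII_TEMPLATE",
))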

Step 6: Integrate with BigQuery

Create De-identified Copies in Cloud Storage

# The de-identify action writes de-identified copies of Cloud Storage files
# to an output bucket and records transformation details in BigQuery
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs" \
    -d '{
      "inspectJob": {
        "storageConfig": {
          "cloudStorageOptions": {"fileSet": {"url": "gs://my-bucket/source/*"}}
        },
        "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE",
        "actions": [{
          "deidentify": {
            "transformationConfig": {
              "deidentifyTemplate": "projects/PROJECT_ID/deidentifyTemplates/DEIDENTIFY_TEMPLATE"
            },
            "cloudStorageOutput": "gs://bucket/deidentified/",
            "transformationDetailsStorageConfig": {
              "table": {
                "projectId": "PROJECT_ID",
                "datasetId": "dlp_results",
                "tableId": "transformation_details"
              }
            }
          }
        }]
      }
    }'

Schedule Recurring Scans

# Create job trigger for scheduled scanning
cat > job-trigger.json << EOF
{
  "displayName": "Weekly PII Scan",
  "description": "Scans customer data weekly for PII",
  "triggers": [
    {
      "schedule": {
        "recurrencePeriodDuration": "604800s"
      }
    }
  ],
  "inspectJob": {
    "storageConfig": {
      "bigQueryOptions": {
        "tableReference": {
          "projectId": "PROJECT_ID",
          "datasetId": "my_dataset",
          "tableId": "customer_data"
        }
      }
    },
    "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE",
    "actions": [
      {
        "saveFindings": {
          "outputConfig": {
            "table": {
              "projectId": "PROJECT_ID",
              "datasetId": "dlp_results",
              "tableId": "weekly_findings"
            }
          }
        }
      }
    ]
  },
  "status": "HEALTHY"
}
EOF

# Create the trigger
curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"jobTrigger\": $(cat job-trigger.json)}" \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/jobTriggers"
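
Triggers fire on their schedule, but for testing you may want to start a run immediately. A minimal sketch using the Python client's activate operation (the trigger ID is a placeholder):

# Run a job trigger immediately instead of waiting for its schedule
from google.cloud import dlp_v2

def run_trigger_now(project_id, trigger_id):
    dlp = dlp_v2.DlpServiceClient()
    job = dlp.activate_job_trigger(
        request={"name": f"projects/{project_id}/jobTriggers/{trigger_id}"}
    )
    print(f"Started job: {job.name}")

run_trigger_now("my-project", "TRIGGER_ID")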

Step 7: Analyze DLP Findings

-- Query DLP findings in BigQuery
SELECT
  info_type.name AS info_type,
  likelihood,
  COUNT(*) AS count,
  COUNT(DISTINCT resource_name) AS affected_resources
FROM `PROJECT_ID.dlp_results.findings`
WHERE DATE(create_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY info_type.name, likelihood
ORDER BY count DESC;

-- Find most sensitive resources
SELECT
  resource_name,
  COUNT(*) AS finding_count,
  STRING_AGG(DISTINCT info_type.name) AS info_types
FROM `PROJECT_ID.dlp_results.findings`
WHERE likelihood IN ('LIKELY', 'VERY_LIKELY')
GROUP BY resource_name
ORDER BY finding_count DESC
LIMIT 20;
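
To feed these findings into reports or dashboards, the same kind of query can be run from the BigQuery Python client. A brief sketch, assuming the google-cloud-bigquery package and the dlp_results dataset used above:

# Summarize DLP findings by InfoType from the BigQuery results table
from google.cloud import bigquery

def summarize_findings(project_id):
    client = bigquery.Client(project=project_id)
    query = f"""
        SELECT info_type.name AS info_type, likelihood, COUNT(*) AS count
        FROM `{project_id}.dlp_results.findings`
        GROUP BY info_type.name, likelihood
        ORDER BY count DESC
    """
    for row in client.query(query).result():
        print(f"{row.info_type} ({row.likelihood}): {row.count}")

summarize_findings("my-project")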

Best Practices

  • Use inspection templates - Standardize detection across scans
  • Start with sampling - Test on data subsets before full scans (see the sampling sketch after this list)
  • Configure appropriate likelihood - LIKELY or VERY_LIKELY reduces false positives
  • Implement scheduled scans - Regular monitoring catches new sensitive data
  • Store findings in BigQuery - Enable analysis and reporting
  • Use de-identification for analytics - Preserve data utility while protecting privacy
  • Combine with VPC Service Controls - Prevent data exfiltration
  • Set up alerts - Notify on high-severity findings
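
The sampling tip above is configured through the storage options of an inspection job. A hedged sketch of both Cloud Storage and BigQuery sampling settings; the limits shown are illustrative, not recommendations:

# Limit an inspection job to a sample of the data before running full scans
from google.cloud import dlp_v2

def build_sampled_storage_configs():
    # Cloud Storage: scan at most ~1 MB from the start of each file,
    # and only 10% of the files matched by the fileSet URL
    gcs_sample = {
        "cloud_storage_options": {
            "file_set": {"url": "gs://my-bucket/*"},
            "bytes_limit_per_file": 1048576,
            "files_limit_percent": 10,
            "sample_method": dlp_v2.CloudStorageOptions.SampleMethod.TOP,
        }
    }
    # BigQuery: scan a random sample of 1000 rows
    bq_sample = {
        "big_query_options": {
            "table_reference": {
                "project_id": "my-project",
                "dataset_id": "my_dataset",
                "table_id": "customer_data",
            },
            "rows_limit": 1000,
            "sample_method": dlp_v2.BigQueryOptions.SampleMethod.RANDOM_START,
        }
    }
    return gcs_sample, bq_sample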

Need help implementing data loss prevention? Contact InventiveHQ for expert guidance on sensitive data protection and compliance.

Frequently Asked Questions

What types of sensitive data can Cloud DLP detect?

Cloud DLP can detect over 150 built-in InfoTypes including personally identifiable information (SSN, passport numbers, driver's licenses), financial data (credit card numbers, bank accounts, IBANs), healthcare data (medical record numbers, DEA numbers), credentials (API keys, passwords, private keys), and custom patterns you define with regular expressions or dictionaries. It supports detection in text, images, and structured data.
