Data Loss Prevention (DLP) is critical for protecting sensitive information across your cloud environment. The Google Cloud DLP API provides automated discovery, classification, and protection of sensitive data in Cloud Storage, BigQuery, Datastore, and custom applications.
This guide covers configuring InfoTypes, running inspection jobs, implementing de-identification techniques, and integrating DLP with your data pipeline. For comprehensive cloud security practices, see our 30 Cloud Security Tips for 2026 guide.
Prerequisites
- DLP Administrator role for managing DLP resources
- DLP Jobs Editor role for creating inspection jobs
- Cloud DLP API enabled
- gcloud CLI installed
Enable Cloud DLP API
# Enable the DLP API
gcloud services enable dlp.googleapis.com
# Verify it's enabled
gcloud services list --enabled | grep dlp
Step 1: Understand InfoTypes
InfoTypes define the categories of sensitive data DLP can detect:
Built-in InfoTypes
# List all available InfoTypes
gcloud dlp info-types list --filter="supportedBy=INSPECT"
# Common InfoTypes:
# PERSON_NAME - Names
# EMAIL_ADDRESS - Email addresses
# PHONE_NUMBER - Phone numbers
# CREDIT_CARD_NUMBER - Credit card numbers
# US_SOCIAL_SECURITY_NUMBER - SSN
# US_PASSPORT - Passport numbers
# DATE_OF_BIRTH - Birth dates
# IP_ADDRESS - IPv4/IPv6 addresses
# GCP_CREDENTIALS - API keys, service account keys
Create Custom InfoType
# Create custom InfoType for employee IDs (e.g., EMP-12345)
cat > custom-infotype.json << EOF
{
  "customInfoTypes": [
    {
      "infoType": { "name": "EMPLOYEE_ID" },
      "regex": { "pattern": "EMP-[0-9]{5}" },
      "likelihood": "LIKELY"
    }
  ]
}
EOF
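Before wiring a pattern into DLP, it helps to sanity-check it locally. This sketch exercises the same EMP-[0-9]{5} pattern with Python's re module; it is a local validation aid only, not a DLP API call:

```python
import re

# Same pattern as in custom-infotype.json
EMPLOYEE_ID = re.compile(r"EMP-[0-9]{5}")

samples = {
    "EMP-12345": True,   # valid employee ID
    "EMP-1234": False,   # too few digits
    "emp-12345": False,  # patterns are case-sensitive unless you add (?i)
}

for value, expected in samples.items():
    matched = EMPLOYEE_ID.fullmatch(value) is not None
    assert matched == expected, f"unexpected result for {value}"
print("pattern behaves as expected")
```

Keep in mind that DLP scans for matches anywhere in the content, so a value like EMP-123456 would still produce a finding for its first five digits.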
# Create dictionary-based InfoType for internal project codes
cat > dictionary-infotype.json << EOF
{
  "customInfoTypes": [
    {
      "infoType": { "name": "PROJECT_CODE" },
      "dictionary": {
        "wordList": {
          "words": ["PROJ-ALPHA", "PROJ-BETA", "PROJ-GAMMA", "PROJ-DELTA"]
        }
      }
    }
  ]
}
EOF
Step 2: Inspect Content for Sensitive Data
Inspect Text Content
# Inspect a text string
gcloud dlp text inspect \
  --content="Call me at 555-123-4567 or email user@example.com" \
  --info-types="PHONE_NUMBER,EMAIL_ADDRESS" \
  --min-likelihood=POSSIBLE
# Inspect for SSNs and credit card numbers
gcloud dlp text inspect \
  --content="SSN: 123-45-6789, Card: 4111-1111-1111-1111" \
  --info-types="US_SOCIAL_SECURITY_NUMBER,CREDIT_CARD_NUMBER" \
  --include-quote
Inspect Files via API
# Python example - inspect text content
from google.cloud import dlp_v2

def inspect_text(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"
    inspect_config = {
        "info_types": [
            {"name": "EMAIL_ADDRESS"},
            {"name": "PHONE_NUMBER"},
            {"name": "CREDIT_CARD_NUMBER"},
            {"name": "US_SOCIAL_SECURITY_NUMBER"},
        ],
        "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
        "include_quote": True,
    }
    item = {"value": text}
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )
    for finding in response.result.findings:
        print(f"Info type: {finding.info_type.name}")
        print(f"Likelihood: {finding.likelihood.name}")
        print(f"Quote: {finding.quote}")
        print("---")

inspect_text("my-project", "Contact: user@example.com, SSN: 123-45-6789")
Step 3: Create Inspection Jobs for Storage
Scan Cloud Storage Bucket
# Create inspection job template
cat > storage-inspect-template.json << EOF
{
  "displayName": "PII Detection Template",
  "description": "Scans for common PII patterns",
  "inspectConfig": {
    "infoTypes": [
      {"name": "PERSON_NAME"},
      {"name": "EMAIL_ADDRESS"},
      {"name": "PHONE_NUMBER"},
      {"name": "CREDIT_CARD_NUMBER"},
      {"name": "US_SOCIAL_SECURITY_NUMBER"}
    ],
    "minLikelihood": "LIKELY",
    "limits": {
      "maxFindingsPerRequest": 1000
    }
  }
}
EOF
# Create the template
gcloud dlp inspect-templates create \
  --display-name="PII Detection" \
  --description="Scans for PII" \
  --project=PROJECT_ID \
  --template-file=storage-inspect-template.json
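Inspection jobs can also be managed from the Python client. This sketch only assembles the request body for a Cloud Storage inspection job that references the template above; the bucket, project, and table names are placeholders, and the commented-out call is what would actually submit it:

```python
def build_gcs_inspect_job(bucket_url, template_name, results_table):
    """Assemble an inspect-job request body mirroring the gcloud example."""
    return {
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": bucket_url}}
        },
        "inspect_template_name": template_name,
        "actions": [
            {
                "save_findings": {
                    "output_config": {"table": results_table}
                }
            }
        ],
    }

job = build_gcs_inspect_job(
    "gs://my-bucket/*",
    "projects/my-project/inspectTemplates/pii-template",
    {"project_id": "my-project", "dataset_id": "dlp_results", "table_id": "findings"},
)

# from google.cloud import dlp_v2
# dlp = dlp_v2.DlpServiceClient()
# dlp.create_dlp_job(request={"parent": "projects/my-project", "inspect_job": job})
```

Building the body as a plain dict keeps it easy to unit-test before any API traffic is involved.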
# Create inspection job for GCS bucket
gcloud dlp jobs create \
  --project=PROJECT_ID \
  --inspect-template=projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE_ID \
  --storage-config='{"cloudStorageOptions":{"fileSet":{"url":"gs://my-bucket/*"}}}' \
  --actions='[{"saveFindings":{"outputConfig":{"table":{"projectId":"PROJECT_ID","datasetId":"dlp_results","tableId":"findings"}}}}]'
Scan BigQuery Table
# Create BigQuery inspection job
gcloud dlp jobs create \
  --project=PROJECT_ID \
  --inspect-template=projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE_ID \
  --storage-config='{
    "bigQueryOptions": {
      "tableReference": {
        "projectId": "PROJECT_ID",
        "datasetId": "my_dataset",
        "tableId": "customer_data"
      },
      "identifyingFields": [
        {"name": "customer_id"}
      ]
    }
  }' \
  --actions='[{
    "saveFindings": {
      "outputConfig": {
        "table": {
          "projectId": "PROJECT_ID",
          "datasetId": "dlp_results",
          "tableId": "bq_findings"
        }
      }
    }
  }]'
Monitor Job Status
# List DLP jobs
gcloud dlp jobs list --project=PROJECT_ID
# Get job details
gcloud dlp jobs describe JOB_ID --project=PROJECT_ID
# Cancel a running job
gcloud dlp jobs cancel JOB_ID --project=PROJECT_ID
Step 4: Implement De-identification
Masking Transformation
# Mask sensitive data with asterisks
from google.cloud import dlp_v2

def deidentify_with_mask(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "character_mask_config": {
                            "masking_character": "*",
                            "number_to_mask": 0,  # 0 means mask every character
                        }
                    }
                }
            ]
        }
    }
    inspect_config = {
        "info_types": [
            {"name": "EMAIL_ADDRESS"},
            {"name": "PHONE_NUMBER"},
        ]
    }
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": {"value": text},
        }
    )
    return response.item.value

# Example: user@example.com -> ****************
result = deidentify_with_mask("my-project", "Email: user@example.com")
print(result)
Redaction
# Replace matched values with a fixed redaction marker
def deidentify_with_redact(project_id, text):
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
                    "primitive_transformation": {
                        "replace_config": {
                            "new_value": {"string_value": "[REDACTED]"}
                        }
                    }
                }
            ]
        }
    }
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )
    return response.item.value
Tokenization (Pseudonymization)
# Replace with reversible tokens using a Cloud KMS-wrapped crypto key
def deidentify_with_fpe(project_id, text, key_name, wrapped_key):
    """wrapped_key must be the KMS-wrapped data key as raw bytes
    (base64-decode it first if you store it as text)."""
    dlp = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}],
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": {
                            "crypto_key": {
                                "kms_wrapped": {
                                    "wrapped_key": wrapped_key,
                                    "crypto_key_name": key_name,
                                }
                            },
                            # NUMERIC covers digits only; matched values containing
                            # other characters (e.g. dashes) may be rejected
                            "common_alphabet": "NUMERIC",
                        }
                    }
                }
            ]
        }
    }
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )
    return response.item.value
Step 5: Create De-identification Templates
# Create reusable de-identification template
cat > deidentify-template.json << EOF
{
  "displayName": "Standard PII De-identification",
  "description": "Masks PII for analytics use",
  "deidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [
        {
          "infoTypes": [{"name": "EMAIL_ADDRESS"}],
          "primitiveTransformation": {
            "replaceWithInfoTypeConfig": {}
          }
        },
        {
          "infoTypes": [{"name": "PHONE_NUMBER"}],
          "primitiveTransformation": {
            "characterMaskConfig": {
              "maskingCharacter": "X",
              "numberToMask": 6
            }
          }
        },
        {
          "infoTypes": [{"name": "CREDIT_CARD_NUMBER"}],
          "primitiveTransformation": {
            "replaceConfig": {
              "newValue": {"stringValue": "[CARD REDACTED]"}
            }
          }
        }
      ]
    }
  }
}
EOF
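Once this template is created, callers do not need to repeat the transformation config: deidentify_content accepts the stored template by name. This sketch only assembles the request body (the project and template IDs are placeholders); the commented-out lines show the actual call:

```python
def build_deidentify_request(project_id, template_id, text):
    """Request body for deidentify_content, referencing the stored template."""
    return {
        "parent": f"projects/{project_id}",
        "deidentify_template_name": (
            f"projects/{project_id}/deidentifyTemplates/{template_id}"
        ),
        "item": {"value": text},
    }

request = build_deidentify_request(
    "my-project", "standard-pii", "Email: user@example.com"
)

# from google.cloud import dlp_v2
# dlp = dlp_v2.DlpServiceClient()
# print(dlp.deidentify_content(request=request).item.value)
```

Referencing templates by name keeps transformation policy in one place, so changing a masking rule does not require redeploying application code.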
# Create the template
gcloud dlp deidentify-templates create \
  --project=PROJECT_ID \
  --template-file=deidentify-template.json
Step 6: Integrate with BigQuery
Create De-identified BigQuery Table
# Create job to de-identify and copy BigQuery table
gcloud dlp jobs create \
  --project=PROJECT_ID \
  --inspect-template=projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE \
  --deidentify-template=projects/PROJECT_ID/deidentifyTemplates/DEIDENTIFY_TEMPLATE \
  --storage-config='{
    "bigQueryOptions": {
      "tableReference": {
        "projectId": "PROJECT_ID",
        "datasetId": "source_dataset",
        "tableId": "sensitive_data"
      }
    }
  }' \
  --actions='[{
    "deidentify": {
      "cloudStorageOutput": "gs://bucket/deidentified/",
      "transformationDetailsStorageConfig": {
        "table": {
          "projectId": "PROJECT_ID",
          "datasetId": "dlp_results",
          "tableId": "transformation_details"
        }
      }
    }
  }]'
Schedule Recurring Scans
# Create job trigger for scheduled scanning
cat > job-trigger.json << EOF
{
  "displayName": "Weekly PII Scan",
  "description": "Scans customer data weekly for PII",
  "triggers": [
    {
      "schedule": {
        "recurrencePeriodDuration": "604800s"
      }
    }
  ],
  "inspectJob": {
    "storageConfig": {
      "bigQueryOptions": {
        "tableReference": {
          "projectId": "PROJECT_ID",
          "datasetId": "my_dataset",
          "tableId": "customer_data"
        }
      }
    },
    "inspectTemplateName": "projects/PROJECT_ID/inspectTemplates/PII_TEMPLATE",
    "actions": [
      {
        "saveFindings": {
          "outputConfig": {
            "table": {
              "projectId": "PROJECT_ID",
              "datasetId": "dlp_results",
              "tableId": "weekly_findings"
            }
          }
        }
      }
    ]
  }
}
EOF
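The 604800s recurrence above is seven days expressed in seconds, which is easy to misread. A tiny helper (an illustration, not part of the DLP API) keeps trigger definitions self-documenting:

```python
def recurrence_period(days):
    """Render a day count as the seconds-string DLP job triggers expect."""
    if days < 1:
        raise ValueError("DLP triggers require a recurrence period of at least one day")
    return f"{days * 86400}s"

print(recurrence_period(7))  # -> 604800s, the weekly trigger above
```

Generating trigger JSON from code like this avoids silent off-by-a-day mistakes when cadences change.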
# Create the trigger
gcloud dlp job-triggers create \
  --project=PROJECT_ID \
  --trigger-file=job-trigger.json
Step 7: Analyze DLP Findings
-- Query DLP findings in BigQuery
SELECT
  info_type.name AS info_type,
  likelihood,
  COUNT(*) AS count,
  COUNT(DISTINCT resource_name) AS affected_resources
FROM `PROJECT_ID.dlp_results.findings`
WHERE DATE(create_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY info_type.name, likelihood
ORDER BY count DESC;
-- Find most sensitive resources
SELECT
  resource_name,
  COUNT(*) AS finding_count,
  STRING_AGG(DISTINCT info_type.name) AS info_types
FROM `PROJECT_ID.dlp_results.findings`
WHERE likelihood IN ('LIKELY', 'VERY_LIKELY')
GROUP BY resource_name
ORDER BY finding_count DESC
LIMIT 20;
Best Practices
- Use inspection templates - Standardize detection across scans
- Start with sampling - Test on data subsets before full scans
- Configure appropriate likelihood - LIKELY or VERY_LIKELY reduces false positives
- Implement scheduled scans - Regular monitoring catches new sensitive data
- Store findings in BigQuery - Enable analysis and reporting
- Use de-identification for analytics - Preserve data utility while protecting privacy
- Combine with VPC Service Controls - Prevent data exfiltration
- Set up alerts - Notify on high-severity findings
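When building the alerting suggested above, the aggregation from the first Step 7 query can be prototyped in plain Python before any findings land in BigQuery. The sample rows here are invented for illustration, shaped loosely like the saveFindings export:

```python
from collections import Counter

# Hypothetical findings rows for illustration only
findings = [
    {"info_type": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY"},
    {"info_type": "EMAIL_ADDRESS", "likelihood": "LIKELY"},
    {"info_type": "US_SOCIAL_SECURITY_NUMBER", "likelihood": "VERY_LIKELY"},
]

# Mirror of: GROUP BY info_type.name, likelihood ... ORDER BY count DESC
counts = Counter((f["info_type"], f["likelihood"]) for f in findings)
for (info_type, likelihood), n in counts.most_common():
    print(f"{info_type:30} {likelihood:12} {n}")
```

Unit-testing the grouping logic this way makes it cheap to verify which combinations should page someone before wiring up real alert channels.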
Related Resources
- 30 Cloud Security Tips for 2026 - Comprehensive cloud security guide
- GCP Storage Encryption Guide - Protecting data at rest
- GCP Secret Manager Tutorial - Managing sensitive credentials
- Cloud DLP Documentation
- InfoTypes Reference
Need help implementing data loss prevention? Contact InventiveHQ for expert guidance on sensitive data protection and compliance.