Microsoft Purview Data Discovery and Protection Guide

Complete guide to discovering and classifying sensitive data with Microsoft Purview, including data discovery, classification labels, sensitivity labels, and DLP policies.

11 min read · Updated 2026-01-14

Microsoft Purview provides comprehensive data governance and protection capabilities, helping organizations discover, classify, and protect sensitive data across their entire data estate. This guide covers setting up data discovery, applying classification labels, configuring sensitivity labels, and implementing DLP policies.

This article is part of our comprehensive guide on Cloud Security Tips for 2026, which covers essential security practices across all major cloud platforms.

Overview

Microsoft Purview encompasses:

  • Data Map: Data cataloging and governance across multi-cloud environments
  • Data Catalog: Business-friendly data discovery and understanding
  • Information Protection: Sensitivity labels and encryption
  • Data Loss Prevention: Prevent unauthorized data sharing
  • Compliance Manager: Compliance assessment and management

Prerequisites

Before configuring Purview, ensure you have:

  • Microsoft 365 E5 or Purview licenses for full capabilities
  • Global Administrator or Compliance Administrator role
  • Azure subscription for Purview Data Map (governance features)
  • Understanding of data classification requirements
  • Inventory of data sources to scan and protect

Step 1: Set Up Microsoft Purview Data Map

Create Purview Account via Portal

  1. Sign in to the Azure Portal
  2. Search for Microsoft Purview and select Microsoft Purview accounts
  3. Click + Create
  4. Configure:
    • Subscription: Select your subscription
    • Resource group: Create or select existing
    • Account name: purview-contoso-prod
    • Location: Select region
    • Managed resource group: Accept default or customize
  5. Configure networking (private endpoint recommended for production)
  6. Click Review + create, then Create

Create Purview Account via Azure CLI

# Create resource group
az group create \
  --name "rg-purview" \
  --location "eastus"

# Create Purview account
az purview account create \
  --name "purview-contoso-prod" \
  --resource-group "rg-purview" \
  --location "eastus" \
  --public-network-access "Enabled"

# Get Purview account details
az purview account show \
  --name "purview-contoso-prod" \
  --resource-group "rg-purview" \
  --query "{Name:name, Endpoint:endpoints.catalog, Status:provisioningState}" \
  -o table

# Print the catalog endpoint (the data-plane API URL for this account)
PURVIEW_ENDPOINT=$(az purview account show \
  --name "purview-contoso-prod" \
  --resource-group "rg-purview" \
  --query "endpoints.catalog" -o tsv)
echo "Purview catalog endpoint: $PURVIEW_ENDPOINT"

Configure Access Control

# Get current user's object ID
USER_ID=$(az ad signed-in-user show --query id -o tsv)

# Assign Collection Admin role
az purview account add-root-collection-admin \
  --name "purview-contoso-prod" \
  --resource-group "rg-purview" \
  --object-id "$USER_ID"

# Assign Data Source Administrator role (via Purview Studio or API)
# This allows registering and scanning data sources

Step 2: Register and Scan Data Sources

Register Azure Data Lake via Portal

  1. Open Purview Studio (purview.microsoft.com, or web.purview.azure.com for the classic governance portal)
  2. Go to Data Map > Sources
  3. Click Register
  4. Select Azure Data Lake Storage Gen2
  5. Configure:
    • Storage account: Select from subscription
    • Collection: Select or create
  6. Click Register

Register Data Sources via PowerShell

# Install Purview PowerShell module
Install-Module -Name Az.Purview -Force

$purviewAccount = "purview-contoso-prod"
$resourceGroup = "rg-purview"

# Get Purview account (confirms the account exists and is reachable)
$account = Get-AzPurviewAccount -Name $purviewAccount -ResourceGroupName $resourceGroup

# Build the registration payload for an Azure SQL Database source
$sqlServer = @{
    kind = "AzureSqlDatabase"
    properties = @{
        serverEndpoint = "sql-contoso-prod.database.windows.net"
        subscriptionId = (Get-AzContext).Subscription.Id
        resourceGroup = "rg-databases"
        location = "eastus"
    }
}

# Serialize the payload; this snippet only prepares the request body
$body = $sqlServer | ConvertTo-Json -Depth 5

# Submit $body through the REST API, or register the source in Purview Studio

Create Scan Definition

  1. In Purview Studio, navigate to your registered source
  2. Click New scan
  3. Configure scan:
    • Name: scan-weekly-adls
    • Integration runtime: Azure or self-hosted
    • Credential: Managed identity (recommended)
  4. Select scan rule set:
    • System default: Standard classification rules
    • Custom: Organization-specific patterns
  5. Configure schedule:
    • Recurrence: Weekly
    • Start time: Off-peak hours
  6. Click Save and run
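
The weekly recurrence in step 5 can be reasoned about with a small date calculation. The sketch below is illustrative only: it assumes a hypothetical off-peak slot of Sundays at 02:00 and does not model how Purview itself schedules scans.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int, hour: int) -> datetime:
    """Return the next occurrence of `weekday` (Monday=0 ... Sunday=6) at `hour`:00 after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


# Hypothetical off-peak slot: Sundays at 02:00.
print(next_weekly_run(datetime(2026, 1, 14, 10, 0), weekday=6, hour=2))
```

A calculation like this is handy when checking that a planned scan window does not collide with backups or batch jobs on the source system.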

Create Custom Classification Rules

{
  "name": "EmployeeID",
  "kind": "Custom",
  "properties": {
    "classificationName": "EMPLOYEE_ID",
    "description": "Company Employee Identifier",
    "ruleStatus": "Enabled",
    "dataPatterns": [
      {
        "kind": "Regex",
        "pattern": "EMP[0-9]{6}"
      }
    ],
    "columnPatterns": [
      {
        "kind": "Regex",
        "pattern": "(?i)(employee.*id|emp.*id|worker.*id)"
      }
    ],
    "minimumPercentageMatch": 60
  }
}
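
Before deploying a rule like this, it can help to try the patterns against sample values locally. The following is a simplified model, not Purview's classification engine: it assumes a column is classified when the share of non-empty sampled values that fully match the data pattern reaches the minimum percentage, and it treats the column-name check as a separate test. The sample values are made up.

```python
import re

# Patterns and threshold copied from the rule definition above.
DATA_PATTERN = re.compile(r"EMP[0-9]{6}")
COLUMN_PATTERN = re.compile(r"(?i)(employee.*id|emp.*id|worker.*id)")
MINIMUM_PERCENTAGE_MATCH = 60


def column_name_matches(column_name: str) -> bool:
    """True when the column name matches the rule's column pattern."""
    return COLUMN_PATTERN.search(column_name) is not None


def data_match_percentage(values: list[str]) -> float:
    """Percentage of non-empty sampled values that fully match the data pattern."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if DATA_PATTERN.fullmatch(v))
    return 100.0 * hits / len(non_empty)


def would_classify(values: list[str]) -> bool:
    """Assumed decision: classify when the match percentage reaches the threshold."""
    return data_match_percentage(values) >= MINIMUM_PERCENTAGE_MATCH


# Made-up column sample: three of five values match the pattern.
sample = ["EMP123456", "EMP000001", "EMP999999", "n/a", "TEMP12"]
print(column_name_matches("Employee_ID"), data_match_percentage(sample), would_classify(sample))
```

Running the patterns against a realistic sample this way makes it easier to pick a threshold that tolerates placeholder values such as "n/a" without classifying unrelated columns.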

Step 3: Configure Sensitivity Labels

Access Microsoft Purview Compliance Portal

  1. Navigate to Microsoft Purview compliance portal
  2. Go to Information protection > Labels
  3. Review existing labels or create new ones

Create Sensitivity Labels

  1. Click + Create a label
  2. Configure basics:
    • Name: Confidential - Internal
    • Display name: Confidential - Internal Only
    • Description for users: For internal use within the organization
  3. Configure scope:
    • Items (files and emails)
    • Groups & sites (optional)
    • Schematized data assets (optional)
  4. Configure protection settings:
    • Encryption: Configure or leave unencrypted
    • Content marking: Headers, footers, watermarks
    • Auto-labeling: Define conditions
  5. Click Create

Configure Label Encryption

# Connect to Security & Compliance PowerShell
# (admin@contoso.com is a placeholder; use your own admin UPN)
Connect-IPPSSession -UserPrincipalName "admin@contoso.com"

# Create encryption-enabled sensitivity label
# (finance-team@contoso.com is a placeholder identity; use your own user, group, or domain)
New-Label -Name "Highly Confidential" `
  -DisplayName "Highly Confidential - Encrypted" `
  -Comment "For highly sensitive data requiring encryption" `
  -Tooltip "Apply this label to encrypt sensitive documents" `
  -EncryptionEnabled $true `
  -EncryptionProtectionType "Template" `
  -EncryptionRightsDefinitions "finance-team@contoso.com:VIEW,EDIT,PRINT" `
  -EncryptionOfflineAccessDays 30

# Configure content marking
Set-Label -Identity "Highly Confidential" `
  -ApplyContentMarkingHeaderEnabled $true `
  -ApplyContentMarkingHeaderText "HIGHLY CONFIDENTIAL" `
  -ApplyContentMarkingHeaderFontSize 12 `
  -ApplyContentMarkingHeaderFontColor "#FF0000" `
  -ApplyContentMarkingFooterEnabled $true `
  -ApplyContentMarkingFooterText "Internal Use Only - Do Not Distribute"
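
The rights definition passed to the label is a compact string of the form identity:RIGHT,RIGHT, with multiple identities separated by semicolons. A small parser, sketched below, can be used to review such strings before applying them. The function name and the example identities are hypothetical, and the parser checks only the shape of the string, not whether the rights are valid.

```python
def parse_rights_definitions(definitions: str) -> dict[str, list[str]]:
    """Split 'identity:RIGHT,RIGHT;identity:RIGHT' into {identity: [rights]}."""
    parsed: dict[str, list[str]] = {}
    for entry in definitions.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the identity keeps any earlier characters intact.
        identity, sep, rights = entry.rpartition(":")
        if not sep or not identity.strip() or not rights.strip():
            raise ValueError(f"Malformed entry: {entry!r}")
        parsed[identity.strip()] = [r.strip().upper() for r in rights.split(",") if r.strip()]
    return parsed


# Hypothetical identities for illustration.
print(parse_rights_definitions("finance@contoso.example:VIEW,EDIT,PRINT;contoso.example:VIEW"))
```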

Create Label Policy

# Create label policy to publish labels
New-LabelPolicy -Name "Standard Label Policy" `
  -Labels "Public", "Internal", "Confidential", "Highly Confidential" `
  -ExchangeLocation "All" `
  -SharePointLocation "All" `
  -OneDriveLocation "All" `
  -Settings @{
    mandatory = $true
    requiredowngradejustification = $true
  }

# Configure default label for new documents
Set-LabelPolicy -Identity "Standard Label Policy" `
  -AdvancedSettings @{
    DefaultLabelId = "00000000-0000-0000-0000-000000000001"
    OutlookDefaultLabel = "00000000-0000-0000-0000-000000000002"
  }
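
The two policy settings above, mandatory labeling and downgrade justification, amount to a small decision rule. The sketch below models that rule under one assumption: that label sensitivity follows the order of the -Labels list, lowest first. In a real tenant the order comes from label priority, so treat this as an illustration rather than the service's logic.

```python
from typing import Optional

# Assumed sensitivity order, lowest first, taken from the -Labels list above.
LABEL_ORDER = ["Public", "Internal", "Confidential", "Highly Confidential"]


def check_label_change(current: Optional[str], new: Optional[str],
                       justification: str = "") -> str:
    """Return 'allowed', 'label required', or 'justification required'."""
    if new is None:
        # mandatory = $true: content cannot be left or made unlabeled.
        return "label required"
    if current is None:
        # First label on previously unlabeled content.
        return "allowed"
    is_downgrade = LABEL_ORDER.index(new) < LABEL_ORDER.index(current)
    if is_downgrade and not justification.strip():
        # requiredowngradejustification = $true
        return "justification required"
    return "allowed"


print(check_label_change("Confidential", "Public"))
print(check_label_change("Internal", "Highly Confidential"))
```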

Step 4: Configure Auto-Labeling

Create Auto-Labeling Policy via Portal

  1. In compliance portal, go to Information protection > Auto-labeling
  2. Click + Create auto-labeling policy
  3. Configure:
    • Name: Auto-label-PII
    • Description: Automatically label documents containing PII
  4. Select label to apply
  5. Configure conditions:
    • Content contains: Credit card numbers, SSN, etc.
    • Content is shared: External users
  6. Choose locations:
    • Exchange email
    • SharePoint sites
    • OneDrive accounts
  7. Review and create

Configure Auto-Labeling via PowerShell

# Create auto-labeling policy for credit card detection
# The label to apply is set on the policy, not on the rule
New-AutoSensitivityLabelPolicy -Name "Auto-Label-Credit-Cards" `
  -ExchangeLocation "All" `
  -SharePointLocation "All" `
  -OneDriveLocation "All" `
  -ApplySensitivityLabel "Confidential" `
  -Mode "TestWithoutNotifications"

# Create auto-labeling rule; each rule targets a single workload,
# so repeat it for "SharePoint" and "OneDriveForBusiness"
New-AutoSensitivityLabelRule -Policy "Auto-Label-Credit-Cards" `
  -Name "Credit Card Rule - Exchange" `
  -Workload "Exchange" `
  -ContentContainsSensitiveInformation @{
    Name = "Credit Card Number"
    minCount = 1
  }

# Enable policy after testing
Set-AutoSensitivityLabelPolicy -Identity "Auto-Label-Credit-Cards" `
  -Mode "Enable"
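
To build intuition for what a "Credit Card Number" condition with a minimum count looks for, the sketch below finds 13 to 19 digit candidates and keeps those that pass the Luhn checksum. This is a deliberately simplified stand-in: the built-in sensitive information type also weighs supporting evidence such as nearby keywords, which is not modeled here. The card numbers used are well-known test values.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum over the digits in `number`, ignoring separators."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            # Double every second digit from the right and fold two-digit results.
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0


def count_card_numbers(text: str) -> int:
    """Count candidates in `text` that pass the Luhn check."""
    return sum(1 for m in CANDIDATE.finditer(text) if luhn_valid(m.group()))


def rule_matches(text: str, min_count: int = 1) -> bool:
    """Mirror the minCount idea: match once enough valid numbers are found."""
    return count_card_numbers(text) >= min_count


print(rule_matches("Please charge card 4111 1111 1111 1111 today."))
```

Testing sample documents against a rough model like this before switching the policy from simulation to Enable can surface obvious false positives, such as order numbers that happen to be 16 digits long.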

Monitor Auto-Labeling Results

# Get auto-labeling policy status
Get-AutoSensitivityLabelPolicy | Format-Table Name, Mode, WhenCreated

# View simulation results (when in test mode)
# Navigate to: Compliance portal > Information protection > Auto-labeling > Policy > View simulation

Step 5: Implement Data Loss Prevention

Create DLP Policy via Portal

  1. In compliance portal, go to Data loss prevention > Policies
  2. Click + Create policy
  3. Choose template or custom:
    • Financial: Credit card, bank account numbers
    • Medical: Health records, medical terms
    • Privacy: PII, personal data
    • Custom: Organization-specific rules
  4. Configure policy:
    • Name: DLP-Protect-Financial-Data
    • Locations: Exchange, SharePoint, OneDrive, Teams, Devices
  5. Configure rules:
    • Conditions: Content contains sensitive info
    • Actions: Block, notify, require justification
  6. Set alerts and reports
  7. Enable policy

Create DLP Policy via PowerShell

# Create DLP policy for financial data protection
New-DlpCompliancePolicy -Name "Protect-Financial-Data" `
  -ExchangeLocation "All" `
  -SharePointLocation "All" `
  -OneDriveLocation "All" `
  -TeamsLocation "All" `
  -Mode "Enable" `
  -Comment "Protects credit card and financial information"

# Create detection rule
New-DlpComplianceRule -Policy "Protect-Financial-Data" `
  -Name "Block-Credit-Card-External" `
  -ContentContainsSensitiveInformation @{
    Name = "Credit Card Number"
    minCount = 1
  } `
  -AccessScope "NotInOrganization" `
  -BlockAccess $true `
  -NotifyUser "SiteAdmin", "LastModifier" `
  -NotifyUserType "Email,PolicyTip" `
  -GenerateIncidentReport "SiteAdmin"

# Create rule for internal warnings
New-DlpComplianceRule -Policy "Protect-Financial-Data" `
  -Name "Warn-Credit-Card-Internal" `
  -ContentContainsSensitiveInformation @{
    Name = "Credit Card Number"
    minCount = 1
  } `
  -AccessScope "InOrganization" `
  -NotifyUser "LastModifier" `
  -NotifyPolicyTipDisplayOption "Dialog"
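
The two rules above split on access scope: external sharing is blocked, internal sharing only triggers a notification. The sketch below models that split for an email scenario. The internal domain list, the recipient addresses, and the way scope is derived from recipient domains are assumptions for illustration; the service determines scope itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    access_scope: str  # "InOrganization" or "NotInOrganization"
    block_access: bool


# The two rules defined above, reduced to the fields this sketch needs.
RULES = [
    Rule("Block-Credit-Card-External", "NotInOrganization", True),
    Rule("Warn-Credit-Card-Internal", "InOrganization", False),
]

# Hypothetical set of domains treated as internal.
INTERNAL_DOMAINS = {"contoso.example"}


def access_scope(recipients: list[str]) -> str:
    """Treat the message as external if any recipient is outside the internal domains."""
    for recipient in recipients:
        domain = recipient.rsplit("@", 1)[-1].lower()
        if domain not in INTERNAL_DOMAINS:
            return "NotInOrganization"
    return "InOrganization"


def evaluate(recipients: list[str], contains_card_number: bool) -> Optional[Rule]:
    """Return the rule whose scope matches, or None when the content is not sensitive."""
    if not contains_card_number:
        return None
    scope = access_scope(recipients)
    for rule in RULES:
        if rule.access_scope == scope:
            return rule
    return None


print(evaluate(["bob@contoso.example", "alice@partner.example"], True))
```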

Configure DLP for Endpoint Devices

# Add endpoint devices to the existing policy
# (Set-DlpCompliancePolicy adds locations with -Add*Location parameters)
Set-DlpCompliancePolicy -Identity "Protect-Financial-Data" `
  -AddEndpointDlpLocation "All"

# Configure endpoint actions; each restriction is a Setting/Value pair
New-DlpComplianceRule -Policy "Protect-Financial-Data" `
  -Name "Endpoint-Block-USB-Copy" `
  -ContentContainsSensitiveInformation @{
    Name = "Credit Card Number"
    minCount = 1
  } `
  -EndpointDlpRestrictions @(
    @{ Setting = "RemovableMedia"; Value = "Block" },
    @{ Setting = "NetworkShare"; Value = "Warn" },
    @{ Setting = "Print"; Value = "Warn" },
    @{ Setting = "CopyPaste"; Value = "Audit" }
  )

Step 6: Monitor and Report

View DLP Reports

  1. In compliance portal, go to Reports > DLP
  2. Review dashboards:
    • Policy matches over time
    • Top matched policies
    • Top matched locations
    • False positive ratio

Query Activity via PowerShell

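# Search-UnifiedAuditLog is an Exchange Online cmdlet (run Connect-ExchangeOnline first)
# Search for DLP policy matches
Search-UnifiedAuditLog -StartDate (Get-Date).AddDays(-7) `
  -EndDate (Get-Date) `
  -Operations "DLPRuleMatch" `
  -ResultSize 1000 | Select-Object -ExpandProperty AuditData | ConvertFrom-Json

# Get sensitivity label activity counts by operation
Search-UnifiedAuditLog -StartDate (Get-Date).AddDays(-7) `
  -EndDate (Get-Date) `
  -Operations "SensitivityLabelApplied", "SensitivityLabelUpdated", "SensitivityLabelRemoved" `
  -ResultSize 1000 | Group-Object Operations | Select-Object Name, Count

# Export DLP incidents
$incidents = Search-UnifiedAuditLog -StartDate (Get-Date).AddDays(-30) `
  -EndDate (Get-Date) `
  -Operations "DLPRuleMatch"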
$incidents | Export-Csv -Path "DLP-Incidents.csv" -NoTypeInformation
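
Each exported record carries its detail as a JSON string in the AuditData column, so a short script can summarize matches per policy and rule. The sketch below works on inline sample strings rather than a real export. The field names PolicyDetails, PolicyName, Rules, and RuleName are assumptions about the record shape and should be checked against your own data.

```python
import json
from collections import Counter

# Made-up AuditData strings; the field names are assumed, not taken from a real export.
SAMPLE_AUDIT_DATA = [
    json.dumps({"Operation": "DLPRuleMatch", "PolicyDetails": [
        {"PolicyName": "Protect-Financial-Data",
         "Rules": [{"RuleName": "Block-Credit-Card-External"}]}]}),
    json.dumps({"Operation": "DLPRuleMatch", "PolicyDetails": [
        {"PolicyName": "Protect-Financial-Data",
         "Rules": [{"RuleName": "Block-Credit-Card-External"}]}]}),
    json.dumps({"Operation": "DLPRuleMatch", "PolicyDetails": [
        {"PolicyName": "Protect-Financial-Data",
         "Rules": [{"RuleName": "Warn-Credit-Card-Internal"}]}]}),
    json.dumps({"Operation": "DLPRuleMatch"}),  # record without policy details
]


def count_rule_matches(audit_data_rows: list[str]) -> Counter:
    """Count matches per (policy name, rule name) across AuditData JSON strings."""
    counts: Counter = Counter()
    for raw in audit_data_rows:
        record = json.loads(raw)
        for policy in record.get("PolicyDetails", []):
            for rule in policy.get("Rules", []):
                key = (policy.get("PolicyName", "unknown"), rule.get("RuleName", "unknown"))
                counts[key] += 1
    return counts


for (policy_name, rule_name), count in count_rule_matches(SAMPLE_AUDIT_DATA).most_common():
    print(policy_name, rule_name, count)
```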

Set Up Alerts

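# Create alert policy for DLP violations
# Threshold and TimeWindow apply to aggregation-type alerts, so set -Type explicitly
# (security-alerts@contoso.com is a placeholder; use your own mailbox)
New-ActivityAlert -Name "High Volume DLP Violations" `
  -Type "SimpleAggregation" `
  -Category "DataLossPrevention" `
  -Operation "DLPRuleMatch" `
  -Threshold 50 `
  -TimeWindow 60 `
  -NotifyUser "security-alerts@contoso.com" `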
  -Severity "High"
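
The alert above fires on 50 matches within a 60-minute window. The sketch below shows one way to reason about such a threshold with a sliding window over event timestamps. It is a conceptual model with made-up timestamps, and the service may aggregate events differently.

```python
from collections import deque
from datetime import datetime, timedelta


def threshold_breached(event_times: list[datetime], threshold: int = 50,
                       window_minutes: int = 60) -> bool:
    """True if at least `threshold` events fall within any window of `window_minutes`."""
    window = timedelta(minutes=window_minutes)
    recent: deque = deque()
    for t in sorted(event_times):
        recent.append(t)
        # Drop events that are a full window or more older than the current one.
        while t - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


start = datetime(2026, 1, 14, 9, 0)
burst = [start + timedelta(minutes=i) for i in range(50)]       # 50 events in 49 minutes
spread = [start + timedelta(minutes=2 * i) for i in range(50)]  # 50 events in 98 minutes
print(threshold_breached(burst), threshold_breached(spread))
```

Replaying historical match timestamps through a model like this is a quick way to pick a threshold that flags bursts without alerting on normal daily volume.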

Best Practices

  1. Start with discovery: Scan data sources before defining labels
  2. Use built-in classifiers: Leverage Microsoft's trained classifiers
  3. Test before enforcement: Run policies in simulation mode first
  4. Educate users: Provide training on label selection
  5. Monitor false positives: Tune policies based on user feedback
  6. Layer protections: Combine labels, DLP, and encryption
  7. Regular reviews: Update classifications as data evolves
  8. Document exceptions: Track justified overrides and business reasons

Troubleshooting

Labels not appearing for users:

  • Verify label policy is published to user's location
  • Check user has required license (E3/E5)
  • Allow 24 hours for policy propagation
  • Verify Office apps are updated

Scans not discovering expected data:

  • Check scan rule set includes required classifiers
  • Verify credentials have read access to data
  • Review scan logs for errors
  • Ensure file types are supported

DLP policies not blocking:

  • Confirm policy is in Enable mode (not test)
  • Check location scope includes the data source
  • Verify rule conditions match the content
  • Review rule priority (lower number = higher priority)
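
When several rules match the same content, priority alone may not explain which one takes effect. The sketch below models one resolution order as an assumption: the most restrictive action wins, and the lower priority number breaks ties. The action names and their ranking are illustrative, so verify the behavior against your own policies.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed restrictiveness ranking; a higher value is more restrictive.
RESTRICTIVENESS = {"audit": 0, "notify": 1, "block_with_override": 2, "block": 3}


@dataclass(frozen=True)
class MatchedRule:
    name: str
    priority: int  # lower number = higher priority
    action: str    # a key of RESTRICTIVENESS


def enforced_rule(matched: list[MatchedRule]) -> Optional[MatchedRule]:
    """Pick the most restrictive matched rule; break ties with the lowest priority number."""
    if not matched:
        return None
    return min(matched, key=lambda r: (-RESTRICTIVENESS[r.action], r.priority))


matches = [
    MatchedRule("Warn-Credit-Card-Internal", priority=0, action="notify"),
    MatchedRule("Block-Credit-Card-External", priority=1, action="block"),
]
print(enforced_rule(matches))
```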

Next Steps

After implementing Purview data protection, enhance your program:

  • Implement Microsoft Defender for Cloud Apps for SaaS protection
  • Configure Insider Risk Management for behavior analytics
  • Enable Records Management for retention and disposal
  • Review Cloud Security Tips for 2026 for comprehensive cloud security guidance

Frequently Asked Questions

What is the difference between Microsoft Purview Data Map and Purview Information Protection?

Microsoft Purview Data Map (formerly Azure Purview) focuses on data governance, cataloging data assets across Azure, multi-cloud, and on-premises sources. Purview Information Protection (formerly Microsoft Information Protection, or MIP) focuses on labeling and protecting sensitive data in Microsoft 365 and Azure. Both are part of the Microsoft Purview family but serve different purposes. Use Data Map for discovery and cataloging, and Information Protection for classification and DLP.
