How to Merge Historical and New Cost Export Data in Azure
Azure Cost Management exports generate daily or monthly billing files, but historical data from before you set up exports isn't automatically included. This creates a gap when you need to perform year-over-year comparisons, trend analysis, or long-term forecasting. This guide shows you how to combine historical billing data with your ongoing cost exports to create a unified, continuous dataset for comprehensive financial analysis.
Overview
Merging historical and new cost data involves several challenges:
- Different data schemas: Historical exports may use older formats than current exports
- File location differences: Old data might be in different storage accounts or containers
- Duplicate prevention: Overlapping date ranges need deduplication
- Performance at scale: Processing years of billing data requires efficient queries
- Ongoing synchronization: New exports must continuously merge with historical baseline
This guide covers four approaches:
- Azure Data Factory: Automated pipeline-based merging (recommended for production)
- Azure Synapse Analytics: SQL-based merging with serverless or dedicated pools
- PowerShell scripting: Manual or scheduled merging for smaller datasets
- Azure Databricks: Advanced transformations and large-scale processing
Prerequisites
Before you begin, ensure you have:
- Azure subscription with existing cost exports configured
- Historical billing data exported and stored in Azure Storage (CSV or Parquet format)
- Azure PowerShell module (7.0.0+) OR Azure CLI (2.40.0+)
- Storage account access with Storage Blob Data Contributor role
- Azure Data Factory OR Azure Synapse Analytics workspace (depending on chosen method)
- Power BI or analytics tool for validation (optional)
- Understanding of Azure billing data schema (UsageDate, Cost, ResourceId, etc.)
Understanding Cost Data Schema
Common Fields Across Versions
Both historical and current exports typically include:
- UsageDateTime / Date: When the resource was consumed
- Cost / PreTaxCost: Actual cost in the billing currency
- MeterCategory: Service category (e.g., Virtual Machines, Storage)
- ResourceId: Full Azure Resource Manager ID
- ResourceGroup: Resource group name
- SubscriptionId: Subscription GUID
- Tags: Resource tags as JSON
Schema Evolution Considerations
Schema changes between versions:
- Field name changes: UsageDateTime → Date, PreTaxCost → Cost
- New fields added: PricingModel, CostInUsd, ExchangeRate
- Deprecated fields: Some older fields may no longer exist
Solution: Use schema mapping to normalize field names before merging.
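The sketch below shows one way to apply such a mapping before merging. It is a minimal example that assumes pandas and illustrative file names, not the exact schema of your exports; adjust the field map to whatever columns your historical files actually contain.
# Minimal schema-normalization sketch (assumes pandas; file names are illustrative)
import pandas as pd

# Map legacy column names to the current export schema
FIELD_MAP = {
    "UsageDateTime": "Date",
    "PreTaxCost": "Cost",
}

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    # Rename legacy columns; columns not in the map are left unchanged
    df = df.rename(columns=FIELD_MAP)
    # Parse dates so both datasets compare and sort consistently
    df["Date"] = pd.to_datetime(df["Date"]).dt.date
    return df

historical = normalize(pd.read_csv("historical-costs.csv"))
current = normalize(pd.read_csv("current-costs.csv"))
merged = pd.concat([historical, current], ignore_index=True)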
Method 1: Azure Data Factory Pipeline (Recommended)
Azure Data Factory provides automated, scalable pipelines for merging historical and incremental data.
Step 1: Create Data Factory Workspace
# Create resource group
az group create \
--name rg-cost-analytics \
--location eastus
# Create Data Factory
az datafactory create \
--resource-group rg-cost-analytics \
--name adf-cost-merge \
--location eastus
Step 2: Create Linked Services
# Create linked service for storage account (historical data)
az datafactory linked-service create \
--resource-group rg-cost-analytics \
--factory-name adf-cost-merge \
--linked-service-name LS_HistoricalStorage \
--properties '{
"type": "AzureBlobStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=historicalcosts;AccountKey=..."
}
}'
# Create linked service for current exports storage
az datafactory linked-service create \
--resource-group rg-cost-analytics \
--factory-name adf-cost-merge \
--linked-service-name LS_CurrentExports \
--properties '{
"type": "AzureBlobStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=currentcosts;AccountKey=..."
}
}'
Step 3: Create Datasets
Historical Dataset:
{
"name": "DS_HistoricalCosts",
"properties": {
"linkedServiceName": {
"referenceName": "LS_HistoricalStorage",
"type": "LinkedServiceReference"
},
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"container": "historical-exports",
"folderPath": "costs"
},
"columnDelimiter": ",",
"rowDelimiter": "\n",
"firstRowAsHeader": true
},
"schema": []
}
}
Current Exports Dataset:
{
"name": "DS_CurrentExports",
"properties": {
"linkedServiceName": {
"referenceName": "LS_CurrentExports",
"type": "LinkedServiceReference"
},
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"container": "cost-exports",
"folderPath": "daily"
},
"columnDelimiter": ",",
"rowDelimiter": "\n",
"firstRowAsHeader": true
},
"schema": []
}
}
Step 4: Create Merge Pipeline
Both copy activities below write to a third dataset, DS_MergedOutput. Define it the same way as the datasets above, pointing at the container and folder where the merged files should land.
{
"name": "PL_MergeCostData",
"properties": {
"activities": [
{
"name": "CopyHistoricalData",
"type": "Copy",
"inputs": [
{
"referenceName": "DS_HistoricalCosts",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "DS_MergedOutput",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "DelimitedTextSource"
},
"sink": {
"type": "DelimitedTextSink",
"writeBehavior": "append"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": { "name": "UsageDateTime" },
"sink": { "name": "Date" }
},
{
"source": { "name": "PreTaxCost" },
"sink": { "name": "Cost" }
}
]
}
}
},
{
"name": "AppendCurrentData",
"type": "Copy",
"dependsOn": [
{
"activity": "CopyHistoricalData",
"dependencyConditions": ["Succeeded"]
}
],
"inputs": [
{
"referenceName": "DS_CurrentExports",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "DS_MergedOutput",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "DelimitedTextSource"
},
"sink": {
"type": "DelimitedTextSink",
"writeBehavior": "append"
}
}
}
]
}
}
Step 5: Run and Schedule Pipeline
# Trigger pipeline manually
az datafactory pipeline create-run \
--resource-group rg-cost-analytics \
--factory-name adf-cost-merge \
--pipeline-name PL_MergeCostData
# Create daily schedule trigger
az datafactory trigger create \
--resource-group rg-cost-analytics \
--factory-name adf-cost-merge \
--name TR_DailyMerge \
--properties '{
"type": "ScheduleTrigger",
"typeProperties": {
"recurrence": {
"frequency": "Day",
"interval": 1,
"startTime": "2025-01-01T02:00:00Z"
}
},
"pipelines": [
{
"pipelineReference": {
"referenceName": "PL_MergeCostData",
"type": "PipelineReference"
}
}
]
}'
# Start the trigger
az datafactory trigger start \
--resource-group rg-cost-analytics \
--factory-name adf-cost-merge \
--name TR_DailyMerge
Method 2: Azure Synapse Analytics
Step 1: Create Synapse Workspace
# Create Synapse workspace
az synapse workspace create \
--name synapse-cost-analytics \
--resource-group rg-cost-analytics \
--storage-account historicalcosts \
--file-system cost-data \
--sql-admin-login-user sqladmin \
--sql-admin-login-password "YourP@ssw0rd!" \
--location eastus
Step 2: Create External Tables for Historical Data
-- Connect to Synapse serverless SQL pool
-- Create database
CREATE DATABASE CostAnalytics;
GO
USE CostAnalytics;
GO
-- Create data source for historical costs
-- Requires a database-scoped credential named StorageCredential; one way to create it
-- (adjust the authentication method to your environment):
-- CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';
-- CREATE DATABASE SCOPED CREDENTIAL StorageCredential
--     WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '<sas-token>';
CREATE EXTERNAL DATA SOURCE HistoricalCosts
WITH (
LOCATION = 'https://historicalcosts.blob.core.windows.net/historical-exports',
CREDENTIAL = StorageCredential
);
-- Create file format
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = FALSE
)
);
-- Create external table for historical data
CREATE EXTERNAL TABLE HistoricalCostData
(
UsageDateTime DATE,
SubscriptionId VARCHAR(36),
ResourceGroup VARCHAR(90),
ResourceId VARCHAR(255),
MeterCategory VARCHAR(50),
PreTaxCost DECIMAL(18,4),
Currency VARCHAR(3),
Tags VARCHAR(MAX)
)
WITH (
LOCATION = '/costs/*.csv',
DATA_SOURCE = HistoricalCosts,
FILE_FORMAT = CsvFormat
);
Step 3: Create External Tables for Current Exports
-- Create data source for current exports
CREATE EXTERNAL DATA SOURCE CurrentExports
WITH (
LOCATION = 'https://currentcosts.blob.core.windows.net/cost-exports',
CREDENTIAL = StorageCredential
);
-- Create external table for current exports
CREATE EXTERNAL TABLE CurrentCostData
(
Date DATE,
SubscriptionId VARCHAR(36),
ResourceGroup VARCHAR(90),
ResourceId VARCHAR(255),
MeterCategory VARCHAR(50),
Cost DECIMAL(18,4),
Currency VARCHAR(3),
Tags VARCHAR(MAX)
)
WITH (
LOCATION = '/daily/*.csv',
DATA_SOURCE = CurrentExports,
FILE_FORMAT = CsvFormat
);
Step 4: Create Unified View with Schema Normalization
-- Create view that merges and normalizes both datasets
CREATE VIEW vw_MergedCostData AS
SELECT
UsageDateTime AS Date,
SubscriptionId,
ResourceGroup,
ResourceId,
MeterCategory,
PreTaxCost AS Cost,
Currency,
Tags,
'Historical' AS DataSource
FROM HistoricalCostData
WHERE UsageDateTime < '2025-01-01' -- Cutoff date for historical data
UNION ALL
SELECT
Date,
SubscriptionId,
ResourceGroup,
ResourceId,
MeterCategory,
Cost,
Currency,
Tags,
'Current' AS DataSource
FROM CurrentCostData
WHERE Date >= '2025-01-01'; -- Start date for current exports
GO
Step 5: Materialize Merged Data (Optional)
-- Create a materialized table for better query performance
-- (CTAS with DISTRIBUTION requires a dedicated SQL pool; it is not available in serverless)
CREATE TABLE MergedCostData
WITH (
DISTRIBUTION = HASH(SubscriptionId),
CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM vw_MergedCostData;
-- Create statistics for query optimization
CREATE STATISTICS stat_date ON MergedCostData(Date);
CREATE STATISTICS stat_subscription ON MergedCostData(SubscriptionId);
CREATE STATISTICS stat_resourcegroup ON MergedCostData(ResourceGroup);
Step 6: Schedule Refresh with Synapse Pipeline
-- Create stored procedure for incremental refresh
CREATE PROCEDURE sp_RefreshMergedCosts
AS
BEGIN
-- Delete and reload last 3 days to handle late-arriving data
DELETE FROM MergedCostData
WHERE Date >= DATEADD(day, -3, GETDATE());
-- Insert new/updated data
INSERT INTO MergedCostData
SELECT * FROM vw_MergedCostData
WHERE Date >= DATEADD(day, -3, GETDATE());
END;
GO
Create a Synapse pipeline with a daily schedule trigger to run this stored procedure, or call it from any scheduler you already use.
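If you prefer not to build a pipeline, the procedure can be invoked from any scheduled job. The following is a minimal sketch assuming pyodbc, the ODBC Driver 18 for SQL Server, and SQL authentication against the dedicated SQL pool endpoint; the workspace name and credentials mirror the ones created above, so substitute your own values.
# Minimal sketch: call sp_RefreshMergedCosts from a scheduled job (assumes pyodbc)
import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=synapse-cost-analytics.sql.azuresynapse.net;"  # workspace SQL endpoint
    "Database=CostAnalytics;"  # or the name of your dedicated SQL pool database
    "Uid=sqladmin;Pwd=YourP@ssw0rd!;"
    "Encrypt=yes;TrustServerCertificate=no;"
)

with pyodbc.connect(conn_str, autocommit=True) as conn:
    cursor = conn.cursor()
    # Reloads the last three days, as defined in the stored procedure above
    cursor.execute("EXEC sp_RefreshMergedCosts;")
    print("Merged cost data refreshed")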
Method 3: PowerShell Script
For smaller datasets or manual merging:
# Cost Data Merge Script
param(
[Parameter(Mandatory=$true)]
[string]$HistoricalStorageAccount,
[Parameter(Mandatory=$true)]
[string]$HistoricalContainer,
[Parameter(Mandatory=$true)]
[string]$CurrentStorageAccount,
[Parameter(Mandatory=$true)]
[string]$CurrentContainer,
[Parameter(Mandatory=$true)]
[string]$OutputStorageAccount,
[Parameter(Mandatory=$true)]
[string]$OutputContainer,
[Parameter(Mandatory=$false)]
[datetime]$CutoffDate = (Get-Date "2025-01-01")
)
# Install required modules
Install-Module -Name Az.Storage -Force -AllowClobber
# Connect to Azure
Connect-AzAccount
# Function to normalize schema
function Normalize-CostData {
param($Data, $IsHistorical)
if ($IsHistorical) {
# Map old field names to new schema
$Data | Select-Object `
@{N='Date';E={$_.UsageDateTime}},
@{N='Cost';E={$_.PreTaxCost}},
SubscriptionId,
ResourceGroup,
ResourceId,
MeterCategory,
Currency,
Tags,
@{N='DataSource';E={'Historical'}}
} else {
# Current schema, just add source flag
$Data | Select-Object *,
@{N='DataSource';E={'Current'}}
}
}
# Get storage contexts
$historicalCtx = New-AzStorageContext `
-StorageAccountName $HistoricalStorageAccount `
-UseConnectedAccount
$currentCtx = New-AzStorageContext `
-StorageAccountName $CurrentStorageAccount `
-UseConnectedAccount
$outputCtx = New-AzStorageContext `
-StorageAccountName $OutputStorageAccount `
-UseConnectedAccount
Write-Host "Downloading historical cost data..." -ForegroundColor Cyan
# Download and process historical data
$historicalBlobs = Get-AzStorageBlob `
-Context $historicalCtx `
-Container $HistoricalContainer
$historicalData = @()
foreach ($blob in $historicalBlobs) {
$tempFile = [System.IO.Path]::GetTempFileName()
Get-AzStorageBlobContent `
-Context $historicalCtx `
-Container $HistoricalContainer `
-Blob $blob.Name `
-Destination $tempFile `
-Force | Out-Null
$csvData = Import-Csv -Path $tempFile
$normalizedData = Normalize-CostData -Data $csvData -IsHistorical $true
$historicalData += $normalizedData | Where-Object { [datetime]$_.Date -lt $CutoffDate }
Remove-Item $tempFile
}
Write-Host "Downloaded $($historicalData.Count) historical records" -ForegroundColor Green
Write-Host "Downloading current cost data..." -ForegroundColor Cyan
# Download and process current data
$currentBlobs = Get-AzStorageBlob `
-Context $currentCtx `
-Container $CurrentContainer
$currentData = @()
foreach ($blob in $currentBlobs) {
$tempFile = [System.IO.Path]::GetTempFileName()
Get-AzStorageBlobContent `
-Context $currentCtx `
-Container $CurrentContainer `
-Blob $blob.Name `
-Destination $tempFile `
-Force | Out-Null
$csvData = Import-Csv -Path $tempFile
$normalizedData = Normalize-CostData -Data $csvData -IsHistorical $false
$currentData += $normalizedData | Where-Object { [datetime]$_.Date -ge $CutoffDate }
Remove-Item $tempFile
}
Write-Host "Downloaded $($currentData.Count) current records" -ForegroundColor Green
Write-Host "Merging datasets and removing duplicates..." -ForegroundColor Cyan
# Merge and deduplicate
$mergedData = $historicalData + $currentData
$uniqueData = $mergedData | Group-Object -Property Date,SubscriptionId,ResourceId |
ForEach-Object { $_.Group | Select-Object -First 1 }
Write-Host "Merged dataset contains $($uniqueData.Count) unique records" -ForegroundColor Green
# Export merged data
$outputFile = [System.IO.Path]::GetTempFileName()
$uniqueData | Export-Csv -Path $outputFile -NoTypeInformation
# Upload to output storage
$outputBlobName = "merged-costs-$(Get-Date -Format 'yyyyMMdd').csv"
Set-AzStorageBlobContent `
-Context $outputCtx `
-Container $OutputContainer `
-File $outputFile `
-Blob $outputBlobName `
-Force
Write-Host "Merged data uploaded to: $outputBlobName" -ForegroundColor Green
# Cleanup
Remove-Item $outputFile
Run the script:
.\Merge-CostData.ps1 `
-HistoricalStorageAccount "historicalcosts" `
-HistoricalContainer "historical-exports" `
-CurrentStorageAccount "currentcosts" `
-CurrentContainer "cost-exports" `
-OutputStorageAccount "mergedcosts" `
-OutputContainer "merged-data" `
-CutoffDate "2025-01-01"
Method 4: Azure Databricks
For advanced transformations and large-scale processing:
# Install libraries
%pip install azure-storage-blob
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, to_date
from pyspark.sql.types import *
# Initialize Spark session
spark = SparkSession.builder \
.appName("CostDataMerge") \
.config("spark.sql.adaptive.enabled", "true") \
.getOrCreate()
# Configure storage access
storage_account = "historicalcosts"
storage_key = "your-storage-key"
spark.conf.set(
f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
storage_key
)
# Define schema for historical data
historical_schema = StructType([
StructField("UsageDateTime", StringType(), True),
StructField("SubscriptionId", StringType(), True),
StructField("ResourceGroup", StringType(), True),
StructField("ResourceId", StringType(), True),
StructField("MeterCategory", StringType(), True),
StructField("PreTaxCost", DoubleType(), True),
StructField("Currency", StringType(), True),
StructField("Tags", StringType(), True)
])
# Load historical data
historical_df = spark.read \
.option("header", "true") \
.schema(historical_schema) \
.csv("wasbs://[email protected]/costs/*.csv")
# Normalize historical data schema
historical_normalized = historical_df \
.withColumn("Date", to_date(col("UsageDateTime"))) \
.withColumnRenamed("PreTaxCost", "Cost") \
.withColumn("DataSource", lit("Historical")) \
.filter(col("Date") < "2025-01-01") \
.select("Date", "SubscriptionId", "ResourceGroup", "ResourceId",
"MeterCategory", "Cost", "Currency", "Tags", "DataSource")
# Define schema for current data
current_schema = StructType([
StructField("Date", StringType(), True),
StructField("SubscriptionId", StringType(), True),
StructField("ResourceGroup", StringType(), True),
StructField("ResourceId", StringType(), True),
StructField("MeterCategory", StringType(), True),
StructField("Cost", DoubleType(), True),
StructField("Currency", StringType(), True),
StructField("Tags", StringType(), True)
])
# Load current exports
current_df = spark.read \
.option("header", "true") \
.schema(current_schema) \
.csv("wasbs://[email protected]/daily/*.csv")
# Normalize current data
current_normalized = current_df \
.withColumn("Date", to_date(col("Date"))) \
.withColumn("DataSource", lit("Current")) \
.filter(col("Date") >= "2025-01-01") \
.select("Date", "SubscriptionId", "ResourceGroup", "ResourceId",
"MeterCategory", "Cost", "Currency", "Tags", "DataSource")
# Merge datasets
merged_df = historical_normalized.union(current_normalized)
# Remove duplicates on the key columns
# (dropDuplicates keeps one arbitrary row per key; use a window function if you need to prefer a specific source)
deduplicated_df = merged_df \
.dropDuplicates(["Date", "SubscriptionId", "ResourceId"])
# Show statistics
print(f"Historical records: {historical_normalized.count()}")
print(f"Current records: {current_normalized.count()}")
print(f"Merged records: {merged_df.count()}")
print(f"After deduplication: {deduplicated_df.count()}")
# Write merged data to Parquet (optimized format)
deduplicated_df.write \
.mode("overwrite") \
.partitionBy("Date") \
.parquet("wasbs://[email protected]/costs/")
# Create Delta table for incremental updates (optional)
deduplicated_df.write \
.format("delta") \
.mode("overwrite") \
.save("wasbs://[email protected]/costs-delta/")
print("Merge complete!")
Best Practices
1. Define Clear Cutoff Dates
-- Use exact timestamp to prevent overlaps
DECLARE @CutoffDate DATETIME = '2025-01-01 00:00:00';
SELECT * FROM HistoricalData WHERE Date < @CutoffDate
UNION ALL
SELECT * FROM CurrentData WHERE Date >= @CutoffDate;
2. Implement Deduplication Logic
# Python/PySpark deduplication example
deduplicated = merged_df.dropDuplicates([
"Date",
"SubscriptionId",
"ResourceId",
"MeterId"
])
3. Validate Merged Data
-- Check for gaps in date range
WITH DateRange AS (
SELECT MIN(Date) AS StartDate, MAX(Date) AS EndDate
FROM MergedCostData
),
Numbers AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
FROM master..spt_values
WHERE type = 'P'
),
AllDates AS (
SELECT DATEADD(day, n.n, r.StartDate) AS Date
FROM Numbers n
CROSS JOIN DateRange r
WHERE DATEADD(day, n.n, r.StartDate) <= r.EndDate
)
SELECT d.Date AS MissingDate
FROM AllDates d
LEFT JOIN MergedCostData m ON d.Date = m.Date
WHERE m.Date IS NULL
ORDER BY d.Date;
4. Monitor Merge Performance
# Track merge execution time
$startTime = Get-Date
# ... merge operations ...
$endTime = Get-Date
$duration = $endTime - $startTime
Write-Host "Merge completed in $($duration.TotalMinutes) minutes"
5. Use Partitioning for Large Datasets
-- Partition by month for better query performance
CREATE TABLE MergedCostData
WITH (
DISTRIBUTION = HASH(SubscriptionId),
PARTITION (Date RANGE RIGHT FOR VALUES (
'2023-01-01', '2023-02-01', '2023-03-01', ..., '2025-12-01'
))
)
AS SELECT * FROM vw_MergedCostData;
6. Implement Incremental Refresh
# Only process new data since last merge
last_merge_date = get_last_merge_timestamp()
new_data = current_df.filter(col("Date") > last_merge_date)
# Append only new records
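A concrete version of this pattern, sketched in PySpark against the Parquet output from Method 4 (get_last_merge_timestamp() above is a placeholder; here the watermark is derived from the merged data itself, and spark and current_normalized come from Method 4):
from pyspark.sql import functions as F

merged_path = "wasbs://[email protected]/costs/"

# Use the latest date already merged as the watermark
last_merge_date = spark.read.parquet(merged_path) \
    .agg(F.max("Date")).collect()[0][0]

# Keep only rows newer than the watermark and append them
new_data = current_normalized.filter(F.col("Date") > F.lit(last_merge_date))
new_data.write.mode("append").partitionBy("Date").parquet(merged_path)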
Troubleshooting
Issue: Schema mismatch between historical and current data
Symptoms: Union/merge fails with column type errors
Resolution:
-- Explicitly cast columns to matching types
SELECT
CAST(UsageDateTime AS DATE) AS Date,
CAST(PreTaxCost AS DECIMAL(18,4)) AS Cost,
...
FROM HistoricalData
UNION ALL
SELECT
CAST(Date AS DATE) AS Date,
CAST(Cost AS DECIMAL(18,4)) AS Cost,
...
FROM CurrentData;
Issue: Duplicate records in merged dataset
Symptoms: Total cost sums don't match individual sources
Resolution:
-- Use ROW_NUMBER to keep only first occurrence
WITH Deduplicated AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY Date, SubscriptionId, ResourceId
ORDER BY DataSource DESC -- Prefer 'Historical' over 'Current'
) AS rn
FROM MergedData
)
SELECT * FROM Deduplicated WHERE rn = 1;
Issue: Performance degradation with large datasets
Symptoms: Merge queries take hours to complete
Resolution:
# Use Parquet instead of CSV
spark.read.parquet("wasbs://.../costs.parquet")
# Enable adaptive query execution
spark.conf.set("spark.sql.adaptive.enabled", "true")
# Repartition large datasets
df.repartition(100, "Date")
Issue: Missing data in date ranges
Symptoms: Gaps in merged dataset
Resolution:
# Verify source data completeness (reuse the storage contexts created in Method 3)
$historical = Get-AzStorageBlob -Context $historicalCtx -Container "historical" |
Where-Object { $_.Name -match "(\d{4}-\d{2}-\d{2})" } |
ForEach-Object { $Matches[1] } | Sort-Object
$current = Get-AzStorageBlob -Context $currentCtx -Container "current" |
Where-Object { $_.Name -match "(\d{4}-\d{2}-\d{2})" } |
ForEach-Object { $Matches[1] } | Sort-Object
# Find gaps
$allDates = $historical + $current | Sort-Object -Unique
# Check for missing dates in sequence
Issue: Out of memory errors during merge
Symptoms: PowerShell/Python script crashes
Resolution:
# Process in batches
$batchSize = 10000
for ($i = 0; $i -lt $allData.Count; $i += $batchSize) {
$batch = $allData[$i..[Math]::Min($i + $batchSize - 1, $allData.Count - 1)]
# Process batch
$batch | Export-Csv -Path "batch-$i.csv" -NoTypeInformation -Append
}
Validation and Testing
Data Quality Checks
-- Validate merged data
-- 1. Check total cost matches sum of sources
SELECT
'Historical' AS Source,
COUNT(*) AS RecordCount,
SUM(Cost) AS TotalCost
FROM HistoricalData
UNION ALL
SELECT
'Current' AS Source,
COUNT(*) AS RecordCount,
SUM(Cost) AS TotalCost
FROM CurrentData
UNION ALL
SELECT
'Merged' AS Source,
COUNT(*) AS RecordCount,
SUM(Cost) AS TotalCost
FROM MergedData;
-- 2. Check for duplicates
SELECT Date, SubscriptionId, ResourceId, COUNT(*) AS DuplicateCount
FROM MergedData
GROUP BY Date, SubscriptionId, ResourceId
HAVING COUNT(*) > 1;
-- 3. Validate date continuity
WITH DateGaps AS (
SELECT
Date,
LEAD(Date) OVER (ORDER BY Date) AS NextDate,
DATEDIFF(day, Date, LEAD(Date) OVER (ORDER BY Date)) AS DaysBetween
FROM (SELECT DISTINCT Date FROM MergedData) AS Dates
)
SELECT * FROM DateGaps WHERE DaysBetween > 1;
-- 4. Check for null critical fields
SELECT
COUNT(*) AS TotalRecords,
SUM(CASE WHEN Date IS NULL THEN 1 ELSE 0 END) AS NullDates,
SUM(CASE WHEN Cost IS NULL THEN 1 ELSE 0 END) AS NullCosts,
SUM(CASE WHEN SubscriptionId IS NULL THEN 1 ELSE 0 END) AS NullSubscriptions
FROM MergedData;
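If you merged with Databricks (Method 4), equivalent checks can be run directly on the DataFrame. A minimal sketch, assuming the deduplicated_df produced in that method:
from pyspark.sql import functions as F

# Record count and total cost per source
deduplicated_df.groupBy("DataSource") \
    .agg(F.count("*").alias("RecordCount"), F.sum("Cost").alias("TotalCost")) \
    .show()

# Keys that still appear more than once
deduplicated_df.groupBy("Date", "SubscriptionId", "ResourceId") \
    .count() \
    .filter(F.col("count") > 1) \
    .show()

# Null checks on critical fields
deduplicated_df.select(
    F.sum(F.col("Date").isNull().cast("int")).alias("NullDates"),
    F.sum(F.col("Cost").isNull().cast("int")).alias("NullCosts"),
    F.sum(F.col("SubscriptionId").isNull().cast("int")).alias("NullSubscriptions")
).show()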
Next Steps
Once data is merged:
- Create Power BI dashboards: Connect to merged dataset for year-over-year analysis
- Set up alerts: Monitor for anomalies using "How to Set Up Cost Alerts and Budgets in Azure"
- Automate refresh: Schedule daily merges to keep data current
- Optimize storage: Convert to Parquet or Delta format for better performance
- Secure data: Apply encryption and access controls per "How to Secure Cost Management Data in Azure Storage and Synapse"