Can I compare JSON, XML, or structured data?

Comparing structured data like JSON and XML requires specialized approaches. Learn how to effectively compare and understand differences in structured data formats.

By Inventive HQ Team

Comparing Structured Data: Beyond Line-by-Line Diffs

Traditional diff tools work well for plain text and source code, but structured data formats like JSON and XML present unique challenges. A single property change in JSON can ripple across several lines once the file is reformatted, making line-based diffs noisy and hard to read. Structure-aware diff tools handle these formats by comparing the parsed structure rather than the text line by line.

Understanding how to effectively compare structured data is essential for developers, DevOps engineers, and data analysts working with APIs, configuration files, and data interchange formats.

The Challenge with Structured Data

Why Line-Based Diffs Are Problematic for JSON

Consider this simple JSON change:

Original:

{
  "user": {
    "name": "John",
    "email": "[email protected]"
  }
}

Modified:

{
  "user": {
    "name": "John",
    "email": "[email protected]",
    "age": 30
  }
}

A line-based diff shows:

     "name": "John",
-    "email": "[email protected]"
+    "email": "[email protected]",
+    "age": 30
   }

This is misleading: the "email" line appears as removed and re-added only because it gained a trailing comma, when really just one property was added.
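
To reproduce this effect locally, here is a minimal sketch using Python's standard difflib module. The sample objects and the email value are placeholders invented for illustration; they only mirror the shape of the example above.

```python
import difflib
import json

# Hypothetical sample data mirroring the example above (the email is a placeholder).
original = {"user": {"name": "John", "email": "john@example.com"}}
modified = {"user": {"name": "John", "email": "john@example.com", "age": 30}}

def line_diff(a, b):
    """Return a unified, line-based diff of two objects serialized as pretty JSON."""
    a_lines = json.dumps(a, indent=2).splitlines()
    b_lines = json.dumps(b, indent=2).splitlines()
    return list(difflib.unified_diff(
        a_lines, b_lines, fromfile="original", tofile="modified", lineterm=""
    ))

diff_lines = line_diff(original, modified)
print("\n".join(diff_lines))
```

The output marks the "email" line as both removed and added, even though its value never changed. That is the kind of noise a structure-aware comparison avoids.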

Why Line-Based Diffs Are Problematic for XML

XML has similar issues:

-  <user>
+  <user id="123">

This shows the tag as removed and re-added, when semantically only an attribute was added.

Semantic Diffing for JSON

Understanding JSON-Aware Diffs

JSON-aware diff tools parse the JSON structure and compare the actual data structure, not the text representation.

Smart JSON diff shows:

user.age:
  - (undefined)
  + 30

Much clearer! This shows exactly what property changed and what the new value is.
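
As a rough illustration of how such a path-based report can be produced, the sketch below walks two parsed JSON values and yields one entry per difference. The function name json_diff and the MISSING sentinel are hypothetical, not taken from any particular tool, and arrays are compared as whole values to keep the example short.

```python
MISSING = object()  # sentinel meaning "key not present on this side"

def json_diff(old, new, path=""):
    """Yield (path, old_value, new_value) for each difference between two parsed JSON values."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Visit the union of keys so additions and removals are both reported.
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}.{key}" if path else key
            yield from json_diff(old.get(key, MISSING), new.get(key, MISSING), child)
    elif old != new:
        # Scalars, lists, and type mismatches are compared as whole values.
        yield path, old, new

# Placeholder data shaped like the example above.
original = {"user": {"name": "John", "email": "john@example.com"}}
modified = {"user": {"name": "John", "email": "john@example.com", "age": 30}}

changes = list(json_diff(original, modified))
for changed_path, before, after in changes:
    before_text = "(undefined)" if before is MISSING else before
    after_text = "(undefined)" if after is MISSING else after
    print(f"{changed_path}:\n  - {before_text}\n  + {after_text}")
```

Dedicated tools go further, for example by diffing arrays element by element, but the core idea is the same: compare parsed values and report the path to each change.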

Tools for JSON Comparison

Online JSON Diff Tools

JSON diff websites:

  • jsondiff.com
  • jsoncrack.com
  • diffchecker.com (supports JSON)
  • jscompare.com

Advantages:

  • No installation needed
  • Works in browser
  • Often have visualization options
  • Quick for one-off comparisons

Limitations:

  • Privacy concerns (your data may be sent to a third-party server)
  • No integration with workflows
  • Limited formatting options

Command-Line JSON Diff Tools

jq - JSON Query Processor:

# Compare two JSON files
diff <(jq -S . file1.json) <(jq -S . file2.json)

# Pretty-print both, then diff
jq -S . file1.json > sorted1.json
jq -S . file2.json > sorted2.json
diff sorted1.json sorted2.json

jd - JSON Diff:

# Install (jd is a Go tool; Homebrew shown here)
brew install jd

# Compare JSON files
jd file1.json file2.json

# Colorized output
jd -color file1.json file2.json

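python-deepdiff:

# Install
pip install deepdiff

# Python script
from deepdiff import DeepDiff

# Example inputs; any two parsed JSON objects work
dict1 = {"user": {"name": "John"}}
dict2 = {"user": {"name": "John", "age": 30}}

result = DeepDiff(dict1, dict2)
print(result)  # reports the added item at root['user']['age']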

IDE and Editor Support

Modern editors provide JSON comparison:

VS Code:

  • Built-in JSON formatter (Shift+Alt+F)
  • Extensions: JSON Diff by Louis Cheung
  • Select two files in the Explorer, then right-click → "Compare Selected"

IntelliJ IDEA:

  • Right-click file → "Compare With"
  • Shows JSON diff with navigation
  • IDE understands JSON structure

Vim/Neovim:

" Compare the current buffer with another JSON file
:vert diffsplit path/to/other.json

JSON Diff Strategies

Strategy 1: Sort Keys for Consistent Comparison

JSON objects have unordered properties. Comparing unsorted JSON can show false differences:

# Sort both files for consistent diff
jq -S . original.json > original-sorted.json
jq -S . modified.json > modified-sorted.json
diff original-sorted.json modified-sorted.json
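
The same normalization can be done without jq. The sketch below uses only Python's standard json module; normalize_json is a hypothetical helper name and the two input strings are made-up examples.

```python
import json

def normalize_json(text):
    """Parse JSON text and re-serialize it with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2)

# Same data, different key order and formatting (illustrative inputs).
a = '{"name": "John", "age": 30}'
b = '{\n  "age": 30,\n  "name": "John"\n}'

raw_equal = a == b                                   # text differs
same = normalize_json(a) == normalize_json(b)        # normalized text matches
print(raw_equal, same)
```

Once both documents are normalized this way, any remaining line-based difference reflects a change in the data rather than in key order or layout.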

Strategy 2: Normalize Formatting

Different formatting can create false differences:

# Normalize formatting before comparing
jq '.' original.json > original-normalized.json
jq '.' modified.json > modified-normalized.json
diff original-normalized.json modified-normalized.json

Strategy 3: Semantic Comparison

Compare actual values, not formatting:

import json

from deepdiff import DeepDiff

with open('original.json') as f:
    original = json.load(f)

with open('modified.json') as f:
    modified = json.load(f)

# DeepDiff reports changes by path, independent of key order or whitespace
diff = DeepDiff(original, modified)
print(diff)

Semantic Diffing for XML

Understanding XML Diff Challenges

XML differences are complicated by:

  • Attribute vs. element representation differences
  • Whitespace handling (spaces, newlines)
  • Element order (in some schemas, order matters)
  • Namespace declarations
  • Comments and processing instructions
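
Several of these differences disappear once both documents are canonicalized. The sketch below uses xml.etree.ElementTree.canonicalize from the standard library (Python 3.8 or later); the sample documents are invented for illustration, and strip_text=True assumes that leading and trailing whitespace in text content is not significant for your data.

```python
from xml.etree import ElementTree as ET

# Same content, but different attribute order and indentation (illustrative inputs).
doc_a = '<user id="123" role="admin"><name>John</name></user>'
doc_b = """<user role="admin" id="123">
  <name>John</name>
</user>"""

def canonical(xml_text):
    """Return a C14N 2.0 form: attributes sorted, surrounding text whitespace stripped."""
    return ET.canonicalize(xml_text, strip_text=True)

equivalent = canonical(doc_a) == canonical(doc_b)
print(equivalent)
```

Element order is preserved by canonicalization, so documents whose children appear in a different order will still compare as different. Whether that is correct depends on the schema.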

Tools for XML Comparison

Online XML Tools

  • xmldiff.com
  • diffchecker.com (supports XML)
  • Online XML/SOAP clients often include diff features

Command-Line XML Tools

xmllint with diff:

# Format both files and diff
xmllint --format original.xml > original-formatted.xml
xmllint --format modified.xml > modified-formatted.xml
diff original-formatted.xml modified-formatted.xml

xml-patch/xmldiff libraries:

# Python XML diff
pip install xmldiff

# Command line
xmldiff original.xml modified.xml

xdiff (specialized XML diff):

# Linux package (name and availability vary by distribution)
apt-get install xdiff

# Compare XML files
xdiff original.xml modified.xml

IDE Support for XML

VS Code:

  • XML by Red Hat extension
  • Format On Save feature
  • Compare XML files side-by-side

IntelliJ IDEA:

  • Built-in XML diff
  • Understands XML structure
  • Compares by element, not by line

XML Diff Strategies

Normalize Formatting

# Pretty-print both files with consistent formatting
xmllint --format original.xml > original-pretty.xml
xmllint --format modified.xml > modified-pretty.xml

# Then compare
diff original-pretty.xml modified-pretty.xml

Ignore Whitespace

# Diff with whitespace ignored
diff -w original-pretty.xml modified-pretty.xml

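# Or canonicalize with xmllint (--noblanks drops ignorable whitespace;
# --c14n normalizes attribute order and quoting)
xmllint --noblanks --c14n original.xml > original-c14n.xml
xmllint --noblanks --c14n modified.xml > modified-c14n.xml
diff original-c14n.xml modified-c14n.xml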

Compare Semantically

from xml.etree import ElementTree as ET

def xml_equal(file1, file2):
    tree1 = ET.parse(file1)
    tree2 = ET.parse(file2)

    def elements_equal(e1, e2):
        # Compare tag, text, attributes
        if e1.tag != e2.tag: return False
        if (e1.text or '').strip() != (e2.text or '').strip(): return False
        if e1.attrib != e2.attrib: return False
        if len(e1) != len(e2): return False
        return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))

    return elements_equal(tree1.getroot(), tree2.getroot())

Comparing Other Structured Formats

YAML Comparison

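# YAML requires formatting normalization
# yamllint only reports style problems; it does not rewrite files,
# so re-serialize both files with PyYAML (sorted keys) instead
pip install pyyaml

# Format files consistently (comments in the YAML are dropped)
python3 -c 'import sys, yaml; yaml.safe_dump(yaml.safe_load(sys.stdin), sys.stdout, sort_keys=True)' < original.yaml > original-formatted.yaml
python3 -c 'import sys, yaml; yaml.safe_dump(yaml.safe_load(sys.stdin), sys.stdout, sort_keys=True)' < modified.yaml > modified-formatted.yaml

diff original-formatted.yaml modified-formatted.yaml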

TOML Comparison

# TOML comparison (less common, but possible)
# Python approach
pip install toml deepdiff

python3 << 'EOF'
import toml
from deepdiff import DeepDiff

with open('original.toml') as f:
    original = toml.load(f)

with open('modified.toml') as f:
    modified = toml.load(f)

print(DeepDiff(original, modified))
EOF

CSV Comparison

# csvdiff for comparing CSV files
pip install csvdiff

# Compare CSV files, matching rows on a key column
# (here a column assumed to be named "id")
csvdiff id original.csv modified.csv
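
The idea behind a keyed CSV comparison can be sketched with the standard csv module alone. The function csv_diff, the "id" key column, and the sample rows below are all assumptions made for illustration; this is not how any particular tool is implemented.

```python
import csv
import io

def csv_diff(old_text, new_text, key):
    """Compare two CSV documents by a key column, ignoring row order."""
    def index(text):
        # Map each row's key value to the full row dict.
        return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

    old_rows, new_rows = index(old_text), index(new_text)
    return {
        "added": sorted(new_rows.keys() - old_rows.keys()),
        "removed": sorted(old_rows.keys() - new_rows.keys()),
        "changed": sorted(
            k for k in old_rows.keys() & new_rows.keys() if old_rows[k] != new_rows[k]
        ),
    }

# Illustrative data: row order differs, one row changed, one added, one removed.
old_csv = "id,name\n1,Ann\n2,Bob\n3,Cy\n5,Eve\n"
new_csv = "id,name\n3,Cy\n1,Ann\n2,Robert\n4,Dee\n"

result = csv_diff(old_csv, new_csv, key="id")
print(result)
```

Matching rows by key rather than by position means a reordered export does not show up as a wall of changes, which a plain line-based diff would report.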

Best Practices for Structured Data Comparison

1. Format Consistently Before Comparing

Always normalize formatting:

# JSON
jq -S '.' file.json

# XML
xmllint --format file.xml

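# YAML (re-serialize with PyYAML; yamllint only reports issues and does not reformat)
python3 -c 'import sys, yaml; yaml.safe_dump(yaml.safe_load(sys.stdin), sys.stdout, sort_keys=True)' < file.yaml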

2. Use Semantic Diff Tools When Available

Prefer tools that understand the format's structure over a generic line-based diff.

3. Ignore Irrelevant Whitespace

# Diff ignoring whitespace
diff -w original.json modified.json

4. Consider Order When It Matters

Some data structures treat order as significant:

  • Arrays (order matters)
  • Objects/dicts (order usually doesn't)
  • XML (depends on schema)
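
The distinction is easy to see in Python, where parsed JSON objects become dicts and arrays become lists. The canonical helper below is a hypothetical sketch for the case where you have decided array order is not significant; it sorts list items by their serialized form so that they can be compared regardless of order.

```python
import json

def canonical(value):
    """Recursively sort list items so that comparisons ignore array order."""
    if isinstance(value, dict):
        return {k: canonical(v) for k, v in value.items()}
    if isinstance(value, list):
        # Sort by serialized form so mixed or nested items are still orderable.
        return sorted((canonical(v) for v in value),
                      key=lambda v: json.dumps(v, sort_keys=True))
    return value

# Illustrative data: same keys in a different order, same tags in a different order.
a = {"tags": ["x", "y"], "owner": {"id": 1, "name": "John"}}
b = {"owner": {"name": "John", "id": 1}, "tags": ["y", "x"]}

strict_equal = a == b                        # key order is ignored, list order is not
loose_equal = canonical(a) == canonical(b)   # list order is ignored as well
print(strict_equal, loose_equal)
```

Use the order-insensitive form only when the data really is a set in disguise, such as a list of tags. For sequences like steps or rankings, the strict comparison is the right one.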

5. Compare Semantically for Complex Data

Don't just compare text representation:

# Better: Compare actual data structures
import json
with open('file1.json') as f:
    data1 = json.load(f)
with open('file2.json') as f:
    data2 = json.load(f)

if data1 == data2:
    print("Files are semantically identical")

Real-World Scenarios

API Response Comparison

# Capture API responses
curl https://api.example.com/users/1 > response1.json

# After changes, capture again
curl https://api.example.com/users/1 > response2.json

# Compare with JSON diff
jq -S . response1.json > response1-sorted.json
jq -S . response2.json > response2-sorted.json
diff response1-sorted.json response2-sorted.json

Configuration File Changes

# Before deploying config changes
jq -S . config-staging.json > config-staging-sorted.json
jq -S . config-production.json > config-production-sorted.json

# See what will change
diff config-staging-sorted.json config-production-sorted.json

Data Migration Verification

# Export data before and after migration
# (export-data.sh is assumed to take the output path as its argument)
./export-data.sh before-migration.json
./migrate-data.sh
./export-data.sh after-migration.json

# Verify structure is equivalent
python3 -c "
from deepdiff import DeepDiff
import json
with open('before-migration.json') as f:
    before = json.load(f)
with open('after-migration.json') as f:
    after = json.load(f)
print(DeepDiff(before, after))
"
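
For large exports, a quick first check is to compare a fingerprint of each file's canonical form before running a full diff. The sketch below is one possible approach using only the standard library; the fingerprint helper and the sample data are hypothetical.

```python
import hashlib
import json

def fingerprint(data):
    """Hash a canonical serialization (sorted keys, no extra whitespace) of parsed JSON."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative exports: same data, different key order.
before = {"users": [{"id": 1, "name": "John"}], "count": 1}
after = {"count": 1, "users": [{"name": "John", "id": 1}]}

matches = fingerprint(before) == fingerprint(after)
print(matches)
```

If the fingerprints match, the data is equivalent up to key order and formatting. If they differ, a tool such as DeepDiff can then show where.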

Conclusion

Comparing structured data requires different approaches than comparing plain text files. By understanding the limitations of line-based diffs for JSON, XML, and other structured formats, you can choose appropriate tools and strategies:

  • Use JSON-aware diff tools for JSON files
  • Use XML-aware diff tools for XML files
  • Always normalize formatting before comparing
  • Consider semantic comparison for complex data structures
  • Understand when order and formatting matter

Whether you're comparing API responses, configuration files, or data exports, the right approach to structured data comparison prevents false positives, reduces confusion, and provides clear insight into what actually changed. Investing in proper structured data comparison practices pays dividends throughout your development and DevOps workflows.
