Understanding Untrusted JSON Risks
Parsing JSON from untrusted sources (user input, external APIs, third-party data) presents security risks. Malicious JSON can exploit parser vulnerabilities, cause denial of service through resource exhaustion, carry script payloads that execute if output is rendered without escaping, or corrupt application state. Secure parsing requires defensive techniques that protect against each of these risks.
Untrusted sources include any data not produced by your own code: user input through forms, responses from external APIs, uploaded files, and database records that untrusted actors could have modified. Even data from your own database should be treated as untrusted if attackers could have altered it after a compromise.
Schema Validation
Validating JSON structure keeps malformed or unexpected data from reaching application logic.
JSON Schema Definition: Define expected JSON structure using JSON Schema. Schemas specify property types, required fields, value ranges, and constraints.
Schema Validation Libraries: Libraries in every language validate JSON against schemas. Validation before processing ensures data conformance.
Type Checking: Validation verifies field types match expectations. Type mismatches are caught before processing.
Property Whitelisting: Only accepting expected properties prevents injection of unexpected data. Whitelisting is more secure than blacklisting.
Range Validation: Validating that numeric values fall within acceptable ranges prevents resource exhaustion. Range validation prevents attacks via extreme values.
Example Schema:
{
  "type": "object",
  "properties": {
    "name": {"type": "string", "maxLength": 100},
    "age": {"type": "integer", "minimum": 0, "maximum": 150},
    "email": {"type": "string", "format": "email"}
  },
  "required": ["name", "email"]
}
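For illustration, the checks this schema expresses can be sketched by hand using only the standard library. The `validate_user` helper and its field rules mirror the example schema above; in practice, prefer a real JSON Schema validator over hand-rolled checks.

```python
import re

def validate_user(data):
    """Enforce the example schema's checks: types, whitelisting, ranges."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = {"name", "age", "email"}
    if set(data) - allowed:                          # property whitelisting
        raise ValueError("unexpected properties")
    for field in ("name", "email"):                  # required fields
        if field not in data:
            raise ValueError(f"missing required field: {field}")
    name, email = data["name"], data["email"]
    if not isinstance(name, str) or len(name) > 100:  # type + maxLength
        raise ValueError("name must be a string of at most 100 characters")
    if "age" in data:
        age = data["age"]
        if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
            raise ValueError("age out of range")      # range validation
    if not isinstance(email, str) or not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        raise ValueError("invalid email format")
    return data
```

Hand-written validation is shown only to make the individual checks concrete; a schema library keeps the rules declarative and easier to audit.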
Input Sanitization
Cleaning input prevents injection attacks.
Remove Unexpected Characters: Removing characters outside expected character sets prevents injection. Sanitization of user-controlled strings is essential.
HTML Entity Encoding: If JSON contains HTML that will be rendered, HTML entity encode output. Encoding prevents XSS attacks.
Script Removal: Removing scripts and event handlers prevents malicious code execution. Defensive script removal is important for user-provided content.
Length Limits: Enforcing maximum lengths prevents extremely large inputs. Length limits prevent denial of service.
Whitelist Characters: Accepting only whitelisted characters is more secure than blacklisting. Character whitelisting prevents creative injection attempts.
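A minimal sketch of these sanitization steps, combining a length limit, character whitelisting, and HTML entity encoding (the `sanitize_display_name` helper, its character set, and the length cap are illustrative assumptions):

```python
import html
import re

MAX_LEN = 100  # assumed maximum length for this sketch

def sanitize_display_name(raw):
    """Enforce a length limit, then whitelist characters in a string field."""
    value = raw[:MAX_LEN]                       # length limit first
    if not re.fullmatch(r"[A-Za-z0-9 _.'-]*", value):
        raise ValueError("disallowed characters in display name")
    return value

def render_html(value):
    """HTML-entity-encode before inserting into markup (prevents XSS)."""
    return html.escape(value)
```

Whitelisting rejects anything outside the expected character set, so novel injection tricks fail closed rather than slipping past a blacklist.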
Preventing Injection Attacks
Protecting against injection vulnerabilities.
Parameterized Queries: When using JSON data in SQL, use parameterized queries. Parameters separate SQL from data.
Template Escaping: When rendering JSON data in templates, escape HTML. Escaping prevents script injection.
No Dynamic Eval: Never use eval() or similar on JSON data. Dynamic code execution is extremely dangerous.
Structural Validation: Validating JSON structure prevents unexpected nested structures. Structure validation blocks attacks that abuse deeply or oddly nested data.
Context-Aware Output Encoding: Encode output appropriate to context (HTML, URL, CSS, JavaScript). Context-specific encoding prevents context-specific attacks.
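The first two points above can be sketched with the standard library: a parameterized SQL insert plus HTML entity encoding of the same attacker-controlled value (the payload and table are hypothetical; `sqlite3` stands in for any database driver):

```python
import html
import json
import sqlite3

# Hypothetical payload; the "name" value is attacker-controlled.
payload = json.loads('{"name": "Robert\'); DROP TABLE users;--"}')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Parameterized query: the driver treats the value purely as data,
# so the embedded quote and SQL fragment cannot alter the statement.
conn.execute("INSERT INTO users (name) VALUES (?)", (payload["name"],))

# Context-aware output encoding before rendering the value in HTML.
safe_html = html.escape(payload["name"])
```

The same value needs different encodings per output context; `html.escape` covers HTML, while URL or JavaScript contexts need their own escaping functions.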
Resource Limitation
Preventing denial of service through resource exhaustion.
Depth Limits: Rejecting deeply nested JSON prevents stack overflow attacks. Limiting nesting depth protects against algorithmic attacks.
Size Limits: Rejecting excessively large JSON files prevents memory exhaustion. File size limits prevent out-of-memory conditions.
String Length Limits: Limiting maximum string field length prevents extremely large strings. String limits prevent memory exhaustion.
Array Length Limits: Limiting array sizes prevents huge arrays. Array limits prevent memory issues.
Timeout Enforcement: Setting parsing timeouts prevents hanging on pathological JSON. Timeouts prevent resource exhaustion.
Example Depth Limit:
import json

MAX_DEPTH = 10

def parse_with_depth_limit(json_str):
    def depth_check(obj, depth=0):
        if depth > MAX_DEPTH:
            raise ValueError("JSON nesting too deep")
        if isinstance(obj, dict):
            for v in obj.values():
                depth_check(v, depth + 1)
        elif isinstance(obj, list):
            for item in obj:
                depth_check(item, depth + 1)

    # Note: json.loads itself recurses over nested input, so CPython raises
    # RecursionError on extreme nesting before this check even runs.
    obj = json.loads(json_str)
    depth_check(obj)
    return obj
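The size, string-length, and array-length limits described above can be combined into a single guard in the same style (the specific limits here are illustrative, not recommendations):

```python
import json

MAX_BYTES = 1_000_000    # illustrative limits; tune per application
MAX_STRING = 10_000
MAX_ITEMS = 10_000

def parse_with_limits(json_str):
    # Reject oversized payloads before spending any parsing effort.
    if len(json_str.encode("utf-8")) > MAX_BYTES:
        raise ValueError("payload too large")
    obj = json.loads(json_str)

    def check(node):
        if isinstance(node, str) and len(node) > MAX_STRING:
            raise ValueError("string field too long")
        elif isinstance(node, list):
            if len(node) > MAX_ITEMS:
                raise ValueError("array too large")
            for item in node:
                check(item)
        elif isinstance(node, dict):
            if len(node) > MAX_ITEMS:
                raise ValueError("object has too many members")
            for value in node.values():
                check(value)

    check(obj)
    return obj
```

Checking the raw byte length before parsing is the cheapest defense; the post-parse walk then bounds the individual strings, arrays, and objects.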
Safe Parsing Practices
Best practices for secure JSON parsing.
Try-Catch Error Handling: Always use try-catch around JSON parsing. Error handling prevents crashes from malformed JSON.
Explicit Type Casting: Never trust types from parsed JSON. Explicitly cast to expected types after validation.
Reject Unknown Fields: Rejecting unknown fields prevents injection of unexpected data; at minimum, strip them rather than passing them through. Unknown field handling should be strict.
Logging and Monitoring: Log all JSON parsing errors for security monitoring. Logging enables detection of attack attempts.
Rate Limiting: Implement rate limiting on JSON-accepting endpoints. Rate limiting mitigates flooding and abuse.
Language-Specific Recommendations
Different languages have specific best practices.
JavaScript/Node.js:
- Use JSON.parse() with try-catch
- Validate with JSON Schema libraries
- Never use eval() or the Function constructor
- Use libraries like Joi or Yup for validation
Python:
- Use json.loads() with error handling
- Use jsonschema for validation
- Avoid pickle for untrusted data
- Use ast.literal_eval, never eval(), if you must evaluate Python literal strings
Java:
- Use reputable JSON libraries (Jackson, Gson)
- Configure parsers for security
- Use annotations for validation
- Implement custom deserialization safely
C#/.NET:
- Use JsonConvert.DeserializeObject with explicit serializer settings (keep TypeNameHandling at None for untrusted input)
- Configure serializer settings for security
- Use data annotations for validation
- Avoid BinaryFormatter for untrusted data
Library Selection
Choosing secure JSON libraries.
Well-Maintained Libraries: Use popular, actively maintained libraries. Maintenance indicates security updates.
Security Track Record: Research library security history. Libraries with known vulnerabilities should be avoided.
Configurable Security: Choose libraries that expose security configuration. Libraries with options for depth limits and size limits are preferable.
Input Validation Integration: Libraries integrating validation are more convenient. Integrated validation reduces separate validation needs.
Example Safe Parsing in Node.js:
const Joi = require('joi');

const schema = Joi.object({
  username: Joi.string().alphanum().max(30).required(),
  email: Joi.string().email().required(),
  age: Joi.number().integer().min(0).max(150),
});

function parseJSON(jsonStr) {
  let obj;
  try {
    obj = JSON.parse(jsonStr);
  } catch (error) {
    throw new Error(`Failed to parse JSON: ${error.message}`);
  }
  const { value, error } = schema.validate(obj);
  if (error) throw new Error(`Validation failed: ${error.message}`);
  return value;
}
Third-Party API Validation
Validating data from external APIs.
API Response Validation: Validate all responses from external APIs. External data is untrusted.
Schema Enforcement: Enforce strict schemas for API responses. Strict validation prevents surprises.
TLS Verification: Always verify TLS certificates for HTTPS. Certificate verification prevents man-in-the-middle attacks.
Rate Limiting: Implement rate limiting on API consumption. Rate limiting prevents abuse.
Timeout Enforcement: Setting timeouts prevents hanging on slow APIs. Timeouts protect against denial of service.
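A standard-library sketch of these points: TLS verification (on by default in `urllib` for https:// URLs), a timeout, a response-size cap, and strict shape validation. The `REQUIRED_FIELDS` set and `fetch_status` helper are hypothetical:

```python
import json
import urllib.request

REQUIRED_FIELDS = {"id", "status"}   # hypothetical response schema

def validate_response(data):
    """Strictly check the shape of an external API response."""
    if not isinstance(data, dict) or not REQUIRED_FIELDS <= set(data):
        raise ValueError("API response failed validation")
    return data

def fetch_status(url, timeout=5.0):
    # urlopen verifies TLS certificates by default for https:// URLs,
    # and the timeout prevents hanging on a slow or hostile server.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read(1_000_000)   # cap response size
    return validate_response(json.loads(body))
```

Keeping validation in its own function makes the schema check testable without network access.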
File Upload Validation
Validating JSON files from user uploads.
File Type Validation: Verify files are actually JSON by content, not just extension. Content validation prevents spoofing.
File Size Limits: Rejecting oversized files prevents resource exhaustion. Size limits prevent denial of service.
Malware Scanning: Scan uploaded files for malware before processing them. Scanning prevents your service from distributing malware.
Sandboxed Processing: Process uploads in isolated environments. Sandboxing limits the impact if processing is compromised.
Quarantine Before Processing: Quarantining uploads before processing enables inspection. Quarantine provides additional safety.
Detecting Malicious JSON
Techniques for identifying suspicious JSON.
Anomaly Detection: Detecting JSON with unusual structure or values. Anomalies might indicate attacks.
Pattern Matching: Detecting known malicious patterns. Pattern matching catches known attacks.
Behavioral Analysis: Analyzing how JSON is used to detect misuse. Behavioral analysis detects novel attacks.
Machine Learning: Training ML models on normal JSON to detect anomalies. ML improves detection over time.
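As a minimal example of the pattern-matching approach, a naive scanner can flag payloads containing markers commonly seen in injection attempts (the pattern list here is illustrative and far from complete; real detection needs richer signals):

```python
import re

# Naive illustrative patterns for common script-injection markers.
SUSPICIOUS_PATTERNS = [
    re.compile(r"<script\b", re.IGNORECASE),
    re.compile(r"javascript:", re.IGNORECASE),
    re.compile(r"\bon\w+\s*=", re.IGNORECASE),   # inline event handlers
]

def looks_suspicious(json_text):
    """Return True if the raw JSON text matches any known-bad pattern."""
    return any(p.search(json_text) for p in SUSPICIOUS_PATTERNS)
```

Pattern matching only catches known attacks; it complements, rather than replaces, the validation and encoding defenses above.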
Error Messages and Logging
Secure error handling and logging.
Generic Error Messages: Providing vague error messages to users prevents information leakage. Detailed errors should only go to logs.
Detailed Logging: Logging detailed error information for developers and security monitoring. Logging enables issue investigation.
No Password/Secret Exposure: Never log sensitive data like passwords. Careful logging prevents secret exposure.
Structured Logging: Using structured logging enables analysis and monitoring. Structured logs are easier to analyze.
Log Monitoring: Monitoring logs for attack patterns. Monitoring enables rapid response to attacks.
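A structured log record for a parsing failure might be built like this (the `format_parse_failure` helper and its field names are illustrative); note that the raw payload and any secrets are deliberately excluded:

```python
import json

def format_parse_failure(source_ip, error):
    """Build a structured log record for a JSON parsing failure.

    The raw payload and any credentials are deliberately left out so
    logs cannot leak secrets.
    """
    return json.dumps({
        "event": "json_parse_failure",
        "source_ip": source_ip,
        "error": str(error)[:200],   # truncate to avoid log flooding
    })
```

Emitting records as JSON lines lets monitoring tools filter and aggregate parse failures without fragile text parsing.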
Regular Security Updates
Maintaining security over time.
Dependency Updates: Keeping JSON libraries updated. Updates include security patches.
Security Advisories: Monitoring security advisories for vulnerabilities. Advisories provide early warning.
Vulnerability Scanning: Scanning code for known vulnerabilities. Scanning tools identify issues.
Security Testing: Test JSON parsing regularly with security-focused tests such as fuzzing. Testing reveals vulnerabilities before attackers find them.
Conclusion
Safely parsing untrusted JSON requires multi-layered defenses. Schema validation ensures JSON matches expected structure. Input sanitization prevents injection attacks. Resource limitation prevents denial of service. Secure error handling and logging support ongoing security. Using reputable, well-maintained libraries and keeping them updated provides foundational security. By combining these practices, applications safely handle JSON from untrusted sources while maintaining security. No single defense is sufficient; layered approaches provide robust protection against diverse attacks. Understanding threats and implementing appropriate defenses enables safe JSON processing even when data sources are untrusted.


