Understanding Untrusted JSON Risks
Parsing JSON from untrusted sources (user input, external APIs, third-party data) presents security risks. Malicious JSON can exploit parser vulnerabilities, cause denial of service through resource exhaustion, carry script payloads that execute if output is rendered without escaping, or corrupt application state. Secure parsing requires defensive techniques that protect against each of these risks.
Untrusted sources include any data not produced by your own code: user input through forms, responses from external APIs, uploaded files, and database records that untrusted actors could have modified. Even data from your own database should be treated as untrusted if attackers could have altered it after a compromise.
Schema Validation
Validating JSON structure keeps malformed or unexpected data from reaching application logic.
JSON Schema Definition: Define expected JSON structure using JSON Schema. Schemas specify property types, required fields, value ranges, and constraints.
Schema Validation Libraries: Libraries in every language validate JSON against schemas. Validation before processing ensures data conformance.
Type Checking: Validation verifies field types match expectations. Type mismatches are caught before processing.
Property Whitelisting: Only accepting expected properties prevents injection of unexpected data. Whitelisting is more secure than blacklisting.
Range Validation: Validating that numeric values fall within acceptable ranges prevents resource exhaustion. Range validation prevents attacks via extreme values.
Example Schema:
{
  "type": "object",
  "properties": {
    "name": {"type": "string", "maxLength": 100},
    "age": {"type": "integer", "minimum": 0, "maximum": 150},
    "email": {"type": "string", "format": "email"}
  },
  "required": ["name", "email"]
}
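For illustration, the checks this schema expresses can be sketched by hand using only the standard library. The `validate_user` helper and its field rules mirror the example schema above; in practice, prefer a real JSON Schema validator over hand-rolled checks.

```python
import re

def validate_user(data):
    """Enforce the example schema's checks: types, whitelisting, ranges."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = {"name", "age", "email"}
    if set(data) - allowed:                          # property whitelisting
        raise ValueError("unexpected properties")
    for field in ("name", "email"):                  # required fields
        if field not in data:
            raise ValueError(f"missing required field: {field}")
    name, email = data["name"], data["email"]
    if not isinstance(name, str) or len(name) > 100:  # type + maxLength
        raise ValueError("name must be a string of at most 100 characters")
    if "age" in data:
        age = data["age"]
        if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
            raise ValueError("age out of range")      # range validation
    if not isinstance(email, str) or not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        raise ValueError("invalid email format")
    return data
```

Hand-written validation is shown only to make the individual checks concrete; a schema library keeps the rules declarative and easier to audit.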
Input Sanitization
Cleaning input prevents injection attacks.
Remove Unexpected Characters: Removing characters outside expected character sets prevents injection. Sanitization of user-controlled strings is essential.
HTML Entity Encoding: If JSON contains HTML that will be rendered, HTML entity encode output. Encoding prevents XSS attacks.
Script Removal: Removing scripts and event handlers prevents malicious code execution. Defensive script removal is important for user-provided content.
Length Limits: Enforcing maximum lengths prevents extremely large inputs. Length limits prevent denial of service.
Whitelist Characters: Accepting only whitelisted characters is more secure than blacklisting. Character whitelisting prevents creative injection attempts.
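A minimal sketch of these sanitization steps, combining a length limit, character whitelisting, and HTML entity encoding (the `sanitize_display_name` helper, its character set, and the length cap are illustrative assumptions):

```python
import html
import re

MAX_LEN = 100  # assumed maximum length for this sketch

def sanitize_display_name(raw):
    """Enforce a length limit, then whitelist characters in a string field."""
    value = raw[:MAX_LEN]                       # length limit first
    if not re.fullmatch(r"[A-Za-z0-9 _.'-]*", value):
        raise ValueError("disallowed characters in display name")
    return value

def render_html(value):
    """HTML-entity-encode before inserting into markup (prevents XSS)."""
    return html.escape(value)
```

Whitelisting rejects anything outside the expected character set, so novel injection tricks fail closed rather than slipping past a blacklist.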
Preventing Injection Attacks
Protecting against injection vulnerabilities.
Parameterized Queries: When using JSON data in SQL, use parameterized queries. Parameters separate SQL from data.
Template Escaping: When rendering JSON data in templates, escape HTML. Escaping prevents script injection.
No Dynamic Eval: Never use eval() or similar on JSON data. Dynamic code execution is extremely dangerous.
Structural Validation: Validating JSON structure prevents unexpected nested structures. Structure validation blocks attacks that abuse deeply or oddly nested data.
Context-Aware Output Encoding: Encode output appropriate to context (HTML, URL, CSS, JavaScript). Context-specific encoding prevents context-specific attacks.
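The first two points above can be sketched with the standard library: a parameterized SQL insert plus HTML entity encoding of the same attacker-controlled value (the payload and table are hypothetical; `sqlite3` stands in for any database driver):

```python
import html
import json
import sqlite3

# Hypothetical payload; the "name" value is attacker-controlled.
payload = json.loads('{"name": "Robert\'); DROP TABLE users;--"}')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Parameterized query: the driver treats the value purely as data,
# so the embedded quote and SQL fragment cannot alter the statement.
conn.execute("INSERT INTO users (name) VALUES (?)", (payload["name"],))

# Context-aware output encoding before rendering the value in HTML.
safe_html = html.escape(payload["name"])
```

The same value needs different encodings per output context; `html.escape` covers HTML, while URL or JavaScript contexts need their own escaping functions.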
Resource Limitation
Preventing denial of service through resource exhaustion.
Depth Limits: Rejecting deeply nested JSON prevents stack overflow attacks. Limiting nesting depth protects against algorithmic attacks.
Size Limits: Rejecting excessively large JSON files prevents memory exhaustion. File size limits prevent out-of-memory conditions.
String Length Limits: Limiting maximum string field length prevents extremely large strings. String limits prevent memory exhaustion.
Array Length Limits: Limiting array sizes prevents huge arrays. Array limits prevent memory issues.
Timeout Enforcement: Setting parsing timeouts prevents hanging on pathological JSON. Timeouts prevent resource exhaustion.
Example Depth Limit:
import json

MAX_DEPTH = 10

def parse_with_depth_limit(json_str):
    def depth_check(obj, depth=0):
        if depth > MAX_DEPTH:
            raise ValueError("JSON nesting too deep")
        if isinstance(obj, dict):
            for v in obj.values():
                depth_check(v, depth + 1)
        elif isinstance(obj, list):
            for item in obj:
                depth_check(item, depth + 1)

    # Note: json.loads itself recurses over nested input, so CPython raises
    # RecursionError on extreme nesting before this check even runs.
    obj = json.loads(json_str)
    depth_check(obj)
    return obj
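The size, string-length, and array-length limits described above can be combined into a single guard in the same style (the specific limits here are illustrative, not recommendations):

```python
import json

MAX_BYTES = 1_000_000    # illustrative limits; tune per application
MAX_STRING = 10_000
MAX_ITEMS = 10_000

def parse_with_limits(json_str):
    # Reject oversized payloads before spending any parsing effort.
    if len(json_str.encode("utf-8")) > MAX_BYTES:
        raise ValueError("payload too large")
    obj = json.loads(json_str)

    def check(node):
        if isinstance(node, str) and len(node) > MAX_STRING:
            raise ValueError("string field too long")
        elif isinstance(node, list):
            if len(node) > MAX_ITEMS:
                raise ValueError("array too large")
            for item in node:
                check(item)
        elif isinstance(node, dict):
            if len(node) > MAX_ITEMS:
                raise ValueError("object has too many members")
            for value in node.values():
                check(value)

    check(obj)
    return obj
```

Checking the raw byte length before parsing is the cheapest defense; the post-parse walk then bounds the individual strings, arrays, and objects.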
Safe Parsing Practices
Best practices for secure JSON parsing.
Try-Catch Error Handling: Always use try-catch around JSON parsing. Error handling prevents crashes from malformed JSON.
Explicit Type Casting: Never trust types from parsed JSON. Explicitly cast to expected types after validation.
Reject Unknown Fields: Rejecting unknown fields prevents injection of unexpected data; at minimum, strip them rather than passing them through. Unknown field handling should be strict.
Logging and Monitoring: Log all JSON parsing errors for security monitoring. Logging enables detection of attack attempts.
Rate Limiting: Implement rate limiting on JSON-accepting endpoints. Rate limiting mitigates flooding and abuse.
Language-Specific Recommendations
Different languages have specific best practices.
JavaScript/Node.js:
- Use JSON.parse() with try-catch
- Validate with JSON Schema libraries
- Never use eval() or the Function constructor
- Use libraries like Joi or Yup for validation
Python:
- Use json.loads() with error handling
- Use jsonschema for validation
- Avoid pickle for untrusted data
- Use ast.literal_eval, never eval(), if you must evaluate Python literal strings
Java:
- Use reputable JSON libraries (Jackson, Gson)
- Configure parsers for security
- Use annotations for validation
- Implement custom deserialization safely
C#/.NET:
- Use JsonConvert.DeserializeObject with explicit serializer settings (keep TypeNameHandling at None for untrusted input)
- Configure serializer settings for security
- Use data annotations for validation
- Avoid BinaryFormatter for untrusted data
Library Selection
Choosing secure JSON libraries.
Well-Maintained Libraries: Use popular, actively maintained libraries. Maintenance indicates security updates.
Security Track Record: Research library security history. Libraries with known vulnerabilities should be avoided.
Configurable Security: Choose libraries that expose security configuration. Libraries with options for depth limits and size limits are preferable.
Input Validation Integration: Libraries integrating validation are more convenient. Integrated validation reduces separate validation needs.
Example Safe Parsing in Node.js:
const Joi = require('joi');

const schema = Joi.object({
  username: Joi.string().alphanum().max(30).required(),
  email: Joi.string().email().required(),
  age: Joi.number().integer().min(0).max(150),
});

function parseJSON(jsonStr) {
  let obj;
  try {
    obj = JSON.parse(jsonStr);
  } catch (error) {
    throw new Error(`Failed to parse JSON: ${error.message}`);
  }
  const { value, error } = schema.validate(obj);
  if (error) throw new Error(`Validation failed: ${error.message}`);
  return value;
}
Third-Party API Validation
Validating data from external APIs.
API Response Validation: Validate all responses from external APIs. External data is untrusted.
Schema Enforcement: Enforce strict schemas for API responses. Strict validation prevents surprises.
TLS Verification: Always verify TLS certificates for HTTPS. Certificate verification prevents man-in-the-middle attacks.
Rate Limiting: Implement rate limiting on API consumption. Rate limiting prevents abuse.
Timeout Enforcement: Setting timeouts prevents hanging on slow APIs. Timeouts protect against denial of service.
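A standard-library sketch of these points: TLS verification (on by default in `urllib` for https:// URLs), a timeout, a response-size cap, and strict shape validation. The `REQUIRED_FIELDS` set and `fetch_status` helper are hypothetical:

```python
import json
import urllib.request

REQUIRED_FIELDS = {"id", "status"}   # hypothetical response schema

def validate_response(data):
    """Strictly check the shape of an external API response."""
    if not isinstance(data, dict) or not REQUIRED_FIELDS <= set(data):
        raise ValueError("API response failed validation")
    return data

def fetch_status(url, timeout=5.0):
    # urlopen verifies TLS certificates by default for https:// URLs,
    # and the timeout prevents hanging on a slow or hostile server.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read(1_000_000)   # cap response size
    return validate_response(json.loads(body))
```

Keeping validation in its own function makes the schema check testable without network access.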
File Upload Validation
Validating JSON files from user uploads.
File Type Validation: Verify files are actually JSON by content, not just extension. Content validation prevents spoofing.
File Size Limits: Rejecting oversized files prevents resource exhaustion. Size limits prevent denial of service.
Malware Scanning: Scan uploaded files for malware before processing them. Scanning prevents your service from distributing malware.
Sandboxed Processing: Process uploads in isolated environments. Sandboxing limits the impact if processing is compromised.
Quarantine Before Processing: Quarantining uploads before processing enables inspection. Quarantine provides additional safety.
Detecting Malicious JSON
Techniques for identifying suspicious JSON.
Anomaly Detection: Detecting JSON with unusual structure or values. Anomalies might indicate attacks.
Pattern Matching: Detecting known malicious patterns. Pattern matching catches known attacks.
Behavioral Analysis: Analyzing how JSON is used to detect misuse. Behavioral analysis detects novel attacks.
Machine Learning: Training ML models on normal JSON to detect anomalies. ML improves detection over time.
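As a minimal example of the pattern-matching approach, a naive scanner can flag payloads containing markers commonly seen in injection attempts (the pattern list here is illustrative and far from complete; real detection needs richer signals):

```python
import re

# Naive illustrative patterns for common script-injection markers.
SUSPICIOUS_PATTERNS = [
    re.compile(r"<script\b", re.IGNORECASE),
    re.compile(r"javascript:", re.IGNORECASE),
    re.compile(r"\bon\w+\s*=", re.IGNORECASE),   # inline event handlers
]

def looks_suspicious(json_text):
    """Return True if the raw JSON text matches any known-bad pattern."""
    return any(p.search(json_text) for p in SUSPICIOUS_PATTERNS)
```

Pattern matching only catches known attacks; it complements, rather than replaces, the validation and encoding defenses above.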
Error Messages and Logging
Secure error handling and logging.
Generic Error Messages: Providing vague error messages to users prevents information leakage. Detailed errors should only go to logs.
Detailed Logging: Logging detailed error information for developers and security monitoring. Logging enables issue investigation.
No Password/Secret Exposure: Never log sensitive data like passwords. Careful logging prevents secret exposure.
Structured Logging: Using structured logging enables analysis and monitoring. Structured logs are easier to analyze.
Log Monitoring: Monitoring logs for attack patterns. Monitoring enables rapid response to attacks.
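A structured log record for a parsing failure might be built like this (the `format_parse_failure` helper and its field names are illustrative); note that the raw payload and any secrets are deliberately excluded:

```python
import json

def format_parse_failure(source_ip, error):
    """Build a structured log record for a JSON parsing failure.

    The raw payload and any credentials are deliberately left out so
    logs cannot leak secrets.
    """
    return json.dumps({
        "event": "json_parse_failure",
        "source_ip": source_ip,
        "error": str(error)[:200],   # truncate to avoid log flooding
    })
```

Emitting records as JSON lines lets monitoring tools filter and aggregate parse failures without fragile text parsing.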
Regular Security Updates
Maintaining security over time.
Dependency Updates: Keeping JSON libraries updated. Updates include security patches.
Security Advisories: Monitoring security advisories for vulnerabilities. Advisories provide early warning.
Vulnerability Scanning: Scanning code for known vulnerabilities. Scanning tools identify issues.
Security Testing: Test JSON parsing regularly with security-focused tests such as fuzzing. Testing reveals vulnerabilities before attackers find them.
Conclusion
Safely parsing untrusted JSON requires multi-layered defenses. Schema validation ensures JSON matches expected structure. Input sanitization prevents injection attacks. Resource limitation prevents denial of service. Secure error handling and logging support ongoing security. Using reputable, well-maintained libraries and keeping them updated provides foundational security. By combining these practices, applications safely handle JSON from untrusted sources while maintaining security. No single defense is sufficient; layered approaches provide robust protection against diverse attacks. Understanding threats and implementing appropriate defenses enables safe JSON processing even when data sources are untrusted.


