Analyze Shannon entropy to detect packed, encrypted, or obfuscated files
The Mathematics of Shannon Entropy
Shannon entropy, introduced by Claude Shannon in 1948, measures the average information content, or unpredictability, of data. For a file treated as a sequence of bytes, the formula yields a value between 0 and 8 bits per byte:
H(X) = -Σ p(xᵢ) × log₂(p(xᵢ))
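Where H(X) is the entropy in bits per byte, p(xᵢ) is the probability (relative frequency) of byte value xᵢ in the data, and the sum runs over all 256 possible byte values (0x00 to 0xFF). Byte values that never occur contribute nothing to the sum, by the convention 0 × log₂(0) = 0.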
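The formula can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's actual implementation; the function name shannon_entropy and the choice to return 0.0 for empty input are assumptions made here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0  # assumption of this sketch: empty input counts as zero entropy
    total = len(data)
    counts = Counter(data)  # holds only the byte values that actually occur
    # p * log2(1/p) equals -p * log2(p); values with p = 0 are never visited.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy(b"\x00" * 1024))         # 0.0: a single repeated value
print(shannon_entropy(bytes(range(256)) * 4))  # all 256 values equally often: the 8-bit maximum
```

Writing each term as p × log₂(1/p) is algebraically the same as the formula above and avoids a negative zero for constant input.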
Entropy Across Different Content Types
- Text Files (3.0 - 5.0): Only ~95 printable characters used, with non-uniform letter frequencies
- Executable Code (4.5 - 6.5): Machine instructions follow patterns; strings and padding lower average entropy
- Compressed/Encrypted Data (7.0 - 8.0): Compression removes redundancy; encryption produces output indistinguishable from random data
- Random Data (7.9 - 8.0): Cryptographic random number generators approach the theoretical maximum
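As a rough aid for reading a measured value against the ranges above, the sketch below returns every listed content type whose range contains it. The names ENTROPY_BANDS and plausible_content_types are hypothetical, and because the ranges overlap, the result is a list of candidates rather than a verdict.

```python
# Ranges copied from the list above; they overlap, so more than one may match.
ENTROPY_BANDS = [
    ("text", 3.0, 5.0),
    ("executable code", 4.5, 6.5),
    ("compressed/encrypted data", 7.0, 8.0),
    ("random data", 7.9, 8.0),
]


def plausible_content_types(entropy: float) -> list[str]:
    """Return every content type whose typical range contains entropy."""
    return [label for label, low, high in ENTROPY_BANDS if low <= entropy <= high]


print(plausible_content_types(4.7))   # ['text', 'executable code']
print(plausible_content_types(6.8))   # [] (falls between the listed ranges)
print(plausible_content_types(7.95))  # ['compressed/encrypted data', 'random data']
```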
Section-by-Section Analysis
Binary files contain sections with distinct purposes. Analyzing entropy per section reveals anomalies that whole-file entropy might miss.
PE File Entropy Expectations
- .text (code section): Expected 5.5 - 6.8. Entropy > 7.0 suggests packed code or encrypted shellcode
- .data (initialized data): Expected 3.0 - 6.0. Entropy > 7.5 suggests encrypted configuration
- .rsrc (resources): Variable (4.0 - 7.5). High entropy normal for compressed images
- .reloc (relocations): Expected 4.0 - 5.5. Unusually large with high entropy is suspicious
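The per-section checks above might be applied as in the following sketch. It assumes the section bytes have already been extracted by a PE parser (not shown) and passed in as a name-to-bytes mapping; review_sections and ALERT_THRESHOLDS are hypothetical names, and only the two sections with an explicit alert threshold in the list are ever flagged.

```python
import math
from collections import Counter

# Alert thresholds taken from the list above. Sections without an explicit
# threshold there (.rsrc, .reloc) are reported but never flagged in this sketch.
ALERT_THRESHOLDS = {".text": 7.0, ".data": 7.5}


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; empty input is treated as 0.0."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def review_sections(sections: dict[str, bytes]) -> list[tuple[str, float, bool]]:
    """Return (name, entropy, flagged) for each section in the mapping."""
    report = []
    for name, raw in sections.items():
        entropy = shannon_entropy(raw)
        limit = ALERT_THRESHOLDS.get(name)
        report.append((name, round(entropy, 2), limit is not None and entropy > limit))
    return report


# Synthetic stand-ins for real section contents.
report = review_sections({
    ".text": bytes(range(256)) * 16,   # uniform bytes, mimics packed code
    ".data": b"config=1;" * 100,       # repetitive plain text
})
for name, entropy, flagged in report:
    print(f"{name:8} {entropy:5.2f} {'SUSPICIOUS' if flagged else 'ok'}")
```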
Detecting Embedded Payloads
Look for sudden entropy spikes: a jump of more than 2.0 bits per byte between adjacent regions often marks a transition from normal code to encrypted data. A typical dropper pattern shows normal headers (5.2-5.8), an encrypted payload (7.8-8.0), then a decryption stub (5.5-6.0).
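One way to look for such spikes is to compute entropy over fixed-size windows and compare neighbours, as sketched below. The 256-byte window and the name entropy_jumps are assumptions of this example. Note that a window of n bytes can never measure more than log₂(n) bits per byte, so very small windows understate the entropy of encrypted regions.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; empty input is treated as 0.0."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def entropy_jumps(data: bytes, window: int = 256, min_jump: float = 2.0):
    """Return (offset, previous, current) for each full window whose entropy
    differs from the preceding window by more than min_jump."""
    jumps = []
    previous = None
    for offset in range(0, len(data) - window + 1, window):
        current = shannon_entropy(data[offset:offset + window])
        if previous is not None and abs(current - previous) > min_jump:
            jumps.append((offset, previous, current))
        previous = current
    return jumps


# Synthetic layout: low-entropy filler, a uniform block, then filler again.
sample = b"ABCD" * 256 + bytes(range(256)) * 4 + b"ABCD" * 256
for offset, before, after in entropy_jumps(sample):
    print(f"offset {offset:#06x}: {before:.2f} -> {after:.2f}")
```

On this synthetic input the function reports two transitions, one where the uniform block begins and one where it ends.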
Interpreting Byte Distribution
The byte distribution histogram shows how uniformly byte values are distributed:
- Flat distribution (equal bar heights): Indicates encryption or strong compression—all byte values appear with similar frequency (~1/256)
- Peaked distribution (few dominant bytes): Common in text, padding, or structured data
- Bimodal distribution (two distinct peaks): May indicate mixed content requiring separate section analysis
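The shapes described above can also be summarized numerically. The sketch below reports the number of distinct byte values, the share of the most common byte, and a chi-square statistic against a perfectly flat distribution (0 means exactly uniform; larger means more peaked). The function name distribution_summary and the choice of these three measures are assumptions, not part of the tool.

```python
from collections import Counter


def distribution_summary(data: bytes) -> dict:
    """Summarize how evenly the 256 possible byte values are used."""
    if not data:
        raise ValueError("data must not be empty")
    counts = Counter(data)
    expected = len(data) / 256  # count per value if the distribution were flat
    chi_square = sum(
        (counts.get(value, 0) - expected) ** 2 / expected for value in range(256)
    )
    top_byte, top_count = counts.most_common(1)[0]
    return {
        "distinct_values": len(counts),
        "top_byte": top_byte,
        "top_share": top_count / len(data),
        "chi_square": chi_square,
    }


flat = distribution_summary(bytes(range(256)) * 4)  # every value equally often
peaked = distribution_summary(b"A" * 1000)          # one dominant byte
print(flat["distinct_values"], flat["chi_square"])
print(peaked["distinct_values"], peaked["top_share"])
```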
Practical Analysis Workflow
- Initial Scan: Flag files with overall entropy > 6.8 for deeper investigation
- Correlate Metadata: Check for packer-specific section names (UPX0, .aspack, .themida)
- String Analysis: Low string count + high entropy = likely packed
- Identify Stubs: Find low-entropy regions adjacent to high-entropy regions
- Dynamic Analysis: Execute in sandbox if entropy indicates packing
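The static steps of this workflow could be combined into a first-pass triage function along the following lines. This is a sketch under stated assumptions: the name triage, the raw byte search for marker strings (a real tool would read the section table), the definition of a string as four or more printable ASCII characters, and the min_strings_per_kib cutoff are all hypothetical. Only the 6.8 threshold and the three marker names come from the list above. Stub identification and sandbox execution are out of scope here.

```python
import math
import re
from collections import Counter

PACKER_MARKERS = (b"UPX0", b".aspack", b".themida")  # names from the list above
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")       # assumed definition of a "string"


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; empty input is treated as 0.0."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def triage(data: bytes, entropy_threshold: float = 6.8,
           min_strings_per_kib: float = 2.0) -> dict:
    """Run the static checks and return the individual findings."""
    entropy = shannon_entropy(data)
    markers = [m.decode("ascii") for m in PACKER_MARKERS if m in data]
    string_count = len(PRINTABLE_RUN.findall(data))
    kib = max(len(data) / 1024, 1.0)
    few_strings = string_count / kib < min_strings_per_kib  # hypothetical cutoff
    high_entropy = entropy > entropy_threshold
    return {
        "entropy": round(entropy, 2),
        "high_entropy": high_entropy,
        "packer_markers": markers,
        "string_count": string_count,
        "likely_packed": high_entropy and (bool(markers) or few_strings),
    }


result = triage(b"UPX0" + bytes(range(256)) * 32)  # synthetic input, not a real PE
print(result)
```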
Understanding False Positives
High entropy doesn't automatically indicate malware. Legitimate high-entropy files include:
- Archives: ZIP/RAR files naturally have entropy 7.5-8.0
- Multimedia: JPEG images (7.2-7.6), MP3 audio, H.264 video
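- Cryptographic material: SSL certificates and private keys stored in binary form (PEM-encoded files are Base64 text and top out at roughly 6 bits per byte)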
- DRM-protected software: Encrypted game assets and license-protected applications
Differentiation clues: File extension matches content, valid digital signatures, appropriate section names, and presence of readable metadata all suggest benign high entropy.
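The first clue, whether the file extension matches the content, can be approximated by comparing the leading bytes against well-known file signatures. The sketch below covers only a handful of formats; the table MAGIC_NUMBERS and the function extension_matches_content are hypothetical names, and some valid files (for example an empty ZIP archive, which starts with a different record) would be reported as mismatches.

```python
import os

# Leading-byte signatures for a few common high-entropy formats.
MAGIC_NUMBERS = {
    ".zip": (b"PK\x03\x04",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gz": (b"\x1f\x8b",),
}


def extension_matches_content(filename: str, header: bytes):
    """Return True or False for known extensions, None for unknown ones."""
    extension = os.path.splitext(filename)[1].lower()
    signatures = MAGIC_NUMBERS.get(extension)
    if signatures is None:
        return None  # this sketch has no signature for the extension
    return header.startswith(signatures)


print(extension_matches_content("photo.JPG", b"\xff\xd8\xff\xe0"))  # True
print(extension_matches_content("report.zip", b"MZ\x90\x00"))       # False
print(extension_matches_content("notes.txt", b"hello"))             # None
```

A high-entropy file whose extension claims one format but whose header indicates another (such as the MZ header of an executable inside a .zip name) deserves a closer look.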
Attacker Countermeasures
Sophisticated attackers employ entropy-lowering techniques:
- Padding: Appending zeros or other low-entropy filler can pull the whole-file average below detection thresholds
- Partial encryption: Encrypting only critical functions produces moderate overall entropy
- Frequency sculpting: Adjusting byte distribution to mimic natural language
Entropy analysis should always be combined with static analysis, behavioral monitoring, and signature matching for comprehensive detection.
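The padding countermeasure is easy to reproduce, which also shows why windowed analysis still helps. In the sketch below, a uniform block stands in for an encrypted payload and is padded with three times its length in zeros; the variable names and the 256-byte window are assumptions of this example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; empty input is treated as 0.0."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


payload = bytes(range(256)) * 4                  # stand-in for encrypted data
padded = payload + b"\x00" * (3 * len(payload))  # zero padding appended

whole_file = shannon_entropy(padded)
# Highest entropy seen in any 256-byte window of the padded file.
peak_window = max(
    shannon_entropy(padded[i:i + 256]) for i in range(0, len(padded), 256)
)

print(f"payload alone: {shannon_entropy(payload):.2f}")
print(f"with padding:  {whole_file:.2f}")
print(f"peak window:   {peak_window:.2f}")
```

The whole-file figure drops to roughly 2.8, well under a 6.8 screening threshold, while the peak window still sits at the maximum, so a per-window scan is not fooled by this particular trick.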
Frequently Asked Questions
Common questions about the Entropy Analyzer
Shannon entropy measures the randomness or unpredictability of data on a scale from 0 (completely predictable) to 8 (maximum randomness). It's crucial for malware analysis because malicious software often uses packers, encryptors, or obfuscators that produce high-entropy output to evade antivirus detection, making entropy a quick first indicator of potential threats.