Analyze Shannon entropy to detect packed, encrypted, or obfuscated files
The Mathematics of Shannon Entropy
Shannon entropy, developed by Claude Shannon in 1948, measures information density or randomness within a dataset. For binary files, the formula calculates entropy on a scale of 0 to 8 bits per byte:
H(X) = -Σ p(xᵢ) × log₂(p(xᵢ))
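Where H(X) is the entropy in bits per byte, p(xᵢ) is the fraction of bytes in the file that have value xᵢ, and the sum runs across all 256 possible byte values (0x00 to 0xFF). Byte values that never occur contribute nothing to the sum, since p × log₂(p) is taken to be 0 when p = 0.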
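As a minimal sketch of the formula above, the function below counts how often each byte value occurs and sums the per-value terms. The name shannon_entropy and the choice to return 0.0 for empty input are assumptions made for illustration, not part of any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0  # assumption: treat empty input as zero entropy
    total = len(data)
    # Counter only holds byte values that actually occur, so p is never 0 here.
    probabilities = [count / total for count in Counter(data).values()]
    return sum(-p * math.log2(p) for p in probabilities)


# One repeated byte value is fully predictable: minimum entropy.
print(shannon_entropy(b"\x00" * 1024))
# All 256 values equally frequent: the theoretical maximum of 8 bits per byte.
print(shannon_entropy(bytes(range(256)) * 4))
```

Skipping absent byte values is what implements the "0 × log₂ 0 = 0" convention without a special case.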
Entropy Across Different Content Types
- Text Files (3.0 - 5.0): Only ~95 printable characters used, with non-uniform letter frequencies
- Executable Code (4.5 - 6.5): Machine instructions follow patterns; strings and padding lower average entropy
- Compressed/Encrypted Data (7.0 - 8.0): Compression removes redundancy; encryption produces output indistinguishable from random data
- Random Data (7.9 - 8.0): Cryptographic random number generators approach the theoretical maximum
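The ordering in the list above can be reproduced on synthetic inputs. The sketch below is illustrative only: the word list, the seed, and the sample sizes are assumptions, and real files will land at different points within each range.

```python
import math
import os
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, by assumption)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


# Synthetic stand-in for prose (assumption): random picks from a small word list.
rng = random.Random(0)
WORDS = ["entropy", "packer", "section", "binary", "payload",
         "header", "stub", "random", "signature", "analysis"]
text = " ".join(rng.choice(WORDS) for _ in range(20000)).encode("ascii")

samples = {
    "text": text,
    "compressed": zlib.compress(text, 9),   # redundancy removed
    "random": os.urandom(65536),            # OS cryptographic RNG
}

results = {label: shannon_entropy(data) for label, data in samples.items()}
for label, value in results.items():
    print(f"{label:12s} {value:.2f} bits/byte")
```

Small samples understate entropy: a buffer of N bytes can never score above log₂(N), so very short inputs are not comparable to the ranges listed.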
Section-by-Section Analysis
Binary files contain sections with distinct purposes. Analyzing entropy per section reveals anomalies that whole-file entropy might miss.
PE File Entropy Expectations
- .text (code section): Expected 5.5 - 6.8. Entropy > 7.0 suggests packed code or encrypted shellcode
- .data (initialized data): Expected 3.0 - 6.0. Entropy > 7.5 suggests encrypted configuration
- .rsrc (resources): Variable (4.0 - 7.5). High entropy normal for compressed images
- .reloc (relocations): Expected 4.0 - 5.5. Unusually large with high entropy is suspicious
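The expectations above translate directly into a lookup table. The sketch below assumes the per-section entropy values have already been computed elsewhere. The function name, the label strings, and the sample measurements are all hypothetical.

```python
# Expected ranges and alert thresholds, copied from the list above.
EXPECTED_RANGE = {
    ".text": (5.5, 6.8),
    ".data": (3.0, 6.0),
    ".rsrc": (4.0, 7.5),
    ".reloc": (4.0, 5.5),
}
ALERT_ABOVE = {".text": 7.0, ".data": 7.5}


def assess_section(name: str, entropy: float) -> str:
    """Classify one section's entropy; the labels are illustrative, not a verdict."""
    if name not in EXPECTED_RANGE:
        return "unknown section"
    if name in ALERT_ABOVE and entropy > ALERT_ABOVE[name]:
        return "suspicious"
    low, high = EXPECTED_RANGE[name]
    return "expected" if low <= entropy <= high else "outside expected range"


# Hypothetical measurements for a single file (not real data).
measured = {".text": 7.6, ".data": 4.2, ".rsrc": 7.3, ".reloc": 6.1, ".xyz": 7.9}
for name, value in measured.items():
    print(f"{name:8s} {value:.1f}  {assess_section(name, value)}")
```

Unrecognized section names are reported separately rather than scored, since a nonstandard name is itself a signal worth reviewing.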
Detecting Embedded Payloads
Look for sudden entropy spikes: a jump of more than 2.0 bits per byte between adjacent regions can indicate a transition from normal code to encrypted data. A typical dropper pattern shows normal headers (5.2-5.8), then an encrypted payload (7.8-8.0), then a decryption stub (5.5-6.0).
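One way to look for such spikes is to compute entropy over fixed-size blocks and compare neighbors. The sketch below is a minimal version under stated assumptions: the 1024-byte block size, the helper names block_entropies and find_jumps, and the synthetic sample (repetitive text around seeded pseudo-random bytes) are chosen for illustration.

```python
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, by assumption)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def block_entropies(data: bytes, block_size: int = 1024) -> list:
    """Entropy of each consecutive block; the last block may be shorter."""
    return [shannon_entropy(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def find_jumps(entropies: list, min_jump: float = 2.0) -> list:
    """Return (block_index, delta) wherever adjacent blocks differ by more than min_jump."""
    jumps = []
    for i in range(1, len(entropies)):
        delta = entropies[i] - entropies[i - 1]
        if abs(delta) > min_jump:
            jumps.append((i, delta))
    return jumps


# Synthetic sample: low-entropy filler, a pseudo-random "payload", then filler again.
rng = random.Random(1)
low = (b"push ebp; mov ebp, esp; " * 200)[:4096]
payload = bytes(rng.getrandbits(8) for _ in range(4096))
sample = low + payload + low

for index, delta in find_jumps(block_entropies(sample)):
    print(f"block {index} (offset {index * 1024}): {delta:+.2f} bits/byte")
```

A positive delta marks the start of a high-entropy region and a negative delta marks its end, which also locates the low-entropy region that follows it.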
Interpreting Byte Distribution
The byte distribution histogram shows how uniformly byte values are distributed:
- Flat distribution (equal bar heights): Indicates encryption or strong compression—all byte values appear with similar frequency (~1/256)
- Peaked distribution (few dominant bytes): Common in text, padding, or structured data
- Bimodal distribution (two distinct peaks): May indicate mixed content requiring separate section analysis
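The flat and peaked cases above can be told apart with a simple histogram check. The sketch below is a rough heuristic, not a statistical test: the cutoffs (most common byte under 2/256 of the data for "flat", over 10% for "peaked") are assumed values for illustration, and bimodal shapes are not detected separately.

```python
from collections import Counter


def byte_histogram(data: bytes) -> list:
    """Return 256 counts, one per byte value from 0x00 to 0xFF."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def describe_distribution(data: bytes) -> str:
    """Label the histogram shape using illustrative, untuned cutoffs."""
    if not data:
        return "empty"
    hist = byte_histogram(data)
    top_share = max(hist) / len(data)            # share of the most common byte
    distinct = sum(1 for count in hist if count > 0)
    if distinct == 256 and top_share < 2 / 256:  # every value present, none dominant
        return "flat"
    if top_share > 0.10:                         # one byte value dominates
        return "peaked"
    return "mixed"


print(describe_distribution(bytes(range(256)) * 16))     # uniform input
print(describe_distribution(b"A" * 900 + b"B" * 100))    # dominated by one value
```

Anything labeled "mixed" is a candidate for the per-section analysis described earlier rather than a conclusion in itself.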
Practical Analysis Workflow
- Initial Scan: Flag files with overall entropy > 6.8 for deeper investigation
- Correlate Metadata: Check for packer section names (such as UPX0, .aspack, .themida)
- String Analysis: Low string count + high entropy = likely packed
- Identify Stubs: Find low-entropy regions adjacent to high-entropy regions
- Dynamic Analysis: Execute in sandbox if entropy indicates packing
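The first three workflow steps can be combined into one triage pass, sketched below. Several parts are assumptions: the string-density cutoff and the 8-character minimum string length are placeholders that would need tuning, and the raw byte search for marker names is a crude stand-in for parsing the section table. Only the 6.8 threshold and the marker names come from the list above. The stub-identification step is omitted because it needs a block-by-block scan.

```python
import math
import random
import re
from collections import Counter

ENTROPY_THRESHOLD = 6.8                      # from the workflow above
PACKER_MARKERS = [b"UPX0", b".aspack", b".themida"]
MIN_STRINGS_PER_KB = 1.0                     # hypothetical cutoff
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{8,}")  # runs of 8+ printable ASCII bytes


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, by assumption)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def triage(data: bytes) -> dict:
    """Summarize the initial-scan, metadata, and string-analysis signals."""
    entropy = shannon_entropy(data)
    strings_per_kb = len(PRINTABLE_RUN.findall(data)) / max(len(data) / 1024, 1)
    high_entropy = entropy > ENTROPY_THRESHOLD
    return {
        "entropy": entropy,
        "high_entropy": high_entropy,
        "markers": [m.decode("ascii") for m in PACKER_MARKERS if m in data],
        "likely_packed": high_entropy and strings_per_kb < MIN_STRINGS_PER_KB,
        "send_to_sandbox": high_entropy,
    }


# Synthetic inputs: a marker followed by pseudo-random bytes, and plain text.
rng = random.Random(3)
packed_like = b"UPX0" + bytes(rng.getrandbits(8) for _ in range(65536))
plain = b"plain configuration text with many words. " * 500

print(triage(packed_like))
print(triage(plain))
```

String density alone is not decisive: the plain-text sample is one long printable run, so it also has few "strings", which is why the sketch only reports likely_packed when entropy is high as well.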
Understanding False Positives
High entropy doesn't automatically indicate malware. Legitimate high-entropy files include:
- Archives: ZIP/RAR files naturally have entropy 7.5-8.0
- Multimedia: JPEG images (7.2-7.6), MP3 audio, H.264 video
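- Cryptographic material: SSL certificates and private keys in binary (DER) form; PEM-encoded copies are Base64 text and score lower, at roughly 6 bits per byte or less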
- DRM-protected software: Encrypted game assets and license-protected applications
Differentiation clues: File extension matches content, valid digital signatures, appropriate section names, and presence of readable metadata all suggest benign high entropy.
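The first clue, whether the extension matches the content, can be checked against well-known leading signatures ("magic numbers"). The sketch below covers only a handful of formats and only the most common signature for each. The table layout and the function name extension_matches_content are assumptions for illustration.

```python
import os

# Common leading signatures; deliberately incomplete.
MAGIC = {
    ".zip": [b"PK\x03\x04"],
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".rar": [b"Rar!\x1a\x07"],
    ".7z": [b"7z\xbc\xaf\x27\x1c"],
    ".exe": [b"MZ"],
    ".dll": [b"MZ"],
}


def extension_matches_content(filename: str, head: bytes):
    """Return True/False for known extensions, or None when the extension is not in the table."""
    extension = os.path.splitext(filename)[1].lower()
    signatures = MAGIC.get(extension)
    if signatures is None:
        return None  # no opinion rather than a false alarm
    return any(head.startswith(signature) for signature in signatures)


print(extension_matches_content("photo.jpg", b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(extension_matches_content("photo.jpg", b"MZ\x90\x00"))
print(extension_matches_content("notes.txt", b"hello"))
```

A mismatch, such as a .jpg file that begins with the MZ executable signature, removes the benign explanation for high entropy and justifies a closer look.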
Attacker Countermeasures
Sophisticated attackers employ entropy-lowering techniques:
- Padding: Appending zeros reduces average entropy below detection thresholds
- Partial encryption: Encrypting only critical functions produces moderate overall entropy
- Frequency sculpting: Adjusting byte distribution to mimic natural language
Entropy analysis should always be combined with static analysis, behavioral monitoring, and signature matching for comprehensive detection.
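The padding countermeasure is easy to demonstrate, along with the reason block-level analysis still catches it. The sketch below uses seeded pseudo-random bytes as a stand-in for an encrypted payload and appends an equal number of zero bytes. The sizes and the seed are arbitrary assumptions.

```python
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, by assumption)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


rng = random.Random(7)
payload = bytes(rng.getrandbits(8) for _ in range(32768))  # stand-in for ciphertext
padded = payload + b"\x00" * len(payload)                  # attacker appends zeros

whole_before = shannon_entropy(payload)
whole_after = shannon_entropy(padded)
# The highest-entropy 1 KB block is unaffected by padding elsewhere in the file.
peak_block = max(shannon_entropy(padded[i:i + 1024])
                 for i in range(0, len(padded), 1024))

print(f"payload only:       {whole_before:.2f}")
print(f"with zero padding:  {whole_after:.2f}")
print(f"peak 1 KB block:    {peak_block:.2f}")
```

With half the file zeroed, the whole-file figure drops to roughly 5 bits per byte, well under a 6.8 screening threshold, while the per-block maximum stays high. This is why whole-file entropy alone is a weak signal.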
Frequently Asked Questions
Common questions about the Entropy Analyzer
Shannon entropy measures the randomness or unpredictability of data on a scale from 0 (completely predictable) to 8 (maximum randomness). It's crucial for malware analysis because malicious software often uses packers, encryptors, or obfuscators that produce high-entropy output to evade antivirus detection, making entropy a quick first indicator of potential threats.
Malware authors use packers and encryption to evade signature-based detection by antivirus software. These techniques transform the original malicious code into encrypted or compressed data that appears random, hiding recognizable patterns and code signatures. Once executed, the malware unpacks itself in memory to run its payload.
Files with entropy above 7.0 are likely packed, compressed, or encrypted, and values above 7.2 make this very likely, although legitimate formats such as ZIP archives and JPEG images fall in the same range. Normal executable code typically ranges from 4.5 to 6.5, and values between 6.8 and 7.0 are considered suspicious and warrant further investigation.
Simply drag and drop any binary file (up to 50MB) into the upload area or click to browse for a file. The tool immediately calculates the overall Shannon entropy, analyzes 1KB sections across the file, shows byte distribution patterns, and provides an automated assessment with specific warnings and recommendations for further analysis.
No, high entropy doesn't automatically mean a file is malicious. Legitimate compressed files (ZIP, 7z), encrypted documents, multimedia files, and legally protected software often have high entropy. Always combine entropy analysis with other indicators like file metadata, digital signatures, source reputation, and behavioral analysis.
File entropy is the overall randomness score for the entire file, while section entropy breaks the file into chunks (1KB sections) and calculates entropy for each. Section analysis helps identify localized high-entropy regions that might indicate embedded encrypted payloads or packed code segments hidden within otherwise normal files.
No. Entropy analysis only flags packed, encrypted, or obfuscated content, and it cannot say whether that content is malicious. It won't identify unpacked malware with normal entropy levels, polymorphic malware that mimics legitimate files, or scripts and macros. Entropy is best used as one component of a comprehensive malware detection strategy alongside signature-based scanning, heuristic analysis, and sandboxing.
ℹ️ Disclaimer
This tool is provided for informational and educational purposes only. All processing happens entirely in your browser - no data is sent to or stored on our servers. While we strive for accuracy, we make no warranties about the completeness or reliability of results. Use at your own discretion.