What Is a File Magic Number
A file magic number (also called a file signature) is a sequence of bytes, usually at the beginning of a file, that identifies its format. Unlike file extensions (which are part of the filename and easily changed), magic numbers are embedded in the file's binary content and indicate the actual file type far more reliably than the extension does.
Magic numbers are critical for security because attackers frequently disguise malicious files by changing their extensions — renaming a .exe to .pdf, for example. File upload validators, antivirus scanners, and forensic tools use magic number checks to determine the true file type and detect such deception.
How Magic Numbers Work
The first few bytes of a file contain a signature that file identification tools compare against a database of known formats:
| File Type | Magic Bytes (Hex) | ASCII Representation | Position |
|---|---|---|---|
| PDF | 25 50 44 46 | %PDF | Offset 0 |
| PNG | 89 50 4E 47 0D 0A 1A 0A | .PNG.... | Offset 0 |
| JPEG | FF D8 FF | ... | Offset 0 |
| ZIP/DOCX/XLSX | 50 4B 03 04 | PK.. | Offset 0 |
| ELF (Linux executable) | 7F 45 4C 46 | .ELF | Offset 0 |
| PE (Windows executable) | 4D 5A | MZ | Offset 0 |
| GIF | 47 49 46 38 | GIF8 | Offset 0 |
| SQLite | 53 51 4C 69 74 65 | SQLite | Offset 0 |
| Java .class | CA FE BA BE | .... | Offset 0 |
| gzip | 1F 8B | .. | Offset 0 |
The Unix file command, Python's python-magic library, and this tool all use magic number databases to identify files. The most comprehensive database is maintained by the libmagic project.
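The lookup described above can be sketched in a few lines of Python. This is a minimal illustration that uses only the signatures from the table, not the approach libmagic or this tool actually takes; the names `SIGNATURES`, `identify`, and `identify_file` are hypothetical.

```python
from typing import Optional

# Signatures copied from the table above: (label, offset, magic bytes).
SIGNATURES = [
    ("PNG", 0, bytes.fromhex("89504E470D0A1A0A")),
    ("SQLite", 0, bytes.fromhex("53514C697465")),
    ("PDF", 0, bytes.fromhex("25504446")),
    ("ZIP/DOCX/XLSX", 0, bytes.fromhex("504B0304")),
    ("ELF", 0, bytes.fromhex("7F454C46")),
    ("GIF", 0, bytes.fromhex("47494638")),
    ("Java .class", 0, bytes.fromhex("CAFEBABE")),
    ("JPEG", 0, bytes.fromhex("FFD8FF")),
    ("PE", 0, bytes.fromhex("4D5A")),
    ("gzip", 0, bytes.fromhex("1F8B")),
]


def identify(header: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    for label, offset, magic in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None


def identify_file(path: str) -> Optional[str]:
    """Read only the first 16 bytes of a file and identify them."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))
```

Short signatures such as `4D 5A` or `1F 8B` will occasionally match unrelated data, which is one reason real databases add further checks beyond the first bytes.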
Common Use Cases
- Upload validation: Verify that uploaded files match their claimed type before processing. A file with a .jpg extension but PE (MZ) magic bytes is likely a disguised executable.
- Forensic analysis: Identify file types on seized storage media, especially when files have been renamed or have no extension
- Malware analysis: Detect files disguised with incorrect extensions, a common technique in malware distribution and social engineering
- Data loss prevention: Scan outbound files to ensure employees are not exfiltrating sensitive data disguised as innocuous file types
- Content filtering: Web application firewalls and proxy servers use magic number checks to enforce upload and download policies
Best Practices
- Never trust file extensions alone — Always validate the magic number in addition to the extension. Extensions are metadata that users and attackers can change freely.
- Check magic numbers server-side — Client-side extension checks are trivially bypassed. Perform magic number validation on the server before processing any uploaded file.
- Validate deep structure, not just headers — Some polyglot files contain valid magic numbers for multiple formats simultaneously. For high-security applications, parse the file structure beyond just the initial bytes.
- Whitelist allowed file types — Rather than trying to detect all malicious types, maintain a whitelist of permitted magic numbers and reject everything else.
- Combine with antivirus scanning — Magic number checks confirm file type but do not detect malicious content within valid files. Always complement with content scanning for defense in depth.
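As a rough sketch of the first, second, and fourth practices, the server-side check below accepts an upload only when its extension is on an allowlist and its leading bytes match the signature expected for that extension. The `ALLOWED` table and the `validate_upload` function are assumptions made for illustration; a production validator would also parse the file structure and scan the content.

```python
import os
from typing import Tuple

# Hypothetical allowlist: extension -> magic bytes expected at offset 0.
ALLOWED = {
    ".png": bytes.fromhex("89504E470D0A1A0A"),
    ".jpg": bytes.fromhex("FFD8FF"),
    ".jpeg": bytes.fromhex("FFD8FF"),
    ".pdf": bytes.fromhex("25504446"),
}


def validate_upload(filename: str, content: bytes) -> Tuple[bool, str]:
    """Return (accepted, reason) for an uploaded file."""
    ext = os.path.splitext(filename)[1].lower()
    expected = ALLOWED.get(ext)
    if expected is None:
        return False, f"extension {ext or '(none)'} is not on the allowlist"
    if not content.startswith(expected):
        return False, f"content does not match the {ext} signature"
    return True, "extension and magic number agree"
```

For example, `validate_upload("photo.jpg", b"MZ\x90\x00")` is rejected because the content starts with the PE signature rather than `FF D8 FF`.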
Frequently Asked Questions
Common questions about the File Magic Number Checker
File magic numbers (file signatures) are byte sequences at the beginning of files that identify file types. A signature is a fixed byte pattern at the start of the file (typically the first 2-16 bytes) that operating systems and tools use to determine the file type, independent of the file extension.
Common magic numbers:
- JPEG: FF D8 FF (hex); starts every JPEG image.
- PNG: 89 50 4E 47 0D 0A 1A 0A (hex), or ".PNG" in ASCII.
- PDF: 25 50 44 46 (hex), or "%PDF" in ASCII.
- ZIP: 50 4B 03 04 (hex), or "PK" in ASCII.
- EXE (Windows): 4D 5A (hex), or "MZ" in ASCII.
- ELF (Linux): 7F 45 4C 46 (hex).
Why they are important:
- Detecting file extension spoofing: when malware is disguised as a safe file (malware.exe renamed to document.pdf), the magic number reveals the real type.
- Security analysis: catch email attachments that claim to be images but are executables, and identify hidden file types in forensic analysis.
- Data recovery: recover files with corrupted or missing extensions and identify fragments from unallocated disk space.
- Malware detection: polyglot files (valid as multiple file types), steganography (data hidden in images), and obfuscation techniques.
- Compliance verification: ensure uploaded files match allowed types and prevent policy violations such as uploading executables to a document portal.
How it works: read the first N bytes of the file (the header), compare them against a database of known signatures, and identify the file type regardless of extension.
Tools: Unix file command, TrID (File Identifier), this magic number checker, hex editors (HxD, 010 Editor).
Real-world example: an email attachment named "invoice.pdf" has the magic number 4D 5A, which marks a Windows executable. The victim opens the "PDF" and runs malware. File extensions lie; magic numbers don't, unless they are deliberately crafted.
Extension spoofing exploits user trust in file extensions.
Attack techniques:
- Double extension: malware.pdf.exe. Windows hides known extensions such as .exe by default, so the user sees malware.pdf and thinks it is safe. The icon can be customized to show a PDF icon, and clicking the file executes the malware.
- Right-to-left override: the Unicode character U+202E reverses the display order of the text that follows it. The filename resume[U+202E]fdp.exe is displayed as resumeexe.pdf, but the real name still ends in .exe; the ".pdf" ending is only a display trick.
- Renamed executables: malware.exe → document.pdf. If email filters check only the extension (not the magic number), the email is delivered as a "safe" PDF. The user opens it with the default PDF viewer, gets an error, then "tries again" with a different program and executes it.
- Archive containing executables: compressed_docs.zip contains report.pdf (legitimate) and setup.exe (malware). Users extract all files and unknowingly run setup.exe.
- Polyglot files: a file that is valid in multiple formats. For example, a file that is both a valid JPEG and a valid ZIP is displayed as an image in preview but can be extracted as a ZIP containing malware.
Detection methods:
- Check magic numbers: read the first bytes to identify the real type and compare it with the file extension.
- Deep file inspection: scan the entire file structure (not just the header), detect embedded executables, and identify suspicious sections.
- Behavior analysis: execute in a sandbox to observe behavior and detect payload extraction or execution.
Email security: modern mail gateways check magic numbers against extensions, and look for double extensions, RLO characters, and macros in Office documents.
User protection:
- Show file extensions (on Windows, unhide file extensions).
- Hover over files to see the full path.
- Check file properties (right-click → Properties → Details).
- Verify the sender before opening attachments.
- Use antivirus with heuristic detection.
Statistics: 45% of malware uses extension spoofing, double extension attacks increased 300% in 2023, and the technique is most effective against non-technical users. This tool helps verify true file type by examining magic numbers.
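Two of the filename tricks described above, double extensions and bidirectional override characters, can be flagged from the name alone. The following is a minimal sketch; the extension sets and the `filename_warnings` function are hypothetical and would need tuning for a real mail gateway or upload filter.

```python
import os

# Hypothetical lists for illustration only.
EXECUTABLE_EXTS = {".exe", ".scr", ".bat", ".cmd", ".com", ".js", ".vbs"}
DECOY_EXTS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".jpg", ".png", ".txt"}
# Unicode bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = {"\u202a", "\u202b", "\u202d", "\u202e",
                 "\u2066", "\u2067", "\u2068"}


def filename_warnings(name: str) -> list:
    """Return human-readable warnings for a suspicious filename."""
    warnings = []
    if any(ch in BIDI_CONTROLS for ch in name):
        warnings.append("bidirectional control character in name")
    stem, last_ext = os.path.splitext(name.lower())
    inner_ext = os.path.splitext(stem)[1]
    if last_ext in EXECUTABLE_EXTS and inner_ext in DECOY_EXTS:
        warnings.append("double extension ending in an executable type")
    return warnings
```

A name check like this complements, rather than replaces, the magic number comparison: a renamed executable such as document.pdf has a perfectly ordinary filename.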
Polyglot files are valid files in multiple formats simultaneously. A polyglot is a single file that is syntactically valid in two or more file formats: parsers for different formats interpret the same bytes differently, which exploits format ambiguities and lenient error handling.
Example: JPEG/ZIP polyglot. The file starts with the JPEG header FF D8 FF E0, followed by the JPEG data, then appended ZIP data (ZIP allows prepended data), with the ZIP footer at the end. An image viewer shows the JPEG image, while a ZIP tool extracts files from the ZIP section.
Common polyglot combinations:
- GIF/JS: a valid GIF image that is also valid JavaScript, used to bypass upload filters and execute a JS payload in the browser.
- PDF/PostScript: PDF files can contain PostScript, which can exploit PDF readers with PostScript support.
- HTML/Image: HTML tags hidden in image metadata, leading to XSS attacks when the "image" is rendered in a browser.
- JAR/ZIP: a Java archive is also a valid ZIP and can contain multiple executables.
- Office/HTML: a Word .docx is really a ZIP and can embed HTML or scripts inside.
Security risks:
- Bypassing security filters: the upload filter checks for an image magic number (which passes), but the file contains hidden executable code.
- XSS attacks: an uploaded "image" that browsers parse as HTML executes malicious scripts on the victim domain.
- Data exfiltration: sensitive data hidden in legitimate-looking files, with steganography combined with polyglot techniques.
- Malware delivery: the file displays benign content (image or document) and yields a payload when opened with a different tool.
Real-world attacks:
- ImageTragick (CVE-2016-3714): crafted files, for example MVG or SVG content uploaded with an image extension, triggered command injection in ImageMagick's delegate handling and led to arbitrary code execution.
- Office macros: polyglot Office documents evade detection, and macros execute when the document is opened.
- ZIP bombs in images: an image file that is also a ZIP containing a compressed bomb causes denial of service when extracted.
Detection challenges:
- File validators usually check only one format.
- It is hard to detect all valid format combinations.
- False positives occur (legitimate files with metadata).
- Detection requires deep content inspection.
Defense strategies:
- Validate the entire file structure (not just the magic number).
- Re-encode files, which breaks the polyglot structure.
- Strip metadata from uploads.
- Sandbox execution before allowing download.
- Use Content Security Policy (CSP) to prevent script execution.
For forensics: polyglot analysis requires a hex editor to view the full file structure, multiple file format parsers, and an understanding of file format specifications. This tool helps identify multiple valid formats in a single file.
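The JPEG/ZIP example can be detected by testing the same bytes against more than one format. The sketch below assumes only two checks, a JPEG header at offset 0 and the standard library's `zipfile.is_zipfile`, and the helper names are hypothetical; real polyglot detection needs many more parsers.

```python
import io
import zipfile


def looks_like_jpeg(data: bytes) -> bool:
    # JPEG signature from the table: FF D8 FF at offset 0.
    return data.startswith(b"\xff\xd8\xff")


def looks_like_zip(data: bytes) -> bool:
    # is_zipfile looks for the end-of-central-directory record near the
    # end of the data, so it also accepts ZIPs with prepended bytes.
    return zipfile.is_zipfile(io.BytesIO(data))


def detected_formats(data: bytes) -> list:
    """Return every format (of the two checked) that the data satisfies."""
    formats = []
    if looks_like_jpeg(data):
        formats.append("JPEG")
    if looks_like_zip(data):
        formats.append("ZIP")
    return formats
```

A result with more than one entry does not prove malice, but it is a strong signal that the file deserves full structural validation or re-encoding.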
Digital forensics relies on several complementary file identification techniques.
Method 1: Magic number analysis
- Read the first 16-32 bytes (a common signature length) and compare them against signature databases.
- Tools: file command (Linux), TrID, this checker.
- Example workflow: run xxd suspicious_file | head to view the hex, identify the signature (4D 5A = EXE, FF D8 FF = JPEG), and verify it against known signatures.
Method 2: Header-footer analysis
- Some files have both header and footer signatures. JPEG starts with FF D8 FF and ends with FF D9; PDF starts with %PDF and ends with %%EOF.
- Validation: check that both header and footer match the expected format to detect truncated or corrupted files.
Method 3: Entropy analysis
- Measure the randomness of file contents in bits per byte: high entropy (7.5-8.0) = encrypted/compressed, medium entropy (5-7) = text/code, low entropy (<5) = repetitive data.
- Uses: identify encrypted files (ransomware), detect packed executables, find compressed archives.
Method 4: String analysis
- Extract ASCII/Unicode strings from binary files to reveal file paths embedded in malware, URLs/IPs for C2 communication, debug messages, and copyright notices.
- Tools: strings command, Sysinternals Strings, FLOSS (FLARE Obfuscated String Solver).
Method 5: Metadata examination
- EXIF data (images): camera info, GPS location, timestamps.
- Office documents: author, creation/modification dates, revision count.
- PDFs: creator software, embedded objects.
- Tools: ExifTool, pdfinfo, MediaInfo.
Method 6: File carving
- Recover files from unallocated disk space: search for magic numbers in a raw disk image, extract the data between header and footer, and reconstruct deleted files.
- Tools: Foremost, Scalpel, PhotoRec.
Method 7: Deep file structure analysis
- Parse the complete file format (not just the signature), verify structural integrity, and detect embedded files or anomalies.
- Example, ZIP analysis: verify that the central directory matches the local headers, check for hidden files in gaps between entries, and detect malicious ZIP structures (zip bombs, overlapping entries).
Common forensic scenarios:
- Malware analysis: identify packed executables (UPX, ASPack), detect code injection (PE file anomalies), analyze shellcode (no standard magic number).
- Data recovery: identify file fragments, reconstruct partially overwritten files, determine file type when the extension is missing.
- E-discovery: validate file integrity, identify duplicates via hash + type, detect files renamed to hide content.
- Incident response: identify malicious files in memory dumps, analyze network captures for file transfers, detect lateral movement artifacts.
Best practices:
- Hash files before analysis (preserve evidence).
- Work on forensic copies (not original media).
- Document all analysis steps.
- Use multiple tools to verify findings.
- Maintain chain of custody.
This tool provides quick magic number identification for first-pass analysis.
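Methods 2 and 3 are simple enough to sketch directly. The code below computes Shannon entropy in bits per byte, the scale the thresholds above refer to, and performs a naive JPEG header/footer check. Both function names are hypothetical, and the footer check assumes the file ends exactly at the FF D9 marker.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte, ranging from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def jpeg_header_footer_ok(data: bytes) -> bool:
    """Check the JPEG header (FF D8 FF) and footer (FF D9) signatures."""
    return data.startswith(b"\xff\xd8\xff") and data.endswith(b"\xff\xd9")
```

Entropy is usually more informative when computed over sliding windows rather than the whole file, since a packed section can hide inside an otherwise low-entropy executable.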
Magic numbers and MIME types serve different purposes.
Magic numbers: a byte sequence at the beginning of the file, embedded in the file content itself, determined by the file format specification, and independent of file naming or metadata. Example: JPEG always starts with FF D8 FF.
MIME types: a text label describing the file type, transmitted in HTTP headers or email metadata. It is not part of the file content and can be set arbitrarily (it is not enforced). Example: Content-Type: image/jpeg.
Key differences:
- Location: magic numbers live inside the file; MIME types live in metadata/headers.
- Reliability: magic numbers are harder to fake, since changing them usually corrupts the file; MIME types are easily spoofed.
- Purpose: magic numbers identify the file format; MIME types are a hint for network communication.
- Authority: magic numbers are defined by the file format creator; MIME types are registered with IANA.
- Trust: magic numbers rate HIGH (part of the file structure); MIME types rate LOW (can be arbitrary).
Common MIME types: text/html, text/plain, image/jpeg, image/png, application/pdf, application/zip, application/json, video/mp4, audio/mpeg.
Security implications:
- Attack scenario: an attacker sends malware.exe (magic: 4D 5A) with Content-Type: image/jpeg. A client that checks only the MIME type, not the magic number, attempts to handle the file as a JPEG; what happens next depends on the client.
- Content sniffing: browsers may examine file content (including magic numbers) to guess the type when the declared MIME type is missing or looks wrong. Sniffing can itself be abused, because an uploaded "image" that contains HTML may be rendered as HTML.
- X-Content-Type-Options: nosniff: this HTTP header disables content sniffing and forces the browser to trust the declared MIME type; modern browsers then refuse to load scripts and stylesheets whose declared type does not match. The trade-off: it prevents sniffing-based polyglot attacks but can cause display issues when the server declares the wrong type.
Best practices:
- Server-side: always validate file content (magic numbers), set the MIME type based on content analysis (not user input), and use Content-Disposition: attachment for downloads.
- Client-side: don't trust MIME types from untrusted sources, verify file content before processing, and implement Content Security Policy.
File upload validation:
- ❌ INSECURE: check only the file extension or MIME type.
- ✅ SECURE: check the magic number, validate the entire file structure, re-encode/sanitize the file, store it with a random filename, and serve it from a separate domain.
Relationship: ideally the magic number and MIME type agree (the file is what it claims to be). In practice, both must be verified to detect attacks. This tool focuses on magic number analysis for accurate file identification.
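The server-side practice of deriving the MIME type from content rather than from user input might look like the sketch below. The `MAGIC_TO_MIME` mapping and both function names are assumptions for illustration and cover only four formats.

```python
from typing import Optional

# Hypothetical mapping from magic bytes to the MIME type they imply.
MAGIC_TO_MIME = [
    (bytes.fromhex("89504E470D0A1A0A"), "image/png"),
    (bytes.fromhex("FFD8FF"), "image/jpeg"),
    (bytes.fromhex("25504446"), "application/pdf"),
    (bytes.fromhex("504B0304"), "application/zip"),
]


def sniff_mime(content: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None."""
    for magic, mime in MAGIC_TO_MIME:
        if content.startswith(magic):
            return mime
    return None


def mime_matches(declared: str, content: bytes) -> bool:
    """Compare a declared Content-Type value with the sniffed type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    declared_type = declared.split(";", 1)[0].strip().lower()
    return sniff_mime(content) == declared_type
```

Note that ZIP-based formats such as DOCX would be reported as application/zip by this sketch, so a real implementation has to inspect the archive contents before deciding on a more specific type.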
Several techniques help identify hidden data within files.
Steganography basics: data is hidden within other data (the carrier file) while preserving the carrier file's functionality. Detection is challenging because the approach relies on security through obscurity.
Common techniques:
- LSB (Least Significant Bit) modification: modify the least significant bits of image pixels. The changes are imperceptible to the human eye and can hide roughly 1/8 of the image size in data.
- Metadata hiding: embed data in EXIF, IPTC, or XMP metadata, in comment fields of various formats, or in header/footer padding areas.
- Polyglot files: combine multiple file formats and hide data in "unused" sections.
- File append: append data after the file footer. JPEG and GIF allow trailing data, and ZIP files can have prepended data.
Detection methods:
- Method 1, visual/statistical analysis: compare to the original (if available), look for visual artifacts (unusual noise patterns), check the file size against what is typical, and analyze the color histogram (anomalies indicate modification). Tools: StegDetect, StegExpose, ImageJ (statistical analysis).
- Method 2, entropy analysis: calculate entropy per region or layer. Natural images show varied entropy, while steganography tends to produce more uniform entropy because hidden data has different randomness. Example: ent filename shows an entropy score; pure random data = 8.0 bits/byte, English text = ~4.5 bits/byte.
- Method 3, LSB analysis: extract the LSB plane from the image, visualize the LSB layer (hidden data appears as patterns), and run statistical tests (chi-square test for randomness). Tools: zsteg (Ruby), stegdetect, StegSpy.
- Method 4, metadata examination: extract all metadata fields with exiftool -a -G1 -s file.jpg, check comment fields, EXIF UserComment, and PDF metadata, and look for suspicious hex strings or base64-encoded data.
- Method 5, file structure analysis: parse the file format completely, identify trailer data after the EOF marker, check for gaps or padding with hidden data, and verify structural integrity. Example: a JPEG ends with the FF D9 marker, so any data after it is suspicious. Because dd counts skip in blocks, set a one-byte block size to extract from a byte offset: dd if=image.jpg of=trailer.bin bs=1 skip=<offset>.
- Method 6, comparison with known-good: compare with the original file (if available), diff hex dumps to find modified bytes, and identify the specific modification technique.
Specialized tools:
- Steghide: detect/extract steghide-embedded data.
- OutGuess: statistical steganalysis.
- StegSuite: multiple detection algorithms.
- Forensic tools: FTK and EnCase have stego detection.
Indicators of steganography: file size larger than expected, modified LSB patterns, metadata anomalies (unusual timestamps, normally empty fields filled in), trailing data after EOF, high entropy in "noise" areas.
Extraction attempts: try common tools with and without passwords: steghide extract -sf file.jpg, outguess -r file.jpg output.txt, stegdetect file.jpg. Check the output for ZIP archives (many stego tools hide ZIPs), text files, and encrypted containers.
For forensics: document the original file hash, extract suspicious regions for analysis, attempt multiple extraction tools, and analyze network traffic for stego patterns.
Limitations: modern stego algorithms are hard to detect, detection requires statistical analysis and pattern matching, and false positives are common with compressed or encrypted content. This magic number tool helps identify the file format as a first step in stego analysis.
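The trailing-data check from Method 5 can be sketched as follows. Rather than searching the whole file for FF D9, which would be fooled by thumbnails embedded in metadata, this version walks the length-prefixed header segments up to the start-of-scan marker and only then looks for the end-of-image marker. It is a simplification that assumes a well-formed baseline JPEG; the function name `jpeg_trailer` is hypothetical.

```python
def jpeg_trailer(data: bytes) -> bytes:
    """Return any bytes that follow the JPEG end-of-image marker."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG: missing FF D8 start marker")
    pos = 2
    # Walk the length-prefixed header segments until start of scan (FF DA).
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        if data[pos + 1] == 0xDA:
            break
        seg_len = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 2 + seg_len
    # Inside scan data, FF is followed by 00 or a restart marker, so the
    # first FF D9 after the scan starts is treated as end of image.
    end = data.find(b"\xff\xd9", pos)
    if end == -1:
        return b""
    return data[end + 2:]
```

A non-empty result can then be passed to a magic number check to see whether the trailer is itself a recognizable file, such as a ZIP archive.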
File carving recovers files from raw data without filesystem metadata.
When it is used:
- Deleted file recovery: deleted files are no longer in the filesystem directory, but their data remains in unallocated space until overwritten.
- Damaged filesystems: filesystem structures (MFT, inodes) are corrupted, but raw disk access is still possible.
- Memory forensics: recover files from RAM dumps and identify loaded executables and documents in memory.
- Network forensics: extract files from a network capture (PCAP), recover email attachments, identify malware downloads.
- Anti-forensics response: the attacker deleted logs or evidence, or wiped filesystem metadata.
Carving process:
- Signature-based carving: scan for magic numbers (file headers), scan for footers (file endings), and extract the data between header and footer. Example: search the raw disk for a JPEG header (FF D8 FF), scan forward for the JPEG footer (FF D9), and extract all bytes in between as a recovered JPEG.
- Validation: check the integrity of the extracted file, verify that its structure is valid, and test whether it opens correctly.
- Fragment reassembly: deal with fragmented (non-contiguous) files using gap-carving techniques and maximum fragment size limits.
Carving tools:
- Foremost: fast and signature-based; a config file defines headers and footers. Usage: foremost -i disk.img -o output/.
- Scalpel: an improved Foremost with better performance and more flexible configuration.
- PhotoRec: recovers photos, documents, and archives; works on any filesystem and can recover from damaged media.
- Bulk_extractor: feature extraction plus carving; finds credit cards, emails, and URLs without mounting the filesystem.
- Custom scripts: Python/Perl with regex for magic numbers, and automated extraction pipelines.
Advanced techniques:
- Gap carving: recover fragmented files with gaps, use the maximum cluster size as a limit, and reassemble fragments.
- Smart carving: use file format knowledge, validate internal structure, and recover based on metadata consistency.
- Bifragment carving: for a file split into exactly 2 fragments, try all possible combinations.
Challenges:
- Fragmentation: files are split across the disk in non-contiguous fragments and may be impossible to fully recover without filesystem data.
- Compression/encryption: compressed data cannot be identified by magic number (the ZIP might be found, but not its contents), and encrypted data appears random.
- False positives: magic number patterns occur randomly, so not all matches are real files and validation is needed.
- Overwritten data: once overwritten, data is unrecoverable, and even a partial overwrite corrupts the file.
File format considerations:
- Easy to carve: JPEG (clear header/footer), PNG (clear signatures), PDF (text-based structure), GIF (simple format).
- Hard to carve: fragmented videos (no clear footer), compressed archives (nested files), databases (complex structure), encrypted containers.
Memory carving specifics: process memory dumps for loaded executables (PE/ELF headers), documents in memory (Office, PDF), screenshots in graphics memory, and extracted malware payloads.
Best practices:
- Work on a forensic image (never original media).
- Hash recovered files.
- Document the carving parameters used.
- Validate recovered files.
- Use multiple tools (different algorithms).
This magic number checker identifies signatures for carving configuration files.
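The signature-based carving step described above can be sketched for JPEG as a header search followed by a forward footer search. This is a deliberately naive illustration, not how Foremost or Scalpel are implemented: it assumes contiguous files, ignores embedded thumbnails, and the `carve_jpegs` name and `max_size` default are hypothetical.

```python
def carve_jpegs(raw: bytes, max_size: int = 10 * 1024 * 1024) -> list:
    """Return (offset, data) pairs for header-to-footer JPEG candidates."""
    header, footer = b"\xff\xd8\xff", b"\xff\xd9"
    carved = []
    start = raw.find(header)
    while start != -1:
        # Look for the footer within max_size bytes of the header.
        end = raw.find(footer, start + len(header), start + max_size)
        if end != -1:
            carved.append((start, raw[start:end + len(footer)]))
            search_from = end + len(footer)
        else:
            search_from = start + len(header)
        start = raw.find(header, search_from)
    return carved
```

Every candidate still needs the validation step from the carving process, because header and footer byte patterns also occur by chance in unrelated data.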
Custom signature databases support specialized file identification.
Why custom databases: detect proprietary file formats, identify malware-specific signatures, find organization-specific file types, analyze embedded or custom protocols, and handle format variations.
Components of a signature entry:
- Magic number: byte sequence (hex) and offset (usually 0, but it can vary). Example: 4D 5A at offset 0 for Windows EXE.
- File extension: associated extension(s); there can be several (.jpg, .jpeg, .jpe).
- MIME type: the corresponding MIME type, for example image/jpeg.
- Description: human-readable name, for example "JPEG image data".
- Additional signatures: secondary signatures for validation, footer markers, internal structure patterns.
Database formats:
- TrID XML: open format for the TrID tool, with flexible signature definitions and support for multiple patterns per format.
- file magic database: used by the Unix file command; a compiled format (more complex), located in /usr/share/misc/magic.
- YARA rules: powerful pattern matching with support for complex conditions, used for malware detection.
- Custom JSON/XML: self-defined schema, easy to parse and modify, portable across tools.
Creating signature entries:
- Step 1, collect samples: gather multiple samples of the target file type and ensure they are valid and representative; use a minimum of 10-20 samples for accuracy.
- Step 2, identify common patterns: hex dump each file with xxd file1.ext | head -n 5, identify consistent byte patterns, and note the offset and length of each pattern. Example analysis: File1: 50 4B 03 04 14 00 08 00..., File2: 50 4B 03 04 14 00 06 00..., File3: 50 4B 03 04 14 00 08 00...; common: 50 4B 03 04 at offset 0 (all ZIP-based formats).
- Step 3, define specificity: a generic signature is 50 4B 03 04 (all ZIP-based); a specific signature is 50 4B 03 04 plus an internal file name pattern (e.g., DOCX has a word/ directory).
- Step 4, test against false positives: run the signature against a large file corpus, measure the false positive rate, and refine the signature for accuracy.
- Step 5, document the format: create a specification document that includes offset, pattern, description, variations, and known false positives.
Example: YARA rule for a custom format:
    rule custom_format {
      meta:
        description = "Custom Application Format"
        author = "Security Team"
      strings:
        $magic = { 43 55 53 54 4F 4D }  // "CUSTOM" in hex
        $version = { 01 00 ?? ?? }      // Version 1.0.x.x
      condition:
        $magic at 0 and $version at 6
    }
Advanced techniques:
- Multi-byte patterns: combine multiple signature locations, for example header + footer + internal structure.
- Wildcards: allow variable bytes, such as 50 4B ?? ?? (any 2 bytes); useful for version variations.
- Regular expressions: match complex patterns; useful for text-based formats.
- Composite signatures: logical combinations (AND, OR, NOT) that detect variants of the same format.
Integration with tools:
- TrID: create a TrID XML definition and place it in the TrID defs folder for automatic detection.
- file command: edit /etc/magic or ~/.magic and recompile the magic database.
- YARA: save rules as .yar files and scan with yara rules.yar target_file.
- Custom tools: parse the database, implement matching, and optimize for performance.
Maintenance: regularly update with new variants, remove obsolete entries, validate against a real-world corpus, and share with the community (contribute to public databases).
Use cases:
- Malware families (specific packer signatures).
- Corporate file formats (internal tools).
- Forensic analysis (rare formats).
- Legacy system files (obsolete formats).
This tool can be extended with custom signature databases for organization-specific needs.
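A custom JSON database with offsets and `??` wildcards, as described above, could be matched with a few lines of Python. The schema below (`name`, `offset`, `pattern`) is an assumption made for this sketch, not a format used by TrID, libmagic, or YARA, and the entries mirror the hypothetical "CUSTOM" format from the YARA example.

```python
import json

# Hypothetical JSON schema; "??" marks a wildcard byte.
SIGNATURE_DB = json.loads("""
[
  {"name": "Custom Application Format", "offset": 0,
   "pattern": "43 55 53 54 4F 4D"},
  {"name": "Custom Application Format v1.0", "offset": 6,
   "pattern": "01 00 ?? ??"},
  {"name": "ZIP-based container", "offset": 0,
   "pattern": "50 4B 03 04"}
]
""")


def pattern_matches(data: bytes, offset: int, pattern: str) -> bool:
    """Match a space-separated hex pattern with ?? wildcards at offset."""
    tokens = pattern.split()
    window = data[offset:offset + len(tokens)]
    if len(window) < len(tokens):
        return False
    return all(tok == "??" or int(tok, 16) == byte
               for tok, byte in zip(tokens, window))


def match_signatures(data: bytes) -> list:
    """Return the names of all database entries that match the data."""
    return [entry["name"] for entry in SIGNATURE_DB
            if pattern_matches(data, entry["offset"], entry["pattern"])]
```

Returning every match rather than the first one makes it easy to combine a generic entry with a more specific one, which corresponds to the specificity step in the workflow above.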
⚠️ Security Notice
This tool is provided for educational and authorized security testing purposes only. Always ensure you have proper authorization before testing any systems or networks you do not own. Unauthorized access or security testing may be illegal in your jurisdiction. All processing happens client-side in your browser - no data is sent to our servers.