What Is an IOC Extractor
An IOC (Indicator of Compromise) extractor automatically identifies and extracts security-relevant artifacts from unstructured text such as threat intelligence reports, email headers, log files, and incident notes. IOCs include IP addresses, domain names, URLs, file hashes, email addresses, CVE identifiers, and other observable data that indicate malicious activity or compromise.
Security analysts spend significant time manually copying IOCs from PDF reports, threat advisories, and internal communications. An automated extractor uses pattern matching and validation to pull these indicators in seconds, reducing manual effort and lowering the risk that a critical indicator is missed. Extracted IOCs can then be fed into SIEMs, firewalls, threat intelligence platforms, and blocklists for automated detection and response.
How IOC Extraction Works
IOC extractors use regular expressions and validation logic to identify specific patterns in text:
| IOC Type | Pattern | Example |
|---|---|---|
| IPv4 address | Dotted decimal notation | 192.168.1.100 |
| IPv6 address | Colon-separated hexadecimal | 2001:db8::1 |
| Domain | Hostname with TLD | malware.evil.com |
| URL | Full URI with scheme | https://evil.com/payload.exe |
| MD5 hash | 32 hex characters | d41d8cd98f00b204e9800998ecf8427e |
| SHA-1 hash | 40 hex characters | da39a3ee5e6b4b0d3255bfef95601890afd80709 |
| SHA-256 hash | 64 hex characters | e3b0c44298fc1c149afbf4c8996fb924... |
| Email address | user@domain format | [email protected] |
| CVE ID | CVE-YYYY-NNNN (four or more digits) | CVE-2024-12345 |
| MITRE ATT&CK | Technique and sub-technique IDs | T1059.001 |
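The patterns in the table can be sketched as a small set of regular expressions. This is a minimal illustration, not the tool's actual implementation: the names `PATTERNS` and `extract_iocs` are assumptions made for this example, and the expressions are deliberately loose (the IPv4 pattern, for instance, accepts octets above 255, so a validation pass is still needed).

```python
import re

# Illustrative patterns only; a production extractor needs stricter validation.
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
}


def extract_iocs(text):
    """Return a dict mapping each IOC type to its matches, in order of appearance."""
    found = {}
    for name, pattern in PATTERNS.items():
        # Trim trailing sentence punctuation that the URL pattern may swallow.
        found[name] = [match.rstrip(".,;:)") for match in pattern.findall(text)]
    return found


sample = (
    "Beacon to 192.168.1.100 and https://evil.com/payload.exe, "
    "hash d41d8cd98f00b204e9800998ecf8427e, exploiting CVE-2024-12345."
)
results = extract_iocs(sample)
```

The word boundaries around the hash patterns keep a 40-character SHA-1 value from also being reported as a 32-character MD5 value.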
Defanged IOC handling: Threat reports often "defang" IOCs to prevent accidental clicks—writing hxxps://evil[.]com instead of https://evil.com. Quality extractors recognize and automatically refang these patterns for direct use in security tools.
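Refanging can be sketched as an ordered list of substitutions applied before pattern matching. The rule list and the `refang` function name below are assumptions for illustration; real reports use more defanging variants than the handful covered here.

```python
import re

# A few common defanging substitutions; this list is illustrative, not exhaustive.
_REFANG_RULES = [
    (re.compile(r"hxxp", re.IGNORECASE), "http"),
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[at\]|\{at\}|\(at\)", re.IGNORECASE), "@"),
    (re.compile(r"\[:\]"), ":"),
]


def refang(text):
    """Convert defanged indicators in text back to their usable form."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


restored = refang("hxxps://evil[.]com")
```

Running the refang step over the whole document first means the extraction patterns only need to handle the normal form of each indicator.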
Common Use Cases
- Threat intelligence processing: Extract IOCs from vendor advisories, ISAC bulletins, and OSINT reports
- Incident response: Pull indicators from malware analysis reports and forensic timelines for hunting
- SIEM enrichment: Feed extracted IOCs into detection rules and watchlists
- Blocklist generation: Convert threat reports into actionable IP, domain, and URL blocklists
- Threat hunting: Use extracted hashes and domains to search across historical logs for unreported compromise
Best Practices
- Validate extracted IOCs — Not every IP address in a document is malicious; cross-reference with context and threat feeds
- Handle defanged formats — Support common defanging patterns like [.], hxxp, and {at} for reliable extraction
- Deduplicate results — Reports often mention the same IOC multiple times; deduplicate before importing into tools
- Preserve context — Record where each IOC was found and what threat it relates to for analyst context
- Automate the pipeline — Connect extraction to your TIP (Threat Intelligence Platform) for automatic ingestion and correlation
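Deduplication and context preservation can be combined in one pass. The sketch below assumes each observation arrives as a hypothetical `(ioc_type, value, source)` tuple; the `IocRecord` class and `deduplicate` function are names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class IocRecord:
    value: str
    ioc_type: str
    sources: list = field(default_factory=list)


def deduplicate(observations):
    """Merge repeated indicators while keeping every place each one was seen."""
    records = {}
    for ioc_type, value, source in observations:
        # Domains and hashes are case-insensitive; URL paths may not be,
        # so URLs are left untouched in this sketch.
        normalized = value if ioc_type == "url" else value.lower()
        key = (ioc_type, normalized)
        record = records.setdefault(key, IocRecord(normalized, ioc_type))
        if source not in record.sources:
            record.sources.append(source)
    return list(records.values())


merged = deduplicate([
    ("domain", "Evil.com", "advisory, page 2"),
    ("domain", "evil.com", "advisory, page 5"),
    ("ipv4", "192.0.2.1", "incident notes"),
])
```

Keeping a list of sources per indicator lets an analyst trace each entry back to the report that mentioned it after duplicates are merged.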
Frequently Asked Questions
Common questions about the IOC Extractor
Indicators of Compromise (IOCs) are forensic artifacts that indicate a potential security breach. Types: IP addresses (C2 servers), domains (phishing sites), URLs (malware downloads), file hashes (malware samples), email addresses (attackers), file paths, registry keys, mutexes. Used in: threat intelligence sharing (STIX/TAXII), SIEM rules, IDS/IPS signatures, threat hunting. Extract IOCs from: security logs, incident reports, malware analysis, threat feeds.
Use regex patterns or specialized tools to identify IOCs in unstructured text. Common patterns: IPv4 (192.0.2.1), IPv6, domains (example.com), URLs, MD5/SHA hashes, email addresses. Challenges: defanged IOCs (hxxp://example[.]com), false positives (version numbers mistaken for IPs), missing context. Our tool: automatically detects patterns, handles defanged formats, removes duplicates, exports to CSV/JSON. Use for: parsing threat reports, analyzing logs, enriching SIEM data.
Defanged IOCs are intentionally modified to prevent accidental clicks or DNS lookups. Common modifications: hxxp:// (instead of http://), example[.]com (brackets), 192[.]0[.]2[.]1 (brackets), @ replaced with [at]. Used in: threat reports, email communication, documentation. Prevents: accidental malware execution, DNS queries to C2 servers, analyst mistakes. Refanging: convert back to original format for analysis. Our tool automatically detects and refangs IOCs.
Validation reduces false positives. Steps: 1) Check format (valid IP ranges, domain TLDs, hash lengths). 2) Remove private IPs (10.x, 172.16-31.x, 192.168.x). 3) Exclude loopback (127.x). 4) Verify hash algorithms (MD5=32 hex characters, SHA-1=40, SHA-256=64). 5) Check domain and IP reputation (VirusTotal, AbuseIPDB). 6) Context analysis (log timestamps, related IOCs). 7) Remove CDN/legitimate services (cloudflare, google). Use threat intelligence platforms for enrichment.
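The format checks in steps 1 through 4 can be sketched with Python's standard `ipaddress` module. The function names `is_reportable_ip` and `classify_hash` are assumptions for this example, and reputation and context checks (steps 5 to 7) are out of scope here because they depend on external services.

```python
import ipaddress
import string

# Hex-digest lengths for the three hash types listed above.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def is_reportable_ip(candidate):
    """Return True for a syntactically valid, publicly routable-looking address."""
    try:
        ip = ipaddress.ip_address(candidate)
    except ValueError:
        # Not an IP at all, e.g. an octet above 255 or a version string.
        return False
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def classify_hash(candidate):
    """Return 'md5', 'sha1', or 'sha256' based on length, or None if no match."""
    if all(char in string.hexdigits for char in candidate):
        return HASH_LENGTHS.get(len(candidate))
    return None
```

A length check only says which algorithm a digest could belong to; it cannot confirm that the value is a real file hash rather than some other hexadecimal string.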
Common formats: IPv4 (192.0.2.1), IPv6 (2001:db8::1), domains (example.com, sub.example.co.uk), URLs (https://example.com/path), MD5 hashes (32 hex), SHA-1 (40 hex), SHA-256 (64 hex), email addresses ([email protected]), CVE IDs (CVE-2024-1234). Defanged variants: hxxp://, example[.]com, 192[.]0[.]2[.]1. Output: CSV (spreadsheet analysis), JSON (SIEM integration), STIX (threat sharing), OpenIOC (standardized). Our tool auto-detects all common formats.
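CSV and JSON export can be sketched with the standard library alone. The `type` and `value` field names below are assumptions chosen for this example, not the tool's actual export schema; STIX and OpenIOC output need dedicated libraries and are not shown.

```python
import csv
import io
import json


def to_json(iocs):
    """Serialize a list of {'type': ..., 'value': ...} dicts as JSON text."""
    return json.dumps(iocs, indent=2)


def to_csv(iocs):
    """Serialize the same list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["type", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(iocs)
    return buffer.getvalue()


indicators = [
    {"type": "ipv4", "value": "192.0.2.1"},
    {"type": "domain", "value": "example.com"},
]
csv_text = to_csv(indicators)
json_text = to_json(indicators)
```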
IOC-based hunting workflow: 1) Collect IOCs from threat intel feeds, reports, sandboxes. 2) Enrich with context (malware family, campaign, TTP). 3) Search SIEM/EDR logs for matches (firewall blocks, DNS queries, file hashes). 4) Investigate matches (timeline analysis, lateral movement, data exfiltration). 5) Expand IOCs (pivot to related artifacts). 6) Update detection rules. Tools: Splunk, ELK, Sentinel, CrowdStrike. Limitation: IOC-based detection only finds activity that matches already-known indicators, so it misses zero-days and other novel threats.
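Step 3 of the workflow, searching logs for known indicators, can be sketched as a plain substring scan. This is a simplification: the `find_matches` function and the sample log lines are hypothetical, and a real hunt would run as queries in the SIEM or EDR platform against parsed fields. Substring matching can also over-match (for example, `1.2.3.4` appears inside `11.2.3.45`).

```python
def find_matches(log_lines, indicators):
    """Return (line_number, indicator, line) for each log line containing an indicator."""
    lowered = [(ioc, ioc.lower()) for ioc in indicators]
    hits = []
    for number, line in enumerate(log_lines, start=1):
        haystack = line.lower()
        for original, needle in lowered:
            if needle in haystack:
                hits.append((number, original, line))
    return hits


# Hypothetical log lines, for illustration only.
logs = [
    "2024-01-01T10:00:00Z dns query evil.com",
    "2024-01-01T10:00:05Z dns query example.org",
    "2024-01-01T10:00:09Z outbound connection 203.0.113.7:443",
]
hits = find_matches(logs, ["evil.com", "203.0.113.7"])
```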
False positives occur when legitimate indicators are misidentified as malicious. Common causes: public DNS servers (8.8.8.8), CDN IPs (Cloudflare, Akamai), popular domains (google.com in logs), version numbers mistaken for IPs (10.0.1.2 in software), test/example domains. Reduce false positives: whitelist known-good IOCs, check reputation scores, require multiple IOC matches, add context (user behavior, timeline), validate with threat intel. Balance sensitivity vs accuracy.
Share IOCs using standardized formats: STIX/TAXII (structured threat intelligence), MISP (sharing platform), OpenIOC (open format), CSV/JSON (simple). Best practices: 1) Defang before sharing (prevent accidental access). 2) Include context (malware family, confidence score, source). 3) Use TLP (Traffic Light Protocol) for sensitivity: TLP:CLEAR (public), TLP:GREEN (community), TLP:AMBER (limited), TLP:RED (eyes only). 4) Anonymize victim data. 5) Verify accuracy before sharing.
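Defanging before sharing (practice 1) can be sketched as the reverse of refanging. The `defang` function below is an assumption for illustration and applies only three common substitutions; it brackets every dot, including dots in URL paths, which is one convention among several.

```python
import re


def defang(ioc):
    """Rewrite an indicator so it cannot be clicked or resolved by accident."""
    defanged = re.sub(r"^http", "hxxp", ioc, flags=re.IGNORECASE)
    defanged = defanged.replace(".", "[.]")
    defanged = defanged.replace("@", "[at]")
    return defanged


safe_url = defang("https://evil.com")
```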
ℹ️ Disclaimer
This tool is provided for informational and educational purposes only. All processing happens entirely in your browser - no data is sent to or stored on our servers. While we strive for accuracy, we make no warranties about the completeness or reliability of results. Use at your own discretion.