Grok Pattern Builder & Debugger
Build, test, and debug Grok patterns online with live matching, automatic fix suggestions, and export to regex, Logstash config, and Elasticsearch ingest pipelines.
What Are Grok Patterns?
Grok is the de facto standard language for parsing unstructured logs into structured, searchable data. Originally built for Logstash, it's now supported by Elasticsearch ingest pipelines, OpenSearch, Graylog, Fluentd, and most SIEM platforms.
A grok expression combines pattern names and field names: %{IP:client_ip} means "match an IP address here, and store it in a field called client_ip." Under the hood, every grok pattern expands into a regular expression — grok just makes those expressions reusable, readable, and named.
The power comes from composition. The built-in %{COMBINEDAPACHELOG} pattern expands into a regex over 400 characters long, built from smaller patterns like %{IPORHOST}, %{HTTPDATE}, and %{QUOTEDSTRING}. You write one token; grok handles the complexity.
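The expansion step can be sketched in a few lines of Python. This is a minimal illustration, not how Logstash implements it: the PATTERNS table holds simplified stand-ins for the real built-in definitions, and REQUEST is a hypothetical composite pattern added only to show nesting.

```python
import re

# Simplified stand-ins for a few built-in definitions (assumptions; the real
# Logstash pattern files are longer and stricter). REQUEST is hypothetical.
PATTERNS = {
    "INT": r"[+-]?[0-9]+",
    "WORD": r"\w+",
    "IPV4": r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}",
    "NOTSPACE": r"\S+",
    "REQUEST": r"%{WORD:verb} %{NOTSPACE:path}",
}

# Matches %{NAME} and %{NAME:field}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str) -> str:
    """Recursively replace grok tokens with their regex text."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        body = expand(PATTERNS[name])  # definitions may reference other patterns
        # A field name becomes a named capture group; otherwise just group it.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return TOKEN.sub(substitute, pattern)


regex = expand(r"%{IPV4:client_ip} %{REQUEST} %{INT:status}")
fields = re.match(regex, "10.0.0.7 GET /index.html 200").groupdict()
print(fields)
```

One short grok expression becomes a regex with four named groups, which is the same mechanism that turns %{COMBINEDAPACHELOG} into a very long expression.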
Grok vs. Regex: When to Use Which
Grok and regular expressions are not competitors — grok is regex, with a naming and reuse layer on top.
Use grok when:
- Parsing logs into structured fields for Elasticsearch, OpenSearch, Splunk, or a SIEM
- You want maintainable patterns your team can read (%{TIMESTAMP_ISO8601:time} vs. 60 characters of date regex)
- The format is semi-structured: consistent overall shape, but variable content
Use plain regex when:
- Matching or validating strings inside application code
- You need advanced features grok doesn't expose (lookarounds, backreferences, conditionals)
- Working in a tool that has no grok support
Performance note: because grok compiles to regex, all regex performance rules apply. Anchor patterns at the start of the line, avoid %{GREEDYDATA} mid-pattern, and prefer specific patterns (%{IP}) over generic ones (%{NOTSPACE}) — failed matches on generic patterns cause expensive backtracking.
Debugging Grok Patterns Systematically
Grok failures are frustrating because the error is always the same — _grokparsefailure — with no hint about where the pattern broke. The systematic approach:
- Start with the timestamp. Most patterns fail in the first 30 characters because the timestamp format doesn't match. %{TIMESTAMP_ISO8601} will not match Jan 15 10:30:45 (that's %{SYSLOGTIMESTAMP}).
- Build incrementally. Match the first element, append %{GREEDYDATA:rest}, and verify. Then move one element from rest into your pattern at a time. The moment matching breaks, you've found the problem element.
- Watch the whitespace. A single literal space in your pattern requires exactly one space in the log. Logs aligned with multiple spaces or tabs need \s+ instead.
- Escape special characters. Square brackets, parentheses, and pipes are regex syntax. To match [ERROR] literally, write \[%{LOGLEVEL:level}\].
This tool's debugger automates all four steps: it shows exactly where matching stopped and proposes one-click fixes for the failing element.
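The incremental approach can be approximated outside the tool as well. The sketch below is an illustration only and does not describe how this tool's debugger works: each grok element is paired with a hand-written, simplified regex (an assumption), and the helper first_failure is a hypothetical name.

```python
import re

# Grok elements paired with hand-expanded regexes. The regexes are simplified
# assumptions, not the official pattern definitions.
ELEMENTS = [
    ("%{TIMESTAMP_ISO8601:time}", r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"),
    ("literal space", r" "),
    ("%{LOGLEVEL:level}", r"(?:DEBUG|INFO|WARN|ERROR)"),
    ("literal space", r" "),
    ("%{GREEDYDATA:message}", r".*"),
]


def first_failure(elements, line):
    """Return the label of the first element that breaks the match, or None."""
    prefix = ""
    for label, regex in elements:
        prefix += regex
        # re.match only anchors at the start, so testing the growing prefix
        # has the same effect as appending %{GREEDYDATA:rest} by hand.
        if re.match(prefix, line) is None:
            return label
    return None


ok = first_failure(ELEMENTS, "2024-01-15 10:30:45 ERROR disk full")
bad_time = first_failure(ELEMENTS, "Jan 15 10:30:45 ERROR disk full")
extra_spaces = first_failure(ELEMENTS, "2024-01-15 10:30:45   ERROR disk full")
print(ok, bad_time, extra_spaces)
```

Note that with extra spaces the reported element is the one after the literal space: the single space matches, and the next element is the first thing that cannot.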
Frequently Asked Questions
Common questions about the Grok Pattern Builder & Debugger
A grok pattern is a named, reusable regular expression used to parse unstructured log lines into structured fields. Instead of writing raw regex like (?:[+-]?(?:[0-9]+)), you write %{INT:status_code} — the pattern name (INT) describes what to match and the field name (status_code) describes where to store it. Grok is the standard parsing language in Logstash, Elasticsearch ingest pipelines, OpenSearch, Graylog, and Fluentd.
Grok is a layer on top of regex, not a replacement. Every grok pattern compiles down to a regular expression. The differences: 1) Grok gives you 100+ pre-built, tested patterns (%{IP}, %{TIMESTAMP_ISO8601}) so you don't reinvent them. 2) Grok pairs each match with a named output field, so parsing and field mapping happen in one step. 3) Grok patterns are far more readable — %{COMBINEDAPACHELOG} vs. a 400-character regex. Use plain regex for one-off matching in code; use grok when parsing logs into structured data for a SIEM or log platform.
Work left to right: grok fails at the first non-matching element, and everything after it never gets evaluated. This tool's debugger automates that process — it matches your pattern segment by segment, shows exactly where matching stopped (green = matched, red = unmatched), and suggests replacements for the failing element. The most common causes are: timestamp format mismatches, single literal spaces where the log has multiple spaces or tabs, and unescaped special characters like [ ] ( ).
GREEDYDATA matches everything to the end of the line (regex .*). It's perfect as the last element of a pattern to capture "the rest of the message." Avoid using it in the middle of patterns — the regex engine will match to the end of the line, then backtrack character by character to satisfy the rest of your pattern. On non-matching lines this causes catastrophic backtracking that can spike Logstash CPU. Use %{DATA} (non-greedy) between known anchors instead.
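Besides the CPU cost, a mid-pattern greedy match can also capture the wrong text. A small Python comparison, assuming %{GREEDYDATA} behaves like .* and %{DATA} like .*? (the sample line is made up):

```python
import re

line = "user=alice action=login status=ok"

# Greedy: the first group swallows as much as it can, then backtracks
# only far enough to leave one space for the rest of the pattern.
greedy = re.match(r"user=(?P<user>.*) (?P<rest>.*)", line).groupdict()

# Non-greedy: the first group stops at the first space it can.
lazy = re.match(r"user=(?P<user>.*?) (?P<rest>.*)", line).groupdict()

print(greedy)
print(lazy)
```

The greedy version stores "alice action=login" in user, while the non-greedy version stores "alice".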
Paste or build your grok pattern in this tool, then open the Export panel and choose "Regex (JavaScript)" or "Regex (PCRE)". The tool recursively expands every %{PATTERN:field} reference into its underlying regular expression with named capture groups. This is useful when you need the same parsing logic in application code, grep -P, or a tool that doesn't support grok.
Yes. Open "Custom Patterns" below the pattern input and define them one per line, exactly like a Logstash patterns_dir file: ORDERID ORD-[0-9]{6}. You can then reference %{ORDERID:order_id} in your main pattern. The Logstash and ingest pipeline exports automatically include your custom definitions in the generated config.
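The one-definition-per-line format is simple enough to read with a few lines of Python. This is a rough sketch under stated assumptions: load_custom_patterns and to_regex are hypothetical helpers, lines starting with # are treated as comments, and only the %{NAME:field} form is handled.

```python
import re


def load_custom_patterns(text):
    """Parse 'NAME regex' lines into a dict; skip blanks and # comments."""
    patterns = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The name ends at the first run of whitespace; the rest is the regex.
        name, regex = line.split(None, 1)
        patterns[name] = regex
    return patterns


def to_regex(pattern, patterns):
    """Replace each %{NAME:field} with a named group (no nesting support)."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{patterns[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", substitute, pattern)


custom = load_custom_patterns("# order identifiers\nORDERID ORD-[0-9]{6}\n")
regex = to_regex("order %{ORDERID:order_id} shipped", custom)
match = re.match(regex, "order ORD-004217 shipped")
print(match.group("order_id"))
```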
For the standard combined format, use the built-in %{COMBINEDAPACHELOG} pattern (works for both Apache and nginx default formats). It extracts clientip, timestamp, verb, request, response, bytes, referrer, and agent fields. This tool includes presets for nginx, Apache, HAProxy, and IIS — click one to load the pattern with a sample log line.
Start with %{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:hostname} %{PROG:program}(?:\[%{POSINT:pid:int}\])?: %{GREEDYDATA:message} for traditional RFC 3164 syslog. The square brackets around the PID must be escaped, otherwise they are read as a regex character class. This extracts the timestamp, host, program name, optional PID, and message. For specific applications (sshd, CRON, kernel), parse the message field with a second grok pattern. The Syslog and SSH presets in this tool give you working starting points.
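To see what a syslog pattern of this shape does, here is a hand-expanded Python approximation with the PID brackets escaped. The sub-expressions are simplified assumptions rather than the official definitions of SYSLOGTIMESTAMP, SYSLOGHOST, and PROG, and both sample lines are made up.

```python
import re

SYSLOG = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "  # e.g. Jan 15 10:30:45
    r"(?P<hostname>\S+) "
    r"(?P<program>[^\s\[:]+)"       # program name stops at '[' or ':'
    r"(?:\[(?P<pid>\d+)\])?: "      # optional PID inside literal brackets
    r"(?P<message>.*)"
)

with_pid = SYSLOG.match(
    "Jan 15 10:30:45 web01 sshd[2143]: Accepted publickey for deploy"
).groupdict()
without_pid = SYSLOG.match("Jan  5 03:00:01 db02 CRON: job started").groupdict()

# Mirror the :int suffix: convert the PID when it is present.
pid = int(with_pid["pid"]) if with_pid["pid"] else None
print(with_pid["program"], pid, without_pid["program"], without_pid["pid"])
```

When the PID is absent, the optional group leaves the field empty (None in Python) while the rest of the line still parses.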
Dissect splits strings by fixed delimiters with no regex involved, making it roughly 4x faster than grok. Use dissect when your log format is rigid (every line has identical structure, like CSV or tab-separated). Use grok when the format varies — optional fields, variable whitespace, or different message types in the same stream. A common production pattern is dissect first for the fixed prefix (timestamp, host), then grok only the variable message part.
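The delimiter-based idea behind dissect can be sketched as follows. This is a toy illustration, not the Logstash dissect filter: the dissect function below is hypothetical, supports only plain %{key} references, and assumes every pair of keys is separated by a non-empty literal.

```python
import re


def dissect(pattern, line):
    """Split line on the literal text between %{key} references."""
    # A regex is used only to read the pattern itself; the log line is
    # scanned with plain substring searches.
    parts = re.split(r"%\{(\w+)\}", pattern)
    literals, keys = parts[0::2], parts[1::2]
    if not line.startswith(literals[0]):
        return None
    pos = len(literals[0])
    fields = {}
    for key, delimiter in zip(keys, literals[1:]):
        if delimiter:
            end = line.find(delimiter, pos)
            if end == -1:
                return None  # expected delimiter is missing
        else:
            end = len(line)  # the last key takes the rest of the line
        fields[key] = line[pos:end]
        pos = end + len(delimiter)
    return fields


event = dissect("%{ts} %{host} %{msg}", "2024-01-15T10:30:45Z web01 disk usage at 91%")
broken = dissect("%{ts} %{host} %{msg}", "no-delimiters-here")
print(event)
```

The msg value could then be handed to a grok pattern, matching the "dissect first, grok the variable part" approach.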
Add :int or :float as a third component: %{NUMBER:response_time:float} or %{INT:status_code:int}. Without this, every captured field is a string — which means your log platform can't do range queries, sums, or averages on it. This matters for response times, byte counts, and status codes you'll want to aggregate in dashboards.
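The effect of the type suffix can be illustrated in Python. This is a simplified sketch: PATTERNS holds stand-in definitions (assumptions), parse is a hypothetical helper, and only the int and float suffixes are handled.

```python
import re

CASTS = {"int": int, "float": float}
# Matches %{NAME:field} and %{NAME:field:type}.
TOKEN = re.compile(r"%\{(\w+):(\w+)(?::(\w+))?\}")
# Simplified stand-ins for the built-in definitions (assumptions).
PATTERNS = {
    "INT": r"[+-]?[0-9]+",
    "NUMBER": r"[+-]?[0-9]+(?:\.[0-9]+)?",
    "NOTSPACE": r"\S+",
}


def parse(pattern, line):
    """Match line and convert fields that carry an :int or :float suffix."""
    types = {}

    def substitute(match):
        name, field, cast = match.groups()
        if cast:
            types[field] = CASTS[cast]
        return f"(?P<{field}>{PATTERNS[name]})"

    result = re.match(TOKEN.sub(substitute, pattern), line)
    if result is None:
        return None
    # Fields without a suffix stay strings, as in grok.
    return {key: types.get(key, str)(value) for key, value in result.groupdict().items()}


event = parse(
    "%{NOTSPACE:path} %{INT:status_code:int} %{NUMBER:response_time:float}",
    "/api/orders 200 0.153",
)
print(event)
```

With the suffixes in place, status_code and response_time come out as numbers that can be compared and averaged, while path remains a string.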