Regular expressions (regex) are patterns that describe sets of strings. They're one of the most powerful tools for searching, matching, and manipulating text. This tutorial takes you from basic syntax to practical patterns you can use immediately.
Basic Syntax
Literal Characters
Most characters match themselves. The regex cat matches the characters "cat" in that order, wherever they appear in the text.
Metacharacters
Special characters have meaning beyond their literal value:
| Character | Meaning |
|---|---|
| . | Any single character |
| ^ | Start of string |
| $ | End of string |
| * | Zero or more of preceding |
| + | One or more of preceding |
| ? | Zero or one of preceding |
| \ | Escape special character |
To match a literal metacharacter, escape it with a backslash: \. matches a period.
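As a minimal sketch of why escaping matters, the Python snippet below compares an unescaped and an escaped dot using the standard `re` module. The sample strings and variable names are illustrative assumptions, not part of the tutorial.

```python
import re

# Unescaped, the dot matches any character, so "3x14" is accepted too.
loose = re.compile(r"3.14")
# Escaped, the dot matches only a literal period.
strict = re.compile(r"3\.14")

samples = ["3.14", "3x14"]
loose_hits = [s for s in samples if loose.fullmatch(s)]
strict_hits = [s for s in samples if strict.fullmatch(s)]

# re.escape adds the backslashes for you when the text comes from elsewhere.
escaped = re.escape("1+1=2?")
escaped_match = re.fullmatch(escaped, "1+1=2?") is not None

print(loose_hits)     # ['3.14', '3x14']
print(strict_hits)    # ['3.14']
print(escaped_match)  # True
```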
Character Classes
Square brackets define a set of characters to match:
[abc] - matches a, b, or c
[a-z] - matches any lowercase letter
[A-Z] - matches any uppercase letter
[0-9] - matches any digit
[a-zA-Z] - matches any letter
[^abc] - matches anything except a, b, or c
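The classes above can be tried out with a short Python sketch. The sample text is an assumption chosen so that each class finds something different.

```python
import re

text = "Route 66, Exit 9B"

lowercase = re.findall(r"[a-z]", text)            # lowercase letters only
digits = re.findall(r"[0-9]", text)               # digits only
everything_else = re.findall(r"[^a-zA-Z ]", text)  # not a letter, not a space

print(lowercase)        # ['o', 'u', 't', 'e', 'x', 'i', 't']
print(digits)           # ['6', '6', '9']
print(everything_else)  # ['6', '6', ',', '9']
```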
Shorthand Classes
Common character classes have shortcuts:
\d - digit [0-9]
\D - non-digit [^0-9]
\w - word character [a-zA-Z0-9_]
\W - non-word character
\s - whitespace (space, tab, newline, and other whitespace characters such as carriage return)
\S - non-whitespace
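A small Python sketch of the shorthand classes in use follows; the sample text is an assumption. Note that in Python, \d and \w are Unicode-aware for str patterns by default, and the `re.ASCII` flag restricts them to the ASCII ranges listed above.

```python
import re

text = "Order #42 shipped\ton 2024-01-15"

numbers = re.findall(r"\d+", text)  # runs of digits
words = re.findall(r"\w+", text)    # runs of word characters
fields = re.split(r"\s+", text)     # split on any run of whitespace

print(numbers)  # ['42', '2024', '01', '15']
print(words)    # ['Order', '42', 'shipped', 'on', '2024', '01', '15']
print(fields)   # ['Order', '#42', 'shipped', 'on', '2024-01-15']
```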
Quantifiers
Quantifiers specify how many times a pattern should match:
a* - zero or more a's
a+ - one or more a's
a? - zero or one a
a{3} - exactly 3 a's
a{2,4} - 2 to 4 a's
a{2,} - 2 or more a's
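To see how the counted quantifiers differ, the sketch below checks a handful of sample strings against each one. The helper `accepted` is a hypothetical name introduced only for this illustration.

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa", "aaaaa"]

def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

optional = accepted(r"a?")         # ['', 'a']
exactly_three = accepted(r"a{3}")  # ['aaa']
two_to_four = accepted(r"a{2,4}")  # ['aa', 'aaa', 'aaaa']
two_or_more = accepted(r"a{2,}")   # ['aa', 'aaa', 'aaaa', 'aaaaa']
```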
Greedy vs Lazy
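By default, quantifiers are greedy: they match as much as possible. Add ? after a quantifier to make it lazy, so that it matches as little as possible:
".*" - greedy: in the text "hello" and "world", matches everything from the first quote to the last quote as one match
".*?" - lazy: matches "hello" and then "world" as two separate matches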
Groups and Capturing
Parentheses create groups:
(abc)+ - one or more "abc" sequences
(cat|dog) - matches "cat" or "dog"
(\d{3})-(\d{4}) - captures the three-digit and four-digit parts separately (for example, 555 and 1234 from 555-1234)
Groups are numbered starting at 1. Use \1, \2, etc. to reference captured groups:
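\b(\w+)\s+\1\b - matches repeated words like "the the" (the \b anchors stop it from matching inside longer words, such as the "is is" hidden in "this is")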
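Here is a short Python sketch of numbered groups and a backreference. The sample sentences are assumptions, and the repeated-word pattern is written with \b on both sides so that partial-word matches are skipped.

```python
import re

# Numbered groups: group(1) and group(2) hold the two captured parts.
phone = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 today")
first_part = phone.group(1)
second_part = phone.group(2)

# Backreference: \1 must repeat exactly what group 1 captured.
doubled = re.compile(r"\b(\w+)\s+\1\b")
sentence = "It is is the the end, this is fine"
repeats = [m.group(1) for m in doubled.finditer(sentence)]

print(first_part, second_part)  # 555 1234
print(repeats)                  # ['is', 'the']
```

Note that "this is" at the end of the sentence is not reported, because the word boundary prevents the group from starting in the middle of "this".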
Non-Capturing Groups
Use (?:...) when you need grouping without capturing:
(?:https?://)?(www\.)?example\.com
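The effect of (?:...) shows up in which groups the match object reports. The sketch below compares the pattern above with a variant in which both groups capture; the variant is an assumption added for contrast.

```python
import re

url = "https://www.example.com"

# Pattern from the text: the scheme group does not capture, the www group does.
mixed = re.search(r"(?:https?://)?(www\.)?example\.com", url)
# Contrast: both groups capture.
all_capturing = re.search(r"(https?://)?(www\.)?example\.com", url)

mixed_groups = mixed.groups()
all_groups = all_capturing.groups()

print(mixed_groups)  # ('www.',)
print(all_groups)    # ('https://', 'www.')
```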
Anchors and Boundaries
^ - start of string
$ - end of string
\b - word boundary
\B - non-word boundary
Examples:
^Hello - string starts with "Hello"
world$ - string ends with "world"
\bcat\b - "cat" as a whole word (not "category")
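The following Python sketch exercises the anchors and the word boundary. The sample strings are assumptions. It also shows one engine-specific detail: with Python's `re.MULTILINE` flag, ^ and $ match at the start and end of each line rather than only at the ends of the whole string.

```python
import re

starts = re.search(r"^Hello", "Hello, world") is not None
ends = re.search(r"world$", "Hello, world") is not None

# Only the standalone "cat" tokens match, not "category" or "concat".
whole_words = re.findall(r"\bcat\b", "cat category concat cat.")

# Default mode: ^ matches only at the very start of the string.
first_only = re.findall(r"^\w+", "one\ntwo\nthree")
# Multiline mode: ^ matches at the start of every line.
each_line = re.findall(r"^\w+", "one\ntwo\nthree", flags=re.MULTILINE)

print(starts, ends)  # True True
print(whole_words)   # ['cat', 'cat']
print(first_only)    # ['one']
print(each_line)     # ['one', 'two', 'three']
```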
Lookahead and Lookbehind
Assert something exists (or doesn't) without including it in the match:
foo(?=bar) - "foo" followed by "bar" (matches "foo" only)
foo(?!bar) - "foo" not followed by "bar"
(?<=foo)bar - "bar" preceded by "foo" (matches "bar" only)
(?<!foo)bar - "bar" not preceded by "foo"
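Because lookarounds consume no characters, the match spans are the clearest way to see what they do. This is a minimal Python sketch on an assumed sample string; the spans are (start, end) offsets.

```python
import re

text = "foobar foobaz bazbar"

def spans(pattern):
    """Return the (start, end) offsets of every match in text."""
    return [m.span() for m in re.finditer(pattern, text)]

ahead = spans(r"foo(?=bar)")        # "foo" in "foobar"
not_ahead = spans(r"foo(?!bar)")    # "foo" in "foobaz"
behind = spans(r"(?<=foo)bar")      # "bar" in "foobar"
not_behind = spans(r"(?<!foo)bar")  # "bar" in "bazbar"

print(ahead, not_ahead)    # [(0, 3)] [(7, 10)]
print(behind, not_behind)  # [(3, 6)] [(17, 20)]
```

Each span is three characters long: the asserted text is checked but never becomes part of the match.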
Practical Examples
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
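This matches most common email formats, but it does not cover every address the email standards allow, and it accepts some invalid ones (for example, addresses with consecutive dots). For production, consider using a dedicated validation library.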
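A quick way to get a feel for the pattern's behavior is to wrap it in a small checker. In this sketch `looks_like_email` is a hypothetical helper name and the sample addresses are assumptions.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value):
    """Return True if value has the shape the pattern describes."""
    return EMAIL.fullmatch(value) is not None

print(looks_like_email("user@example.com"))                 # True
print(looks_like_email("first.last+tag@mail.example.org"))  # True
print(looks_like_email("user@localhost"))                   # False: no dot in the domain
print(looks_like_email("a..b@example..com"))                # True, although not a valid address
```

The last case shows why a shape check like this is only a first filter.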
Phone Numbers
^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
Matches: (555) 123-4567, 555-123-4567, 555.123.4567, 5551234567
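Since the pattern captures the three digit groups, it can also normalize the accepted formats to one style. The sketch below assumes a hypothetical helper, `normalize_phone`, that returns None for input the pattern rejects.

```python
import re

PHONE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(value):
    """Return the number as NNN-NNN-NNNN, or None if it does not match."""
    m = PHONE.match(value)
    if m is None:
        return None
    return "-".join(m.groups())

print(normalize_phone("(555) 123-4567"))  # 555-123-4567
print(normalize_phone("555.123.4567"))    # 555-123-4567
print(normalize_phone("5551234567"))      # 555-123-4567
print(normalize_phone("555-1234"))        # None
```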
URLs
https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[^\s]*)?
Matches HTTP and HTTPS URLs with optional paths.
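The sketch below pulls URLs out of running text with this pattern. The sample sentence is an assumption. Because the pattern contains a capturing group, Python's `re.findall` would return only the path part, so the example uses `finditer` and `group(0)` to get each full match.

```python
import re

URL = re.compile(r"https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[^\s]*)?")

text = "Docs at https://example.com/guide?page=2 and http://test.org, plus ftp://old.net"

# group(0) is the whole match, regardless of capturing groups.
urls = [m.group(0) for m in URL.finditer(text)]

print(urls)  # ['https://example.com/guide?page=2', 'http://test.org']
```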
IP Addresses
\b(?:\d{1,3}\.){3}\d{1,3}\b
Basic IPv4 pattern. It checks only the shape, so it also accepts out-of-range values such as 999.999.999.999. For strict validation, check that each octet is 0-255.
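One way to add the octet check is to let the regex confirm the shape and do the range test in code. This is a sketch under that assumption; `is_valid_ipv4` is a hypothetical helper name.

```python
import re

IPV4_SHAPE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def is_valid_ipv4(candidate):
    """Check the dotted shape with the regex, then each octet's range in code."""
    if IPV4_SHAPE.fullmatch(candidate) is None:
        return False
    return all(0 <= int(octet) <= 255 for octet in candidate.split("."))

# The regex alone finds both candidates; the range check filters the second.
found = IPV4_SHAPE.findall("Ping 10.0.0.1 and 300.1.1.1")
valid = [ip for ip in found if is_valid_ipv4(ip)]

print(found)  # ['10.0.0.1', '300.1.1.1']
print(valid)  # ['10.0.0.1']
```

Keeping the range test outside the regex keeps the pattern short and readable.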
Passwords (Complexity Check)
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
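Requires at least one lowercase letter, one uppercase letter, one digit, and one special character from @$!%*?&, with a minimum length of 8. Any character outside these sets causes the match to fail.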
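The sketch below runs the pattern against a few sample passwords to show which rule each one breaks. `is_strong` is a hypothetical helper name and the passwords are assumptions.

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password):
    """Return True if the password satisfies every rule in the pattern."""
    return PASSWORD.fullmatch(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Sh0rt!a"))     # False: only 7 characters
print(is_strong("Passw0rd#"))   # False: '#' is not in the allowed set
```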
Common Mistakes
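Escaping issues: Remember to escape special characters. In many languages the backslash is also the escape character inside string literals, so you need double backslashes ("\\d" instead of "\d") or a raw-string syntax where the language offers one.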
Greedy matching: <.*> on <div>content</div> matches the entire string, not just <div>. Use <.*?> for the lazy version.
Anchoring: Without ^ and $, patterns match anywhere in the string. \d{3} matches "123" in "abc123xyz".
Character class misuse: Inside [], most metacharacters are literal. [.] matches a period, not any character. The usual exceptions are ], \, ^ at the start of the class, and - between two characters.
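The four mistakes above can be reproduced in a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the raw-string prefix r"..." is Python's way of avoiding doubled backslashes.

```python
import re

# Escaping: a raw string and a doubled backslash produce the same pattern.
same_pattern = r"\d{3}" == "\\d{3}"

# Greedy matching: .* runs to the last '>', .*? stops at the first one.
html = "<div>content</div>"
greedy_tag = re.search(r"<.*>", html).group()
lazy_tag = re.search(r"<.*?>", html).group()

# Anchoring: search finds the digits anywhere, fullmatch needs the whole string.
found_anywhere = re.search(r"\d{3}", "abc123xyz") is not None
whole_string = re.fullmatch(r"\d{3}", "abc123xyz") is not None

# Character classes: inside [] the dot is a literal period.
dots = re.findall(r"[.]", "a.b.c")

print(same_pattern)                  # True
print(greedy_tag)                    # <div>content</div>
print(lazy_tag)                      # <div>
print(found_anywhere, whole_string)  # True False
print(dots)                          # ['.', '.']
```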
Testing Your Patterns
Use our Regex Tester to:
- Write and test patterns interactively
- See matches highlighted in real-time
- View captured groups
- Get explanations of your pattern
Testing patterns before using them in code catches errors early and helps you understand exactly what your regex matches.
Quick Reference
. any character
^ start of string
$ end of string
\d digit
\w word character
\s whitespace
[abc] character class
[^abc] negated class
a* zero or more
a+ one or more
a? optional
a{n} exactly n
a{n,m} n to m times
(...) capturing group
(?:...) non-capturing group
a|b alternation
\b word boundary
(?=...) positive lookahead
(?!...) negative lookahead
Regular expressions are powerful but can be complex. Start simple, test thoroughly, and build up complexity as needed.