Regex Cheat Sheet: Syntax, Examples & Engine Support

Regular Expression Quick Reference Cheat Sheet

A quick-start reference for regular expressions, covering regex syntax, symbols, ranges, grouping, assertions, Unicode handling, and some practical examples.

Character Classes

.	Any character except newline; with the s flag, newlines too
\w	Word character (letters, digits, underscore)
\W	Non-word character (inverse of \w)
\d	Digit (0–9)
\D	Non-digit (inverse of \d)
\s	Whitespace (spaces, tabs, newlines, etc.)
\S	Non-whitespace (inverse of \s)

Check Unicode modes for differences in \w, \d, etc. POSIX does not support these shorthands.

Anchors and Boundaries

^	Start of string, or start of line in multi-line mode
$	End of string, or end of line in multi-line mode
\A	Start of string (not affected by multi-line mode)
\z	End of string (strict match)
\Z	End of string, ignoring trailing newline
\G	Start of match or end of previous match
\b	Word boundary
\B	Not a word boundary
\<	Start of word (GNU extension to POSIX)
\>	End of word (GNU extension to POSIX)

Quantifiers

*	0 or more occurrences
+	1 or more occurrences
?	0 or 1 occurrences
{n}	Exactly n occurrences
{n,}	n or more occurrences
{n,m}	Between n and m occurrences
?	After another quantifier, makes it lazy (e.g., .+?, .*?)

Flags and Modifiers

g	Global / find-all. A JavaScript regex flag; other engines do find-all through an API call (e.g. Python findall/finditer), not a portable flag.
m	Multi-line mode (^ and $ also match at line breaks)
i	Case-insensitive matching
x	Ignore whitespace (verbose mode)
s	Dot matches newline
u	Unicode mode
X	Enable additional syntax features (PCRE-specific)
U	Ungreedy matching (inverts greediness)
A	Anchor match to the start of the string
J	Allow duplicate group names
n	No auto-capture: plain (...) groups do not capture
xx	Like x, but also ignores spaces and tabs inside character classes (PCRE2 extended-more)

Special Characters

\n	New line (LF)
\r	Carriage return (CR)
\t	Tab character
\v	Vertical tab
\f	Form feed
\a	Bell character
\e	Escape character
\h	Horizontal whitespace character
\H	Non-horizontal whitespace character
\uFFFF	Unicode character by 4-digit hex code
\x{FFFF}	Unicode character by variable-length hex code
\xFF	Character by two-digit hex code

Control Verbs

(*COMMIT)	No backtracking past this point; if the match fails later, the whole match fails and no further starting positions are tried
(*PRUNE)	If backtracking reaches this point, the attempt at the current starting position fails and the engine moves to the next starting position
(*SKIP)	Like (*PRUNE), but the next attempt starts where (*SKIP) was reached rather than at the next character
(*FAIL)	Forces an immediate match failure at this position
(*ACCEPT)	Ends the current match as successful right here

Control verbs (a.k.a. verb directives) are advanced PCRE features that alter backtracking flow.

POSIX Character Classes

[:upper:]	Uppercase letters
[:lower:]	Lowercase letters
[:alpha:]	All letters
[:digit:]	Digits
[:alnum:]	Letters and digits
[:space:]	Whitespace
[:punct:]	Punctuation
[:graph:]	Printable characters except spaces
[:print:]	Printable characters including spaces
[:xdigit:]	Hexadecimal digits
[:cntrl:]	Control characters

These are character classes and must be used inside square brackets [], e.g. [[:upper:]].

Groups and Ranges

(...)	Capturing group
(?:...)	Non-capturing group
(?<name>...)	Named capturing group
[abc]	Character set matching a, b, or c
[abcb]	Duplicates in sets are ignored; same as [abc]
[^abc]	Negated set matching everything except a, b, or c
[a-z]	Range of lowercase letters a through z
[A-Z]	Range of uppercase letters A through Z
[a-zA-Z]	Range of all letters (both lowercase and uppercase)
[0-9]	Range of digits 0 through 9
[a-zA-Z0-9]	Range of all letters and digits (alphanumeric characters)

Named-group syntax varies: (?<name>...) works in PCRE2, JavaScript, Java, .NET, Ruby, and Rust; Python's re and Go use (?P<name>...); PCRE2 accepts both forms.

Lookarounds and Assertions

(?=...)	Positive lookahead
(?!...)	Negative lookahead
(?<=...)	Positive lookbehind
(?<!...)	Negative lookbehind
(?>...)	Atomic group (once-only subexpression)
(?#...)	Inline comment ignored by engine

Unicode Support

\p{L}	Any letter from any language
\p{M}	Marks (accents, diacritics)
\p{N}	Any numeric character
\p{Z}	Separator characters (spaces, etc.)
\p{Han}	Han script (Chinese hanzi, Japanese kanji, Korean hanja)
\p{Devanagari}	Devanagari script (e.g., Hindi, Sanskrit)
\p{Cyrillic}	Cyrillic script (e.g., Russian)
\p{Arabic}	Arabic script
\p{Tamil}	Tamil script
\p{Greek}	Greek script
\p{Hebrew}	Hebrew script
\p{Thai}	Thai script
\p{Emoji}	Emoji characters

This regex cheat sheet is a single-page reference for every regular expression token, grouped by job: character classes, anchors, quantifiers, groups, lookarounds, flags, and Unicode properties. Each row is annotated for engine support across PCRE2 (PHP), JavaScript, Python, Go, Java, .NET, Ruby, Rust, and POSIX, so you can copy a pattern and know whether it runs in your runtime.

Every regex token I reach for, organized by job and annotated for engine support: PCRE2, JavaScript, Python, Go (RE2), Java, .NET, Ruby, and Rust. Anchors and boundaries, character classes, quantifiers, groups, lookarounds, flags, and the cross-engine differences that matter when you ship the same pattern across more than one runtime. The filter pills above the table narrow rows by engine. Task-based walkthroughs (validate an email, match a date, parse a URL) are linked below.

Regex cheat sheet with examples

Here's a quick regular expressions cheat sheet with examples to get started:

Basic Characters:
- .: Matches any character except newline. Example: a.c matches abc, adc.
- \w: Matches a word character (letters, digits, _). Example: \w+ matches hello123.
- \d: Matches any digit (0-9). Example: \d+ matches 123.
- \s: Matches whitespace (space, tab, newline). Example: \s+ matches spaces in hello world.
Anchors:
- ^: Matches the start of a string. Example: ^hello matches hello world.
- $: Matches the end of a string. Example: world$ matches hello world.
Quantifiers:
- *: Matches 0 or more occurrences. Example: a* matches aaa, a, or nothing.
- +: Matches 1 or more occurrences. Example: a+ matches aaa, a, but not empty.
- {n}: Matches exactly n occurrences. Example: a{3} matches aaa. Use {n,} for n or more, {n,m} for a range.
Groups:
- (abc): Captures abc as a group.
- (?:abc): Matches abc without capturing.
- (?<name>abc): Captures abc and names it name.

Regular expressions (regex) are powerful tools for text matching and manipulation. This regex cheat sheet doubles as a quick-start regex tutorial, covering regex patterns, regex syntax, and some practical applications. Whether you need a Python regex, Java regex, or JavaScript regex, it is a solid starting point for beginners. Use the flavor filter to check compatibility before you start pulling your hair out wondering why \w does not work in your Bash script or why \p{Devanagari} does not work in your JavaScript regex. Compatibility is also provided in table form later on the page.

Regex Features and Examples

Character Classes

Sample text:

"John.Doe@techearl.com, 123-456-7890, 2024-01-15"

Pattern	Description	Example	Match
`.`	Any character except newline.	`J.h`	`Joh` in "John.Doe"
`\w`	Word character (letters, digits, underscore)	`\w+`	`John`, `Doe`, `techearl`, `com`, `123`, `456`, `7890`, `2024`, `01`, `15`
`\d`	Digit (0-9)	`\d+`	`123`, `456`, `7890`, `2024`, `01`, `15`
`\s`	Whitespace (space, tab, newline)	`\s+`	Spaces after commas in "John.Doe@techearl.com, 123-456-7890, 2024-01-15"
`[abc]`	Matches a, b, or c.	`[abc]`	`c` in "techearl.com", `a` in "techearl.com"
`[^abc]`	Matches anything except a, b, c.	`[^abc]`	`J`, `o`, `h`, `n`, `.`, `D`, `e`, etc.
`[a-zA-Z]`	Matches any letter.	`[a-zA-Z]+`	`John`, `Doe`, `techearl`, `com`
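
As a quick check of the rows above, here is a minimal sketch that runs three of the patterns against the sample text. It assumes Python's built-in re module; the variable names are only illustrative.

```python
import re

text = "John.Doe@techearl.com, 123-456-7890, 2024-01-15"

digits = re.findall(r"\d+", text)          # every run of digits
letters = re.findall(r"[a-zA-Z]+", text)   # every run of ASCII letters
first_dot = re.search(r"J.h", text)        # . stands for any single character

print(digits)             # ['123', '456', '7890', '2024', '01', '15']
print(letters)            # ['John', 'Doe', 'techearl', 'com']
print(first_dot.group())  # Joh
```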

Anchors and Boundaries

Sample text:

"SuperHero saves the day! Not so super villain"

Pattern	Description	Example Regex	Match
`^`	Start of string, or start of line in multi-line mode	`^Super`	Matches `Super` at start in "SuperHero saves the day!"
`$`	End of string, or end of line in multi-line mode	`villain$`	Matches `villain` at end in "Not so super villain"
`\A`	Start of string (not affected by multi-line mode)	`\ASuper`	Only matches `Super` at very start in "SuperHero saves the day!"
`\z`	End of string (strict match)	`villain\z`	Only matches `villain` at very end in "Not so super villain"
`\Z`	End of string, ignoring trailing newline	`villain\Z`	Matches `villain` in both "Not so super villain" and "Not so super villain\n"
`\G`	Start of match or end of previous match	`\G\w+\s*`	Matches words consecutively: `SuperHero`, `saves`, `the`, `day`
`\b`	Word boundary	`\bsuper\b`	Matches `super` in "Not so super villain" but not in "SuperHero"
`\b`	Word boundary (Unicode - requires unicode flag)	`\b\p{L}+\b`	Matches `привет`, `café`, `안녕` in "привет café 안녕" (with unicode flag enabled)
`\B`	Not a word boundary	`\BHero`	Matches `Hero` in "SuperHero" but not a standalone "Hero" at the start of a word
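
The boundary and multi-line behaviour can be verified with a short sketch. It assumes Python's re module, where multi-line mode is the re.MULTILINE flag; the two-line test string is made up for illustration.

```python
import re

text = "SuperHero saves the day! Not so super villain"

whole_word = re.findall(r"\bsuper\b", text)   # standalone word, case-sensitive
inside_word = re.findall(r"\BHero", text)     # Hero preceded by a word character

# ^ matches only at the start of the string unless multi-line mode is on.
first_only = re.findall(r"^\w+", "one\ntwo")
each_line = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)

print(whole_word)   # ['super']
print(inside_word)  # ['Hero']
print(first_only)   # ['one']
print(each_line)    # ['one', 'two']
```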

Quantifiers

Pattern	Description	Example	Match
`*`	0 or more occurrences.	`ba*`	`b`, `ba`, `baa`
`+`	1 or more occurrences.	`ba+`	`ba`, `baa`
`?`	0 or 1 occurrence.	`ba?`	`b`, `ba`
`{n}`	Exactly n occurrences.	`a{3}`	`aaa`
`{n,}`	n or more occurrences.	`a{2,}`	`aa`, `aaa`
`{n,m}`	Between n and m occurrences.	`a{1,3}`	`a`, `aa`, `aaa`
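
Quantifiers are greedy by default, and a trailing ? makes them lazy. The sketch below, assuming Python's re module and a made-up tag string, shows the difference along with an interval bound.

```python
import re

tags = "<a><b>"

greedy = re.findall(r"<.+>", tags)   # .+ grabs as much as it can
lazy = re.findall(r"<.+?>", tags)    # .+? stops at the first possible >

# fullmatch requires the whole string to fit the pattern.
three_ok = re.fullmatch(r"a{1,3}", "aaa") is not None
four_ok = re.fullmatch(r"a{1,3}", "aaaa") is not None

print(greedy)             # ['<a><b>']
print(lazy)               # ['<a>', '<b>']
print(three_ok, four_ok)  # True False
```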

Groups and Capturing

Pattern	Description	Example	Match
`(abc)`	Capturing group.	`(cat)`	Matches `cat`.
`(?:abc)`	Non-capturing group.	`(?:cat)`	Matches `cat` without capturing.
`(?<name>abc)`	Named capturing group. JavaScript, .NET, Java, Ruby, and Rust use this angle-bracket form; Python's `re` and Go use `(?P<name>...)`.	`(?<animal>cat)`	Captures `cat` as `animal`.
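
A short sketch of named groups in Python, which uses the (?P<name>...) spelling. The date pattern and group names are illustrative choices, not part of the table above.

```python
import re

# Python's re module spells named groups as (?P<name>...).
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = date.search("John.Doe@techearl.com, 123-456-7890, 2024-01-15")
parts = m.groupdict()   # group name -> captured text

print(m.group("year"))  # 2024
print(m.group(1))       # 2024 (named groups are still numbered)
print(parts)            # {'year': '2024', 'month': '01', 'day': '15'}
```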

Lookaheads and Lookbehinds

Pattern	Description	Example Regex	Match
`(?=abc)`	Positive lookahead.	`\d(?= dollars)`	`5` in `5 dollars`.
`(?!abc)`	Negative lookahead.	`\d(?! dollars)`	`5` in `5 euros`.
`(?<=abc)`	Positive lookbehind.	`(?<=\$)\d+`	`10` in `$10`.
`(?<!abc)`	Negative lookbehind.	`(?<!\$)\d+`	`20` in `20 euros`.
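
Lookarounds check context without consuming it, which has one easy-to-miss consequence for negative lookbehind. The sketch below assumes Python's re module; the price string is made up for illustration.

```python
import re

prices = "$10 and 20 euros"

after_dollar = re.findall(r"(?<=\$)\d+", prices)      # digits right after a $
naive = re.findall(r"(?<!\$)\d+", prices)             # also finds the 0 inside $10
strict = re.findall(r"(?<![$\d])\d+", prices)         # exclude $ and a preceding digit
before_word = re.findall(r"\d+(?= dollars)", "5 dollars, 7 euros")

print(after_dollar)  # ['10']
print(naive)         # ['0', '20']
print(strict)        # ['20']
print(before_word)   # ['5']
```

The naive negative lookbehind matches the 0 in $10 because that digit is preceded by 1, not by $; adding \d to the lookbehind class closes the gap.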

Flags and Modifiers

Flag	Description	Example	Effect
`g`	Global match.	`/cat/g`	Finds all `cat` instances.
`i`	Case-insensitive match.	`/cat/i`	Matches `Cat`, `CAT`.
`m`	Multiline mode.	`/^cat/m`	Matches `cat` at line start.
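
The /cat/g notation above is JavaScript's. As a rough Python equivalent, findall plays the role of g, and the other flags are passed as constants or inline. This is a sketch assuming the built-in re module.

```python
import re

# findall returns every match, which is what the g flag does in JavaScript.
any_case = re.findall(r"cat", "cat, Cat, CAT", re.IGNORECASE)
line_start = re.findall(r"^cat", "dog\ncat\ncat", re.MULTILINE)
inline = re.findall(r"(?i)cat", "Cat")   # the flag can also be set inside the pattern

print(any_case)    # ['cat', 'Cat', 'CAT']
print(line_start)  # ['cat', 'cat']
print(inline)      # ['Cat']
```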

PCRE Control Verbs

Control verbs are advanced features specific to PCRE (Perl-Compatible Regular Expressions). They allow you to manipulate the regex engine's backtracking and matching behavior directly, providing a level of control unavailable in most other regex engines. These verbs can optimize performance, enforce logic, or debug complex patterns by altering how the engine handles matches and failures.

Below is a brief introduction to each control verb and its function.

Control Verb	Description
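`(*COMMIT)`	Prevents backtracking past this point. If the match fails after this point, the whole match fails and no later starting positions are tried.
`(*PRUNE)`	If backtracking reaches this point, the attempt at the current starting position fails and the engine moves on to the next starting position.
`(*SKIP)`	Like `(*PRUNE)`, but the next attempt starts at the position where `(*SKIP)` was reached instead of at the next character.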
`(*FAIL)`	Forces an immediate match failure at the current position.
`(*ACCEPT)`	Immediately ends the current match as successful, ignoring the remaining pattern.

These verbs provide a high degree of control over how the regex engine processes and evaluates patterns, making them powerful tools for optimizing regex performance and logic in PCRE.

Extended regex (ERE) cheat sheet: `grep -E`, `sed -E`, and `awk`

Most of this page documents PCRE / Perl-style regex, which is what JavaScript, Python, PHP, and the rest of the engines above use. The POSIX command-line tools are a different world, and this is the part that trips people up when they move a pattern from a language into the shell. POSIX defines two flavors, and the one you want for grep -E / egrep, sed -E (sed -r on GNU), and awk is Extended Regular Expressions (ERE). The other flavor, Basic Regular Expressions (BRE), is the default for plain grep, sed without -E, and expr.

The crux of the difference is what counts as a metacharacter. In ERE, + ? | ( ) { } are special on their own. In BRE, those same characters are literal until you escape them, so you have to write \+ \? \| \( \) \{ \} to get the metacharacter meaning. Of these, \( \) and \{ \} are standard POSIX BRE, while \+ \? \| are GNU extensions. That inversion is why the exact same pattern behaves differently between grep and grep -E.

ERE vs BRE: the same operators, escaped differently

Sample text:

"colour color colours"

Operator	ERE (grep -E, egrep, sed -E, awk)	BRE (grep, sed)	Meaning
1 or more	`colou+r`	`colou\+r`	One or more `u`
0 or 1	`colou?r`	`colou\?r`	Matches `color` and `colour`
Alternation	`grey|gray`	`grey\|gray`	Either side of the pipe
Grouping	`(ab)+`	`\(ab\)\+`	One or more of the group
Interval / repetition	`a{2,4}`	`a\{2,4\}`	Between 2 and 4 of `a`

A quick way to remember it: *, ., ^, $, and [...] mean the same thing in both flavors. It is only + ? | ( ) { } that flip, and ERE is the flavor where they work without the backslash.

ERE has no `\d`, `\w`, `\b`: use POSIX classes instead

The shorthand classes everyone reaches for, \d for a digit, \w for a word character, \b for a word boundary, are Perl/PCRE extensions, not part of POSIX ERE. GNU grep/sed quietly support \w and \b as GNU extensions, but awk and any portable script will not, and \d is not standard ERE at all. The portable answer is the POSIX bracket-expression classes, which must sit inside an outer [...], so a digit is [[:digit:]], not [:digit:] on its own.

Perl/PCRE	POSIX ERE equivalent	Matches
`\d`	`[[:digit:]]`	A digit 0-9
`\w`	`[[:alnum:]_]`	Letter, digit, or underscore
`[a-zA-Z]`	`[[:alpha:]]`	Any letter
`\s`	`[[:space:]]`	Whitespace
`[a-zA-Z0-9]`	`[[:alnum:]]`	Letter or digit

There is no POSIX class for a word boundary. GNU tools offer \b (and the GNU-specific \< / \> for the start and end of a word), but awk has none, so in portable scripts you anchor on [[:space:]] or the start/end of the line instead.

So the same "find a 3-digit number" pattern looks like this across the tools. Note [[:digit:]] carries the doubled brackets and the interval needs the unescaped braces that ERE gives you:

# ERE: grep -E, egrep, awk, sed -E
grep -E '[[:digit:]]{3}' file.txt

# BRE: plain grep needs the braces escaped
grep '[[:digit:]]\{3\}' file.txt

# awk uses ERE; print lines with a 3-digit run
awk '/[[:digit:]]{3}/' file.txt
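
Going the other way, Python's re module has no POSIX bracket classes, so a pattern lifted from grep -E has to be translated first. The sketch below is one way to do it; expand_posix_classes and the POSIX_CLASSES table are hypothetical helpers, and the ASCII ranges assume the C/POSIX locale.

```python
import re

# ASCII-only stand-ins for a few POSIX classes (assumes the C/POSIX locale).
POSIX_CLASSES = {
    "digit": "0-9",
    "alpha": "a-zA-Z",
    "alnum": "a-zA-Z0-9",
    "upper": "A-Z",
    "lower": "a-z",
    "xdigit": "0-9A-Fa-f",
    "space": r" \t\n\r\f\v",
}

def expand_posix_classes(pattern: str) -> str:
    """Rewrite each [:name:] inside a bracket expression as an explicit range."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

py_pattern = expand_posix_classes("[[:digit:]]{3}")
word_pattern = expand_posix_classes("[[:alnum:]_]+")

print(py_pattern)                                  # [0-9]{3}
print(word_pattern)                                # [a-zA-Z0-9_]+
print(re.search(py_pattern, "room 101").group())   # 101
```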

The full POSIX class list ([:upper:], [:lower:], [:punct:], [:xdigit:], and the rest) is in the POSIX Character Classes block at the top of this cheat sheet, with engine support flagged per class.

Advanced Examples

Validate an Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Matches: user@example.com, hello.world@domain.org.
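
A minimal sketch of using this pattern for validation in Python. The is_email helper is a hypothetical name, and the pattern is the practical one shown above rather than a full RFC-compliant check.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_email(value: str) -> bool:
    """Return True when the whole string fits the practical email pattern."""
    return EMAIL.match(value) is not None

print(is_email("user@example.com"))        # True
print(is_email("hello.world@domain.org"))  # True
print(is_email("user@localhost"))          # False (no dot-separated TLD)
print(is_email("not an email"))            # False
```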

Match a Phone Number

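^\+?(?:\d{1,3})?[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}$

Matches: +1-800-555-5555, (800) 555-5555. The country code sits in an optional group; written as a lazy \d{1,3}? it would still require at least one leading digit and so reject (800) 555-5555.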
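
A quick Python check of a phone pattern of this shape, with the country code wrapped in an optional group so that numbers without one still match. This is a sketch for the two sample formats, not a general phone validator; is_phone is a hypothetical helper name.

```python
import re

# Country code is an optional group; the remaining parts are digit runs
# separated by an optional dash, dot, or space.
PHONE = re.compile(
    r"^\+?(?:\d{1,3})?[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}$"
)

def is_phone(value: str) -> bool:
    return PHONE.match(value) is not None

print(is_phone("+1-800-555-5555"))  # True
print(is_phone("(800) 555-5555"))   # True
print(is_phone("call me"))          # False
```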

Extract URLs

https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?

Matches: https://example.com, http://domain.org/path.
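
One thing to watch when extracting with this pattern in Python: because it contains a capturing group, re.findall returns the group's text rather than the whole URL. The sketch below uses finditer instead; the sample sentence is made up.

```python
import re

URL = re.compile(r"https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?")

text = "Docs at https://example.com and http://domain.org/path today"

# group(0) is the full match, regardless of capturing groups.
urls = [m.group(0) for m in URL.finditer(text)]
paths_only = URL.findall(text)   # findall yields the (\/\S*) group instead

print(urls)        # ['https://example.com', 'http://domain.org/path']
print(paths_only)  # ['', '/path']
```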

Replace Multiple Spaces with One

\s{2,}

Replace with a single space to clean up text like "Hello    world" into "Hello world".
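
In Python the replacement is a single re.sub call. This is a minimal sketch; note that \s also covers tabs and newlines, so runs of those collapse too.

```python
import re

messy = "Hello    world,\t\tthis  is   spaced"

# Any run of two or more whitespace characters becomes one space.
clean = re.sub(r"\s{2,}", " ", messy)

print(clean)  # Hello world, this is spaced
```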

Find Duplicate Words

\b(\w+)\s+\1\b

Matches: the the, hello hello.
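
The backreference \1 can also drive the fix, not just the detection. The sketch below assumes Python's re module; the removal pattern is a small variation on the one above that also absorbs triple repeats.

```python
import re

text = "the the cat said hello hello"

# findall returns the captured word for each doubled pair.
doubled = re.findall(r"\b(\w+)\s+\1\b", text)

# Keep the first copy and drop every immediate repeat of it.
fixed = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text)

print(doubled)  # ['the', 'hello']
print(fixed)    # the cat said hello
```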

Zero Punctuation and Regex in Action

Regex excels at handling edge cases, such as matching patterns without punctuation. For example, this regex ensures no punctuation in a string:

^[\w\s]+$

Matches: Hello World but not Hello, World!.
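
A short Python check of this pattern, using re.fullmatch so the anchors are implied. One caveat worth knowing: \w includes the underscore, so underscores pass this test. The helper name is hypothetical.

```python
import re

def has_no_punctuation(value: str) -> bool:
    """True when the string holds only word characters and whitespace."""
    return re.fullmatch(r"[\w\s]+", value) is not None

print(has_no_punctuation("Hello World"))    # True
print(has_no_punctuation("Hello, World!"))  # False
print(has_no_punctuation("snake_case ok"))  # True (underscore counts as \w)
```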

Regex Compatibility Tables

A quick compatibility reference for regular expressions across major engines including PCRE2 (PHP), ECMAScript 2024 (JavaScript), Python, Golang, Java, .NET, Rust, Ruby and POSIX, covering character classes, Unicode support and advanced features.

Regex Character Classes Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
.	Any character except newline. With 's', includes newlines.	✓	✓	✓	✓	✓	✓	✓	✓	✓
\w	Word character (letters, digits, underscore).	✓	✓	✓	✓	✓	✓	✓	✓	✗
\W	Non-word character (inverse of \w).	✓	✓	✓	✓	✓	✓	✓	✓	✗
\d	Digit (0–9).	✓	✓	✓	✓	✓	✓	✓	✓	✗
\D	Non-digit (inverse of \d).	✓	✓	✓	✓	✓	✓	✓	✓	✗
\s	Whitespace (spaces, tabs, newlines, etc.).	✓	✓	✓	✓	✓	✓	✓	✓	✗
\S	Non-whitespace (inverse of \s).	✓	✓	✓	✓	✓	✓	✓	✓	✗
Check Unicode modes for differences in \w, \d, etc. POSIX does not support these shorthands.

Regex Anchors and Boundaries Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
^	Start of string, or start of line in multi-line mode	✓	✓	✓	✓	✓	✓	✓	✓	✓
$	End of string, or end of line in multi-line mode	✓	✓	✓	✓	✓	✓	✓	✓	✓
\A	Start of string (not affected by multi-line mode)	✓	✗	✓	✓	✓	✓	✓	✓	✗
\z	End of string (strict match; Python's re only accepts \z from 3.14)	✓	✗	✗	✓	✓	✓	✓	✓	✗
\Z	End of string, ignoring trailing newline (in Python, \Z is the strict end of string)	✓	✗	✓	✗	✓	✓	✗	✓	✗
\b	Word boundary	✓	✓	✓	✓	✓	✓	✓	✓	✗
\B	Not a word boundary	✓	✓	✓	✓	✓	✓	✓	✓	✗
\<	Start of word (GNU/POSIX extension)	✗	✗	✗	✗	✗	✗	✗	✗	✓
\>	End of word (GNU/POSIX extension)	✗	✗	✗	✗	✗	✗	✗	✗	✓

Regex Quantifiers Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
*	0 or more occurrences.	✓	✓	✓	✓	✓	✓	✓	✓	✓
+	1 or more occurrences.	✓	✓	✓	✓	✓	✓	✓	✓	✓
?	0 or 1 occurrences.	✓	✓	✓	✓	✓	✓	✓	✓	✓
{n}	Exactly n occurrences.	✓	✓	✓	✓	✓	✓	✓	✓	✓
{n,}	n or more occurrences.	✓	✓	✓	✓	✓	✓	✓	✓	✓
{n,m}	Between n and m occurrences.	✓	✓	✓	✓	✓	✓	✓	✓	✓
?	Makes quantifiers lazy (e.g., .+?, .*?).	✓	✓	✓	✓	✓	✓	✓	✓	✗

Regex Flags and Modifiers Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
g	Global / find-all. A JavaScript regex flag; other engines do find-all through an API call (e.g. Python findall/finditer), not a portable flag.	✗	✓	✗	✗	✗	✗	✗	✗	✗
m	Multi-line mode	✓	✓	✓	✓	✓	✓	✓	✓	✓
i	Case-insensitive matching	✓	✓	✓	✓	✓	✓	✓	✓	✓
x	Ignore whitespace (verbose mode)	✓	✗	✓	✗	✓	✓	✓	✓	✗
s	Dot matches newline (Ruby spells this option m)	✓	✓	✓	✓	✓	✓	✓	✗	✗
u	Unicode mode	✓	✓	✓	✗	✓	✓	✓	✓	✗
X	Enable additional syntax features (PCRE-specific)	✓	✗	✗	✗	✗	✗	✗	✗	✗
U	Ungreedy matching (inverts greediness)	✓	✗	✗	✓	✗	✗	✓	✗	✗
A	Anchor match to the start of the string	✓	✗	✗	✗	✗	✗	✗	✗	✗
J	Allow duplicate group names	✓	✗	✗	✗	✗	✗	✗	✗	✗
n	Disable capturing groups	✓	✗	✗	✗	✗	✓	✗	✗	✗
xx	Ignore all whitespace and comments (PCRE extended)	✓	✗	✗	✗	✗	✗	✗	✗	✗

Regex Special Characters Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
\n	New line (LF)	✓	✓	✓	✓	✓	✓	✓	✓	✓
\r	Carriage return (CR)	✓	✓	✓	✓	✓	✓	✓	✓	✓
\t	Tab character	✓	✓	✓	✓	✓	✓	✓	✓	✓
\v	Vertical tab	✓	✓	✓	✓	✓	✓	✓	✓	✓
\f	Form feed	✓	✓	✓	✓	✓	✓	✓	✓	✓
\a	Bell character	✓	✗	✓	✓	✓	✓	✓	✓	✗
\e	Escape character	✓	✗	✗	✗	✓	✓	✗	✓	✗
\h	Horizontal whitespace character (in Ruby, \h is a hex digit instead)	✓	✗	✗	✗	✓	✗	✗	✗	✗
\H	Non-horizontal whitespace character	✓	✗	✗	✗	✓	✗	✗	✗	✗
\uFFFF	Unicode character by 4-digit hex code	✗	✓	✓	✗	✓	✓	✓	✓	✗
\x{FFFF}	Unicode character by variable-length hex code	✓	✗	✗	✓	✓	✗	✓	✓	✗
\xFF	Character by two-digit hex code	✓	✓	✓	✓	✓	✓	✓	✓	✗

Regex Control Verbs Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
(*COMMIT)	No backtracking past this point.	✓	✗	✗	✗	✗	✗	✗	✗	✗
(*PRUNE)	Directs the engine to “forget” any backtracking paths at this position.	✓	✗	✗	✗	✗	✗	✗	✗	✗
(*SKIP)	Skips the current position and continues matching after the given point.	✓	✗	✗	✗	✗	✗	✗	✗	✗
(*FAIL)	Forces an immediate match failure at this position.	✓	✗	✗	✗	✗	✗	✗	✗	✗
(*ACCEPT)	Forcibly end the current match as successful right here.	✓	✗	✗	✗	✗	✗	✗	✗	✗
Control verbs (a.k.a. verb directives) are advanced PCRE features that alter backtracking flow.

Regex POSIX Character Classes Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
[:upper:]	Uppercase letters	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:lower:]	Lowercase letters	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:alpha:]	All letters	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:digit:]	Digits	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:alnum:]	Letters and digits	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:space:]	Whitespace	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:punct:]	Punctuation	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:graph:]	Printable characters except spaces	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:print:]	Printable characters including spaces	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:xdigit:]	Hexadecimal digits	✓	✗	✗	✓	✗	✗	✓	✓	✓
[:cntrl:]	Control characters	✓	✗	✗	✓	✗	✗	✓	✓	✓
These are character classes and need to be used inside square brackets [], ie [[:upper:]]

Regex Groups and Ranges Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
(...)	Capturing group	✓	✓	✓	✓	✓	✓	✓	✓	✓
(?:...)	Non-capturing group	✓	✓	✓	✓	✓	✓	✓	✓	✗
(?<name>...)	Named capturing group	✓	✓	✗	✗	✓	✓	✓	✓	✗
[abc]	Character set matching a, b, or c	✓	✓	✓	✓	✓	✓	✓	✓	✓
[^abc]	Negated set matching everything except a, b, or c	✓	✓	✓	✓	✓	✓	✓	✓	✓
[a-q]	Range from a to q	✓	✓	✓	✓	✓	✓	✓	✓	✓
[A-Q]	Range from A to Q	✓	✓	✓	✓	✓	✓	✓	✓	✓
[0-7]	Range of digits 0 through 7	✓	✓	✓	✓	✓	✓	✓	✓	✓
Named-group syntax varies: (?<name>...) works in PCRE2, JavaScript, Java, .NET, Ruby, and Rust; Python's re and Go use (?P<name>...); PCRE2 accepts both forms

Regex Lookarounds and Assertions Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
(?=...)	Positive lookahead	✓	✓	✓	✗	✓	✓	✗	✓	✗
(?!...)	Negative lookahead	✓	✓	✓	✗	✓	✓	✗	✓	✗
(?<=...)	Positive lookbehind	✓	✓	✓	✗	✓	✓	✗	✓	✗
(?<!...)	Negative lookbehind	✓	✓	✓	✗	✓	✓	✗	✓	✗
(?>...)	Atomic group (once-only subexpression; Python 3.11+)	✓	✗	✓	✗	✓	✓	✗	✓	✗
(?#...)	Inline comment ignored by engine	✓	✗	✓	✗	✗	✓	✗	✓	✗

Regex Unicode Support Table - Compatibility

Pattern	Description	PCRE2	ECMAScript 2024	Python >=3.9	Golang	Java 17	.NET 8.0	Rust	Ruby 3.0+	POSIX
\p{L}	Any letter from any language	✓	✓	✗	✓	✓	✓	✓	✓	✗
\p{M}	Marks (accents, diacritics)	✓	✓	✗	✓	✓	✓	✓	✓	✗
\p{N}	Any numeric character	✓	✓	✗	✓	✓	✓	✓	✓	✗
\p{Z}	Separator characters (spaces, etc.)	✓	✓	✗	✓	✓	✓	✓	✓	✗
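\p{Han}	Han script (Chinese hanzi, Japanese kanji, Korean hanja)	✓	✗	✗	✓	✗	✗	✓	✓	✗
\p{Devanagari}	Devanagari script (e.g., Hindi, Sanskrit)	✓	✗	✗	✓	✗	✗	✓	✓	✗
\p{Cyrillic}	Cyrillic script (e.g., Russian)	✓	✗	✗	✓	✗	✗	✓	✓	✗
\p{Arabic}	Arabic script	✓	✗	✗	✓	✗	✗	✓	✓	✗
\p{Tamil}	Tamil script	✓	✗	✗	✓	✗	✗	✓	✓	✗
\p{Greek}	Greek script	✓	✗	✗	✓	✗	✗	✓	✓	✗
\p{Hebrew}	Hebrew script	✓	✗	✗	✓	✗	✗	✓	✓	✗
\p{Thai}	Thai script	✓	✗	✗	✓	✗	✗	✓	✓	✗
\p{Emoji}	Emoji characters	✓	✓	✗	✗	✗	✗	✓	✓	✗
Script names need a different spelling in some engines: JavaScript requires \p{Script=Han} (or \p{sc=Han}) with the u or v flag, and Java accepts \p{IsHan}. .NET's \p supports general categories and named blocks such as \p{IsCyrillic}, not scripts.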

Tools for Testing and Debugging Regex

Regex101: Interactive online regex tester.
RegExr: Explore and test regular expressions visually.
Regex Cheat Sheets: Downloadable PDFs for quick reference.

Step-by-step walkthroughs by task

The cheat sheet above is the syntax reference. For the patterns that come up over and over in real code, there is a focused walkthrough with multi-language examples (JavaScript, Python, PHP) for each.

Validation patterns

The character-class-and-anchors-heavy regexes you reach for in form validation.

How to Match an Email Address with Regex: The practical pattern, the strict RFC 5321 pattern, runnable JavaScript, Python, and PHP code, plus the test table of common edge cases.
How to Match a URL with Regex: Optional protocol, ports, query strings, fragments, and when to skip regex entirely and use the language's URL parser.
How to Validate a Credit Card Number with Regex: Brand-detection patterns (Visa, MasterCard, Amex, Discover) plus the Luhn checksum that regex alone cannot compute.
How to Validate a Strong Password with Regex: The stacked-lookahead pattern that enforces character-class requirements without caring about order.

Pattern matching for common data types

When you need to recognise a piece of structured text in the middle of unstructured input.

How to Match an IPv4 and IPv6 Address with Regex: Octet-bounded IPv4 (rejects 999.999.999.999), full IPv6 with compression, CIDR notation, and the parser fallback.
How to Match a Date with Regex (Multiple Formats): ISO 8601, US (MM/DD/YYYY), EU (DD/MM/YYYY), and what regex cannot do (leap years, month-length validation).
How to Match Numbers with Regex: Integer, decimal, signed, scientific notation, thousands-separated, currency, and percent forms.
How to Match a Hex Color Code with Regex: 3-digit and 6-digit forms, plus the 8-digit alpha-channel variant for modern CSS.
How to Match HTML Tags with Regex (and why you probably shouldn't): When regex on HTML is acceptable, when it isn't, and the parser alternative.
How to Match a Domain Name with Regex: RFC 1035 label rules, subdomain depth control, and the punycode form for internationalised domains. Pairs naturally with the DNS health check walkthrough for when you need to verify the domain actually resolves.
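The octet-bounded IPv4 idea from the list above can be sketched in Python as follows. The OCTET pattern and the function names are illustrative assumptions rather than the walkthrough's exact code; the second function shows the parser fallback using the standard-library ipaddress module.

```python
import ipaddress
import re

# One octet: 0-255 with no leading zeros (an assumption of this sketch).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}", re.ASCII)


def is_ipv4_regex(text: str) -> bool:
    """Shape and range check done entirely in the pattern."""
    return IPV4.fullmatch(text) is not None


def is_ipv4_parser(text: str) -> bool:
    """Parser fallback: let the standard library decide."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


for candidate in ("192.168.0.1", "999.999.999.999", "256.1.1.1"):
    print(candidate, is_ipv4_regex(candidate), is_ipv4_parser(candidate))
```

The regex form is handy for finding addresses inside larger text; for validating a single value, the parser is usually the simpler choice.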

Features (zero-width assertions, groups)

The regex features that turn matching into context-aware extraction.

How to Use Regex Lookaheads and Lookbehinds: Zero-width assertions for context, plus the password-validation use case and engine compatibility notes.
How to Use Capturing Groups and Backreferences in Regex: Numbered groups, named groups, non-capturing groups, and how backreferences match the same text again.
Regex Anchors: The position-asserting tokens like ^ and $, plus the multiline-mode flag.
Regex Word Boundaries: Word-edge assertions including the Unicode-word edge cases that bite when matching non-ASCII text.
How to Use Regex in .htaccess: Applying these patterns in Apache mod_rewrite for HTTPS redirects, www normalization, clean URLs, and access blocking.
How to Use Regex in Nginx: The same patterns in Nginx location blocks and the rewrite directive, plus the location matching priority and why return beats rewrite.
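To make the lookahead item in the list above concrete, here is a minimal Python sketch of the stacked-lookahead technique. The specific rules (lowercase, uppercase, digit, symbol, at least 8 characters) and the names STRONG and is_strong are assumptions chosen for illustration.

```python
import re

# Each (?=...) is a zero-width check made from the start of the string,
# so the required character types may appear in any order.
STRONG = re.compile(
    r"(?=.*[a-z])"      # at least one lowercase letter
    r"(?=.*[A-Z])"      # at least one uppercase letter
    r"(?=.*\d)"         # at least one digit
    r"(?=.*[^\w\s])"    # at least one symbol
    r".{8,}"            # the actual match: 8 or more characters
)


def is_strong(password: str) -> bool:
    return STRONG.fullmatch(password) is not None


print(is_strong("Tr0ub4dor&3"))
print(is_strong("password"))
```

Because the lookaheads consume nothing, only the final .{8,} advances through the input; the assertions simply veto candidates that lack a required character type.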

Frequently asked questions

What is a regex cheat sheet?

A regex cheat sheet is a single-page reference that lists every regular expression token grouped by job: character classes (\d, \w, \s), anchors (^, $, \b), quantifiers (*, +, ?), groups, lookarounds, flags, and Unicode properties.

This regex cheat sheet adds a per-engine compatibility column, so each token is annotated for PCRE2 (PHP), JavaScript, Python, Go, Java, .NET, Ruby, Rust, and POSIX. That way you can copy a pattern and know whether it actually runs in your language before you ship it.

Are regular expressions the same in JavaScript, Python, and PHP?

The basics are the same: character classes, quantifiers, anchors, and groups behave alike across all three. Engine-specific differences show up at the edges.

JavaScript needs ES2018+ for lookbehind support. Python's re module requires the (?P<name>...) form for named groups and rejects the bare (?<name>...) form. PHP's PCRE is one of the richest engines and supports nearly every feature. Go's regexp package uses RE2 syntax, which omits lookaround and backreferences in exchange for linear-time matching guarantees.

What is the difference between regex and PCRE?

Regex is the general pattern-matching notation; PCRE (Perl-Compatible Regular Expressions) is one specific engine implementation that follows Perl's syntax. It supports lookaround, named groups, recursive patterns, and atomic groups.

PHP, R, and many command-line tools (grep with -P, ripgrep with --pcre2) use PCRE. JavaScript and Python have their own engines with slightly different feature sets but most patterns are PCRE-compatible.

How do I make a regex case-insensitive?

Use the i flag: /hello/i in JavaScript, re.compile(r"hello", re.IGNORECASE) in Python, '/hello/i' in PHP. The flag makes letter-character matches case-insensitive without changing the rest of the pattern.

For Unicode-aware case matching (so non-ASCII letters such as Ü and ü match each other), Python 3 applies Unicode rules to str patterns by default unless re.ASCII is set; .NET case-insensitive matching is culture-sensitive unless RegexOptions.CultureInvariant is set; JavaScript uses Unicode case folding when the i and u flags are combined.
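A small Python sketch of the case-insensitive options described above. The sample strings are made up for illustration; the flags themselves (re.IGNORECASE, re.ASCII, and the inline (?i) form) are part of the standard re module.

```python
import re

# Flag passed as a constant.
ascii_hit = re.search(r"hello", "Say HELLO", re.IGNORECASE)

# Same flag written inline at the start of the pattern.
inline_hit = re.search(r"(?i)hello", "Say HELLO")

# str patterns are Unicode-aware by default, so non-ASCII letters fold too.
unicode_hit = re.fullmatch(r"über", "ÜBER", re.IGNORECASE)

# re.ASCII restricts case-insensitive matching to ASCII letters.
ascii_only = re.fullmatch(r"über", "ÜBER", re.IGNORECASE | re.ASCII)

print(bool(ascii_hit), bool(inline_hit), bool(unicode_hit), bool(ascii_only))
```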

What is catastrophic backtracking?

Catastrophic backtracking happens when a regex with nested quantifiers has many possible matching paths through the same input. On long inputs, the engine tries them all and grinds to a halt.

Avoid it by writing patterns that don't have ambiguous overlapping subpatterns. Use possessive quantifiers or atomic groups in engines that support them. In Go, the RE2 engine is immune by design because it omits backtracking entirely.
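The effect is easy to reproduce on a small scale. The sketch below compares an ambiguous nested-quantifier pattern with an unambiguous rewrite in Python's backtracking engine; the patterns, the input length of 18, and the helper name time_match are assumptions picked so the slow case still finishes quickly.

```python
import re
import time

# Nested quantifier: (a+)+ can split a run of "a" in exponentially many ways.
AMBIGUOUS = re.compile(r"^(a+)+b$")
# Accepts the same strings, but there is only one way to consume each "a".
LINEAR = re.compile(r"^a+b$")


def time_match(pattern, text):
    """Return (matched, seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


# Keep the failing input short: each extra "a" roughly doubles the work
# the ambiguous pattern does before it can report failure.
text = "a" * 18 + "c"
for name, pattern in (("ambiguous", AMBIGUOUS), ("linear", LINEAR)):
    matched, seconds = time_match(pattern, text)
    print(f"{name}: matched={matched} in {seconds:.4f}s")
```

Both patterns give the same answer; only the time to reach it differs, and the gap widens rapidly as the input grows.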

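Should I use regex or a parser for complex inputs?

Regex is great for "is this string in the right shape" checks. For nested structure (HTML, JSON, programming languages), use a parser: classic regular expressions cannot match balanced parentheses or arbitrary nesting, and the engine extensions that can (such as PCRE recursive patterns) quickly become hard to read and maintain.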

Common cases for the parser: HTML / XML (DOMParser, BeautifulSoup, lxml), JSON (the language's built-in JSON parser), source code (a real lexer). Common cases for regex: form validation, log scanning, search-and-replace in single-line text.

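How do I escape special characters in regex?

Prefix with a backslash. The regex metacharacters are: dot, caret, dollar, star, plus, question mark, parentheses, square brackets, curly braces, pipe, and backslash. The forward slash is not a metacharacter, but it must be escaped wherever it is also the pattern delimiter, as in JavaScript regex literals and PHP's '/.../' patterns.

Inside a character class, most metacharacters lose their meaning and do not need escaping. Only the close-bracket and backslash always need escaping; the caret is special only as the first character and the hyphen only between two characters, so escape those or move them to a position where they are literal.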
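When the text to match comes from user input, escaping by hand is error-prone. A minimal Python sketch, assuming a made-up search string, using the standard re.escape helper:

```python
import re

user_input = "1+1=2?"          # hypothetical literal text to search for
haystack = "is 1+1=2? yes"

# Unescaped, "+" and "?" act as quantifiers, so the literal text is not found.
raw_hit = re.search(user_input, haystack)

# re.escape backslash-escapes the metacharacters so they match literally.
safe_hit = re.search(re.escape(user_input), haystack)

# Inside a character class, ".", "+", "*" and "?" are already literal.
punctuation = re.findall(r"[.+*?]", "a.b+c")

print(raw_hit, safe_hit.group(), punctuation)
```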

What is the difference between greedy and lazy quantifiers?

Greedy quantifiers match as much as possible then backtrack if needed. Lazy quantifiers (the same quantifier with a trailing ?) match as little as possible then expand if needed.

Use lazy quantifiers for "match up to the first occurrence" cases. For example, matching content between the first opening tag and the nearest closing tag rather than the furthest one.
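The tag example above can be shown in a few lines of Python. The sample markup is invented for illustration:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the end, then backtracks to the LAST </b>.
greedy = re.search(r"<b>.*</b>", html).group()

# Lazy: .*? stops at the FIRST </b> it can reach.
lazy = re.search(r"<b>.*?</b>", html).group()

# With a capture group, the lazy form extracts each tag's content.
contents = re.findall(r"<b>(.*?)</b>", html)

print(greedy)
print(lazy)
print(contents)
```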

How do I name a capture group in regex?

Use (?<name>...) in modern engines: JavaScript (ES2018+), PCRE, .NET, Java, Ruby, Rust. Python's re module accepts only the (?P<name>...) form. Go's regexp package accepts (?P<name>...), and since Go 1.22 also (?<name>...).

Reference a named group later in the same pattern with \k<name> (or (?P=name) in Python). In replacement strings, the syntax is engine-specific: $<name> in JavaScript, ${name} in .NET, \g<name> in Python.
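A short Python sketch of named groups, using the (?P<name>...) form that the stdlib re module expects. The date format and the names DATE and DOUBLED are assumptions for this example.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.search("released 2024-03-15")
year = match.group("year")          # access a group by name
parts = match.groupdict()           # all named groups as a dict

# \g<name> refers to a named group inside a replacement string.
reordered = DATE.sub(r"\g<day>/\g<month>/\g<year>", "2024-03-15")

# (?P=name) refers back to a named group inside the same pattern.
DOUBLED = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
repeat = DOUBLED.search("it was the the best").group()

print(year, parts, reordered, repeat)
```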

What are the most common regex flags?

The flags nearly every engine shares: i for case-insensitive matching, m for multiline mode (so ^ and $ match at every line break), and s (or singleline) for dotall mode so . matches newlines too. JavaScript adds g for global matching; other languages handle this through the function call rather than a flag.

JavaScript also has u for Unicode-aware matching and v (ES2024) for Unicode sets. Python passes re.IGNORECASE, re.MULTILINE, and re.DOTALL as constants rather than trailing flag letters, and also accepts the inline (?ims) syntax at the start of the pattern.
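The effect of the m and s flags is easiest to see side by side. A minimal Python sketch with an invented two-line input:

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ matches only at the start of the string.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Without re.DOTALL, "." refuses to cross the newline.
dot_default = re.search(r"line.second", text)
dot_dotall = re.search(r"line.second", text, re.DOTALL)

# Inline form: (?im) turns on IGNORECASE and MULTILINE inside the pattern.
inline = re.findall(r"(?im)^SECOND", text)

print(starts_default, starts_multiline, dot_default, bool(dot_dotall), inline)
```

There is no g flag here: re.findall (or re.finditer) plays the role that g plays in JavaScript.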

How do I match Unicode characters or letters in regex?

Use Unicode property classes: \p{L} for any letter (including non-ASCII), \p{N} for any numeric character (\p{Nd} for decimal digits only), \p{Sc} for currency symbols, and many more. PCRE, .NET, Java, Ruby, Go, and Rust support these directly. JavaScript needs the u flag to enable them.

Python's stdlib re does NOT support \p{...}. Use the third-party regex package, which is largely a drop-in replacement with full Unicode property support. Go's regexp package supports general categories and scripts (\p{L}, \p{Greek}), as the compatibility table shows, but not binary properties such as \p{Emoji}.
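If adding a third-party dependency is not an option, the Python standard library can approximate \p{L}. The sketch below is a workaround under stated assumptions, not an exact equivalent: the class [^\W\d_] means "word characters minus decimal digits and underscore", which is close to "any letter" but can also admit a few non-letter numeric characters. The names LETTERS, letter_runs, and is_letter are hypothetical.

```python
import re
import unicodedata

# Approximation of \p{L}+ using only the stdlib re module.
LETTERS = re.compile(r"[^\W\d_]+")


def letter_runs(text: str) -> list:
    """Return runs of letter-like characters, including non-ASCII ones."""
    return LETTERS.findall(text)


def is_letter(ch: str) -> bool:
    """Exact check for one character: general category starts with 'L'."""
    return unicodedata.category(ch).startswith("L")


print(letter_runs("Привет мир 42 hello_world"))
print(is_letter("ж"), is_letter("5"), is_letter("_"))
```

For exact property matching, the per-character unicodedata check is the more precise of the two approaches.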

External regex references

For anything not on this page, the canonical online resources:

Regex101: interactive tester with token-by-token explanation, supports PCRE, Python, JavaScript, Go, and .NET flavors
MDN regex documentation: JavaScript regex reference
Python re module: Python's regex documentation
PCRE2 documentation: the C library PHP, R, and many tools use
RegEx category on TechEarl: all the regex articles on this site

Recommended books

This page is the quick reference. When you want the full mental model behind the syntax, these are the canonical regex books:

Mastering Regular Expressions (Jeffrey Friedl, 3rd edition). The definitive deep-dive on how regex engines actually work: backtracking, NFA versus DFA, and the optimisation that makes a pattern fast or catastrophic. Dense, and unmatched once you are past the basics.
Regular Expressions Cookbook (Jan Goyvaerts and Steven Levithan, 2nd edition). Problem-then-solution recipes across eight languages (JavaScript, Python, PHP, Java, .NET, Ruby, Perl, VB). The one to keep next to the keyboard.
Learning Regular Expressions (Ben Forta). The gentlest on-ramp: short, current, and example-driven. A good first book if you are still finding your feet.

Ishan Karunaratne