TechEarl

Regex Cheat Sheet

Regex Cheat Sheet including regex symbols, ranges, grouping, assertions, syntax tables, examples, matches, and compatibility tables. Definitive Regular Expressions Quick Reference!

Ishan Karunaratne · 31 min read · Updated

Every regex token I reach for, organized by job and annotated for engine support: PCRE2, JavaScript, Python, Go (RE2), Java, .NET, Ruby, and Rust. It covers anchors and boundaries, character classes, quantifiers, groups, lookarounds, flags, and the cross-engine differences that matter when you ship the same pattern across more than one runtime. Task-based walkthroughs (validate an email, match a date, parse a URL) are listed below the cheat sheet.

Regular Expression Quick Reference Cheat Sheet

A quick-start reference guide for regular expressions, covering regex syntax, symbols, ranges, grouping, assertions, Unicode handling, and some practical examples.

Character Classes

. | Any character except newline. With the s flag, includes newlines
\w | Word character (letters, digits, underscore)
\W | Non-word character (inverse of \w)
\d | Digit (0–9)
\D | Non-digit (inverse of \d)
\s | Whitespace (spaces, tabs, newlines, etc.)
\S | Non-whitespace (inverse of \s)
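Engines differ on whether \w, \d and \s are ASCII-only or Unicode-aware: Python 3 str patterns and .NET are Unicode-aware by default, while JavaScript's \w and \d stay ASCII-only. POSIX does not support these shorthands.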
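In Python 3, str patterns are Unicode-aware by default, and the re.ASCII flag switches \w, \d and \s back to their ASCII meanings. A minimal sketch (the sample string is made up; \u0969 is DEVANAGARI DIGIT THREE):

```python
import re

text = "caf\u00e9 42 \u0969"  # "café 42 ३"

# Default for str patterns: Unicode-aware shorthands
unicode_words = re.findall(r"\w+", text)
unicode_digits = re.findall(r"\d", text)

# re.ASCII restricts \w, \d and \s to [a-zA-Z0-9_], [0-9] and ASCII whitespace
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)
```

With re.ASCII the accented letter and the Devanagari digit drop out, which mirrors what a JavaScript \w or \d would do.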

Anchors and Boundaries

^ | Start of string, or start of line in multi-line mode
$ | End of string, or end of line in multi-line mode
\A | Start of string (not affected by multi-line mode)
\z | End of string (strict match)
\Z | End of string, ignoring a trailing newline (Python's re treats \Z as the absolute end, like \z elsewhere)
\G | Position where the previous match ended (start of string on the first attempt)
\b | Word boundary
\B | Not a word boundary
\< | Start of word (GNU extension, not part of POSIX)
\> | End of word (GNU extension, not part of POSIX)
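A short Python sketch of how re.MULTILINE changes ^ and $ while \A stays pinned to the start of the string; the log text is invented for illustration:

```python
import re

log = "ERROR disk full\nINFO retrying\nERROR timeout"

# Without re.MULTILINE, ^ only matches at the very start of the string
first_only = re.findall(r"^\w+", log)
# With re.MULTILINE, ^ also matches right after each newline
per_line = re.findall(r"^\w+", log, flags=re.MULTILINE)
# \A ignores MULTILINE and still means "start of string"
anchored = re.findall(r"\A\w+", log, flags=re.MULTILINE)
# ^ and $ together capture the rest of every line that starts with ERROR
errors = re.findall(r"^ERROR (.*)$", log, flags=re.MULTILINE)
```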

Quantifiers

* | 0 or more occurrences
+ | 1 or more occurrences
? | 0 or 1 occurrence
{n} | Exactly n occurrences
{n,} | n or more occurrences
{n,m} | Between n and m occurrences
? (after a quantifier) | Makes the quantifier lazy (e.g., .+?, .*?)
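Greedy versus lazy quantifiers, sketched with Python's re on a made-up HTML snippet (regex is not a real HTML parser; this only shows how far each quantifier reaches):

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)   # .+ runs as far as it can, then backtracks to the last >
lazy = re.findall(r"<.+?>", html)    # .+? stops at the first > that completes the match
exact = re.findall(r"\d{3}", "12345678")  # non-overlapping runs of exactly 3 digits
```

The greedy version returns the whole string as one match, while the lazy one returns each tag separately.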

Flags and Modifiers

g | Global / find-all. A JavaScript regex flag; other engines find all matches through an API call (e.g. Python findall/finditer), not a portable flag.
m | Multi-line mode (^ and $ also match at line breaks)
i | Case-insensitive matching
x | Ignore whitespace and allow # comments (verbose mode)
s | Dot matches newline
u | Unicode mode
X | Extra mode: unknown escape sequences are errors (PCRE1-specific; always the case in PCRE2)
U | Ungreedy matching (inverts greediness)
A | Anchor the match to the start of the string
J | Allow duplicate group names
n | No auto-capture: plain (...) groups do not capture
xx | Like x, but also ignores unescaped spaces and tabs inside character classes (PCRE2)
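Python exposes the common flags as re module constants (re.IGNORECASE, re.MULTILINE, re.DOTALL, re.VERBOSE) or as an inline group such as (?im) at the start of a pattern; PCRE-only letters such as U, A, J and X have no direct equivalent there. A minimal sketch:

```python
import re

# Verbose mode (re.VERBOSE / re.X): pattern whitespace is ignored and # starts a comment
date = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
""", re.VERBOSE)
parts = date.search("Released on 2024-01-15.").groups()

# Flags combine with |, or can be set inline at the start of the pattern
combined = re.findall(r"^cat", "Cat\ncat\nCAT", flags=re.IGNORECASE | re.MULTILINE)
inline = re.findall(r"(?im)^cat", "Cat\ncat\nCAT")

# re.DOTALL (re.S) lets . match a newline
dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)
```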

Special Characters

\n | New line (LF)
\r | Carriage return (CR)
\t | Tab character
\v | Vertical tab
\f | Form feed
\a | Bell character
\e | Escape character
\h | Horizontal whitespace character
\H | Non-horizontal whitespace character
\uFFFF | Unicode character by 4-digit hex code
\x{FFFF} | Unicode character by variable-length hex code
\xFF | Character by two-digit hex code
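Python's re accepts several of these escapes directly in the pattern, but not all of them; for example, as far as I know it rejects \h with re.error. A sketch to check against your own Python version:

```python
import re

tab_split = re.split(r"\t", "name\tage\tcity")        # \t in the pattern is a tab
hex_match = re.fullmatch(r"\x41\x42", "AB")             # \xHH: 0x41 is "A", 0x42 is "B"
unicode_match = re.fullmatch(r"caf\u00e9", "caf\u00e9")  # \uHHHH: U+00E9 is "é"

# \h is a PCRE escape; Python's re treats an unknown letter escape as an error
try:
    re.compile(r"\h")
    supports_h = True
except re.error:
    supports_h = False
```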

Control Verbs

(*COMMIT) | No backtracking past this point; if backtracked into, the whole match fails with no retry at later positions
(*PRUNE) | If backtracked into, the match fails at the current starting position; the engine then tries the next position
(*SKIP) | If backtracked into, the match fails and the next attempt starts where (*SKIP) was reached
(*FAIL) | Forces an immediate match failure at this position
(*ACCEPT) | Forcibly ends the current match as successful right here
Control verbs (a.k.a. verb directives) are advanced PCRE (and Perl) features that alter backtracking flow.

POSIX Character Classes

[:upper:] | Uppercase letters
[:lower:] | Lowercase letters
[:alpha:] | All letters
[:digit:] | Digits
[:alnum:] | Letters and digits
[:space:] | Whitespace
[:punct:] | Punctuation
[:graph:] | Printable characters except spaces
[:print:] | Printable characters including spaces
[:xdigit:] | Hexadecimal digits
[:cntrl:] | Control characters
These classes only work inside a bracket expression, e.g. [[:upper:]].

Groups and Ranges

(...) | Capturing group
(?:...) | Non-capturing group
(?<name>...) | Named capturing group
[abc] | Character set matching a, b, or c
[abcb] | Duplicates in sets are ignored, same as [abc]
[^abc] | Negated set matching everything except a, b, or c
[a-z] | Range of lowercase letters a through z
[A-Z] | Range of uppercase letters A through Z
[a-zA-Z] | Range of all letters (both lowercase and uppercase)
[0-9] | Range of digits 0 through 9
[a-zA-Z0-9] | Range of all letters and digits (alphanumeric characters)
Named-group syntax varies: (?<name>...) works in PCRE2, JavaScript, Java, .NET, Ruby, and Rust; Python's re uses (?P<name>...); Go uses (?P<name>...) and, since Go 1.22, also accepts (?<name>...); PCRE2 accepts both forms.
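In Python the named-group spelling is (?P<name>...), and Match.groupdict() returns all named captures at once. A minimal sketch with an invented invoice string:

```python
import re

stamp = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = stamp.search("Invoice dated 2024-01-15")
fields = m.groupdict()   # every named group as a dict
year = m.group("year")   # a single group by name

# (?:...) groups without capturing, so only the named group appears in groups()
ext = re.search(r"(?:report|summary)\.(?P<ext>pdf|csv)", "summary.csv")
```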

Lookarounds and Assertions

(?=...) | Positive lookahead
(?!...) | Negative lookahead
(?<=...) | Positive lookbehind
(?<!...) | Negative lookbehind
(?>...) | Atomic group (once-only subexpression)
(?#...) | Inline comment ignored by engine

Unicode Support

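\p{L} | Any letter from any language
\p{M} | Marks (accents, diacritics)
\p{N} | Any numeric character
\p{Z} | Separator characters (spaces, etc.)
\p{Han} | Han ideographs (Chinese characters, also used as Japanese kanji)
\p{Devanagari} | Devanagari script (Hindi, Marathi, Nepali, Sanskrit, etc.)
\p{Cyrillic} | Cyrillic script (e.g., Russian)
\p{Arabic} | Arabic script
\p{Tamil} | Tamil script
\p{Greek} | Greek script
\p{Hebrew} | Hebrew script
\p{Thai} | Thai script
\p{Emoji} | Emoji characters (the Emoji property also includes some plain characters, such as the digits 0–9, # and *)
Spelling varies by engine: JavaScript requires the u (or v) flag and writes scripts as \p{Script=Greek}, while general categories such as \p{L} and binary properties such as \p{Emoji} can stand alone.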
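Python's built-in re module does not accept \p{...} escapes (the third-party regex module does). One workaround, sketched here, filters characters by their Unicode general category with the standard unicodedata module; letters_only and numbers_only are hypothetical helper names.

```python
import unicodedata

def letters_only(text: str) -> str:
    # Same set as the regex class \p{L}: categories Lu, Ll, Lt, Lm, Lo
    return "".join(ch for ch in text if unicodedata.category(ch).startswith("L"))

def numbers_only(text: str) -> str:
    # Same set as the regex class \p{N}: categories Nd, Nl, No
    return "".join(ch for ch in text if unicodedata.category(ch).startswith("N"))
```

This only covers general categories; matching by script (Han, Greek, and so on) needs script data that unicodedata does not expose.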

Here's a quick regular expressions cheat sheet with examples to get started:

  1. Basic Characters:

    • .: Matches any character except newline. Example: a.c matches abc, adc.
    • \w: Matches a word character (letters, digits, _). Example: \w+ matches hello123.
    • \d: Matches any digit (0-9). Example: \d+ matches 123.
    • \s: Matches whitespace (space, tab, newline). Example: \s+ matches the space in hello world.
  2. Anchors:

    • ^: Matches the start of a string. Example: ^hello matches hello at the start of hello world.
    • $: Matches the end of a string. Example: world$ matches world at the end of hello world.
  3. Quantifiers:

    • *: Matches 0 or more occurrences. Example: a* matches aaa, a, or nothing.
    • +: Matches 1 or more occurrences. Example: a+ matches aaa, a, but not empty.
    • {n}: Matches exactly n occurrences. Example: a{3} matches aaa. Use {n,} for n or more, {n,m} for a range.
  4. Groups:

    • (abc): Captures abc as a group.
    • (?:abc): Matches abc without capturing.
    • (?<name>abc): Captures abc in a group named name.

Regular expressions (regex) are powerful tools for text matching and manipulation. This regex cheat sheet doubles as a quick-start regex tutorial, helping you understand regex patterns, regex syntax, and some practical applications. Whether you need a Python regex, Java regex, or JavaScript regex, it is a good starting point for beginners. Check engine compatibility first and save yourself from pulling your hair out wondering why \w does not work in your Bash script or why \p{Devanagari} is not working in your JavaScript regex (JavaScript needs the u flag and the \p{Script=Devanagari} spelling). I have also provided compatibility in table form later on the page.

Regex Features and Examples

Character Classes

Sample text: "John.Doe@techearl.com, 123-456-7890, 2024-01-15"
Pattern | Description | Example | Match
. | Any character except newline | J.h | Joh in "John.Doe"
\w | Word character (letters, digits, underscore) | \w+ | John, Doe, techearl, com, 123, 456, 7890, 2024, 01, 15
\d | Digit (0-9) | \d+ | 123, 456, 7890, 2024, 01, 15
\s | Whitespace (space, tab, newline) | \s+ | The spaces after the commas in the sample
[abc] | Matches a, b, or c | [abc] | c in "techearl.com", a in "techearl.com"
[^abc] | Matches anything except a, b, c | [^abc] | J, o, h, n, ., D, e, etc.
[a-zA-Z] | Matches any letter | [a-zA-Z]+ | John, Doe, techearl, com
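The rows above can be reproduced with Python's re.findall on the same sample text, which is a quick way to check what each class actually picks up:

```python
import re

sample = "John.Doe@techearl.com, 123-456-7890, 2024-01-15"

words = re.findall(r"\w+", sample)
numbers = re.findall(r"\d+", sample)
letters = re.findall(r"[a-zA-Z]+", sample)
first_dot = re.search(r"J.h", sample).group()
abc = re.findall(r"[abc]", sample)   # the c and a in "techearl", then the c in "com"
```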

Anchors and Boundaries

Sample text: "SuperHero saves the day! Not so super villain"
Pattern | Description | Example Regex | Match
^ | Start of string, or start of line in multi-line mode | ^Super | Matches Super at the start of "SuperHero saves the day!"
$ | End of string, or end of line in multi-line mode | villain$ | Matches villain at the end of "Not so super villain"
\A | Start of string (not affected by multi-line mode) | \ASuper | Only matches Super at the very start of "SuperHero saves the day!"
\z | End of string (strict match) | villain\z | Only matches villain at the very end of "Not so super villain"
\Z | End of string, ignoring a trailing newline | villain\Z | Matches villain in both "Not so super villain" and "Not so super villain\n" (in Python, \Z is the absolute end and rejects the second)
\G | Position where the previous match ended | \G\w+\s* | Matches words consecutively: SuperHero, saves, the, day (then stops at "!")
\b | Word boundary | \bsuper\b | Matches super in "Not so super villain" but not in "SuperHero"
\b | Word boundary (Unicode-aware engines) | \b\p{L}+\b | Matches привет, café, 안녕 in "привет café 안녕" where \b is Unicode-aware (e.g. PCRE2 in UCP mode, .NET); JavaScript's \b stays ASCII-only even with the u flag
\B | Not a word boundary | \BHero | Matches Hero in "SuperHero" but not in "Hero saves"
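A Python sketch of the boundary rows on the same sample. It also shows one cross-engine difference: in Python, \Z refuses a trailing newline, while a bare $ accepts one.

```python
import re

text = "SuperHero saves the day! Not so super villain"

whole_word = re.findall(r"\bsuper\b", text)        # only the standalone, lowercase "super"
inside_word = re.findall(r"\BHero", text)          # "Hero" preceded by another word character
hero_at_start = re.search(r"\BHero", "Hero saves")  # None: a word boundary precedes this "Hero"

# Python's \Z is the absolute end of the string, so a trailing newline breaks the match
ends_clean = re.search(r"villain\Z", text) is not None
ends_newline = re.search(r"villain\Z", text + "\n") is not None
# A bare $ (without MULTILINE) still matches just before one final newline
dollar_newline = re.search(r"villain$", text + "\n") is not None
```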

Quantifiers

Pattern | Description | Example | Match
* | 0 or more occurrences | ba* | b, ba, baa
+ | 1 or more occurrences | ba+ | ba, baa
? | 0 or 1 occurrence | ba? | b, ba
{n} | Exactly n occurrences | a{3} | aaa
{n,} | n or more occurrences | a{2,} | aa, aaa
{n,m} | Between n and m occurrences | a{1,3} | a, aa, aaa

Groups and Capturing

Pattern | Description | Example | Match
(abc) | Capturing group | (cat) | Matches cat
(?:abc) | Non-capturing group | (?:cat) | Matches cat without capturing
(?<name>abc) | Named capturing group. JavaScript, .NET, Java, Ruby, and Rust use this angle-bracket form; Python's re uses (?P<name>...), and Go accepts both forms since Go 1.22 | (?<animal>cat) | Captures cat as animal

Lookaheads and Lookbehinds

Pattern | Description | Example Regex | Match
(?=abc) | Positive lookahead | \d(?= dollars) | 5 in "5 dollars"
(?!abc) | Negative lookahead | \d(?! dollars) | 5 in "5 euros"
(?<=abc) | Positive lookbehind | (?<=\$)\d+ | 10 in "$10"
(?<!abc) | Negative lookbehind | (?<!\$)\d+ | 20 in "20 euros"
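The lookaround rows sketched in Python, plus a common pitfall with negative lookbehind: it only inspects the position where the match starts, so it can still match the tail of a number it was meant to exclude.

```python
import re

amount = re.search(r"\d(?= dollars)", "5 dollars").group()
not_dollars = re.search(r"\d(?! dollars)", "5 euros").group()
after_sign = re.search(r"(?<=\$)\d+", "$10").group()

# Pitfall: (?<!\$)\d+ cannot start at "1" in "$10", but it can start at "0"
naive = re.findall(r"(?<!\$)\d+", "$10 and 20 euros")
# Also forbidding a digit before the start skips "$10" entirely
safer = re.findall(r"(?<![$\d])\d+", "$10 and 20 euros")
```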

Flags and Modifiers

Flag | Description | Example | Effect
g | Global match | /cat/g | Finds all cat instances
i | Case-insensitive match | /cat/i | Matches Cat, CAT
m | Multiline mode | /^cat/m | Matches cat at the start of any line
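Python has no g flag: the choice between first match and all matches is made by the function you call. A sketch mapping the JavaScript examples onto re.search, re.findall and re.finditer:

```python
import re

text = "cat, Cat, CAT, concatenate"

first = re.search(r"cat", text).group()                        # like /cat/  : first match only
all_lower = re.findall(r"cat", text)                           # like /cat/g : every match
all_any_case = re.findall(r"cat", text, flags=re.IGNORECASE)   # like /cat/gi
positions = [m.start() for m in re.finditer(r"cat", text)]     # match objects carry offsets
```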

PCRE Control Verbs

Control verbs are advanced features of PCRE (Perl-Compatible Regular Expressions), inherited from Perl. They let you manipulate the regex engine's backtracking and matching behavior directly, a level of control unavailable in most other regex engines. By changing how the engine handles failures, they can cut off futile backtracking to improve performance or enforce matching logic.

Below is a brief introduction to each control verb and its function.

Control Verb | Description
(*COMMIT) | Prevents backtracking past this point. If the match fails after this point, the entire match fails and no later starting positions are tried.
(*PRUNE) | If backtracking reaches this point, the attempt at the current starting position fails; the engine then advances to the next starting position as usual.
(*SKIP) | Like (*PRUNE), but the next attempt starts at the position where (*SKIP) was encountered rather than at the next character.
(*FAIL) | Forces an immediate match failure at the current position (also written (*F)).
(*ACCEPT) | Immediately ends the current match as successful, ignoring the remaining pattern.

These verbs provide a high degree of control over how the regex engine processes and evaluates patterns, making them powerful tools for optimizing regex performance and logic in PCRE.


Advanced Examples

Validate an Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Matches: user@example.com, hello.world@domain.org.
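A sketch of using the email pattern in Python, with EMAIL and looks_like_email as illustrative names. re.fullmatch stands in for the ^...$ anchors, which matters because Python's $ also accepts a single trailing newline. Like the pattern itself, this is a pragmatic check, not full RFC 5322 validation.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def looks_like_email(candidate: str) -> bool:
    # fullmatch requires the whole string to match, with no trailing newline allowed
    return EMAIL.fullmatch(candidate) is not None

# With ^...$ and re.search, a value ending in "\n" slips through
trailing_newline_ok = re.search(
    r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$", "user@example.com\n"
) is not None
```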


Match a Phone Number

^\+?(?:\d{1,3}[-.\s]?)?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}$

Matches: +1-800-555-5555, (800) 555-5555.
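A sketch of the phone check in Python, assuming a variant in which the entire country-code part, (?:\d{1,3}[-.\s]?)?, is optional. A lazy quantifier such as \d{1,3}? is not the same as an optional one: it still needs at least one digit, so it cannot skip a missing country code. PHONE and looks_like_phone are illustrative names.

```python
import re

# Optional "+", optional country code, optional parentheses around the area code
PHONE = re.compile(r"\+?(?:\d{1,3}[-.\s]?)?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}")

def looks_like_phone(candidate: str) -> bool:
    return PHONE.fullmatch(candidate) is not None

# A lazy {1,3}? still requires at least one digit
lazy_needs_digit = re.fullmatch(r"\d{1,3}?", "") is None
```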


Extract URLs

https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?

Matches: https://example.com, http://domain.org/path.
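Extracting URLs in Python with this pattern has one trap: because (\/\S*)? is a capturing group, re.findall returns the group's text instead of the whole URL. A sketch with an invented sentence:

```python
import re

URL = r"https?:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?"
text = "See https://example.com and http://domain.org/path for details."

# Pitfall: with one capturing group, findall returns that group ("" when it did not match)
groups_only = re.findall(URL, text)
# finditer + group(0) returns the full matched URLs
urls = [m.group(0) for m in re.finditer(URL, text)]
```

Changing the group to (?:\/\S*)? would also make re.findall return whole URLs.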


Replace Multiple Spaces with One

\s{2,}

Replace each match with a single space to clean up text like "Hello     world" into "Hello world". Note that \s also matches tabs and newlines, so runs of those are collapsed too.
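In Python the cleanup is a single re.sub call. The sketch contrasts \s{2,} with a spaces-only variant, since \s also swallows tabs and newlines:

```python
import re

messy = "Hello     world\t\tagain"

collapsed = re.sub(r"\s{2,}", " ", messy)    # any run of 2+ whitespace characters
spaces_only = re.sub(r" {2,}", " ", messy)   # only runs of 2+ literal spaces
```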


Find Duplicate Words

\b(\w+)\s+\1\b

Matches: the the, hello hello.
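A Python sketch that finds and removes doubled words. Adding re.IGNORECASE (an assumption, not part of the original pattern) lets the backreference \1 also catch pairs like Hello hello:

```python
import re

DUPLICATE = re.compile(r"\b(\w+)\s+\1\b", flags=re.IGNORECASE)
text = "This is the the test. Hello hello world."

dupes = [m.group(0) for m in DUPLICATE.finditer(text)]
fixed = DUPLICATE.sub(r"\1", text)   # keep the first occurrence of each pair
```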


Zero Punctuation and Regex in Action

Regex is also handy for rules such as "no punctuation allowed". For example, this pattern only accepts strings made entirely of word characters and whitespace:

^[\w\s]+$

Matches: Hello World but not Hello, World!. Because \w includes the underscore, Hello_World also passes.
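Applied in Python with re.fullmatch, the no-punctuation check looks like this (has_no_punctuation is an illustrative name). For str patterns, \w also covers accented letters:

```python
import re

NO_PUNCT = re.compile(r"[\w\s]+")

def has_no_punctuation(candidate: str) -> bool:
    # fullmatch plays the role of the ^...$ anchors
    return NO_PUNCT.fullmatch(candidate) is not None
```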


Regex Compatibility Tables

A quick compatibility reference for regular expressions across major engines including PCRE2 (PHP), ECMAScript 2024 (JavaScript), Python, Golang, Java, .NET, Rust, Ruby and POSIX, covering character classes, Unicode support and advanced features.

Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
. | Any character except newline. With the s flag, includes newlines.
\w | Word character (letters, digits, underscore).
\W | Non-word character (inverse of \w).
\d | Digit (0–9).
\D | Non-digit (inverse of \d).
\s | Whitespace (spaces, tabs, newlines, etc.).
\S | Non-whitespace (inverse of \s).
Engines differ on whether \w, \d and \s are ASCII-only or Unicode-aware: Python 3 str patterns and .NET are Unicode-aware by default, while JavaScript's \w and \d stay ASCII-only. POSIX does not support these shorthands.
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
^ | Start of string, or start of line in multi-line mode
$ | End of string, or end of line in multi-line mode
\A | Start of string (not affected by multi-line mode)
\z | End of string (strict match)
\Z | End of string, ignoring a trailing newline (Python's re treats \Z as the absolute end, like \z elsewhere)
\b | Word boundary
\B | Not a word boundary
\< | Start of word (GNU extension, not part of POSIX)
\> | End of word (GNU extension, not part of POSIX)
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
* | 0 or more occurrences.
+ | 1 or more occurrences.
? | 0 or 1 occurrence.
{n} | Exactly n occurrences.
{n,} | n or more occurrences.
{n,m} | Between n and m occurrences.
? (after a quantifier) | Makes the quantifier lazy (e.g., .+?, .*?).
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
g | Global / find-all. A JavaScript regex flag; other engines find all matches through an API call (e.g. Python findall/finditer), not a portable flag.
m | Multi-line mode (^ and $ also match at line breaks)
i | Case-insensitive matching
x | Ignore whitespace and allow # comments (verbose mode)
s | Dot matches newline
u | Unicode mode
X | Extra mode: unknown escape sequences are errors (PCRE1-specific; always the case in PCRE2)
U | Ungreedy matching (inverts greediness)
A | Anchor the match to the start of the string
J | Allow duplicate group names
n | No auto-capture: plain (...) groups do not capture
xx | Like x, but also ignores unescaped spaces and tabs inside character classes (PCRE2)
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
\n | New line (LF)
\r | Carriage return (CR)
\t | Tab character
\v | Vertical tab
\f | Form feed
\a | Bell character
\e | Escape character
\h | Horizontal whitespace character
\H | Non-horizontal whitespace character
\uFFFF | Unicode character by 4-digit hex code
\x{FFFF} | Unicode character by variable-length hex code
\xFF | Character by two-digit hex code
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
(*COMMIT) | No backtracking past this point; if backtracked into, the whole match fails with no retry at later positions.
(*PRUNE) | If backtracked into, the match fails at the current starting position; the engine then tries the next position.
(*SKIP) | If backtracked into, the match fails and the next attempt starts where (*SKIP) was reached.
(*FAIL) | Forces an immediate match failure at this position.
(*ACCEPT) | Forcibly ends the current match as successful right here.
Control verbs (a.k.a. verb directives) are advanced PCRE (and Perl) features that alter backtracking flow.
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
[:upper:] | Uppercase letters
[:lower:] | Lowercase letters
[:alpha:] | All letters
[:digit:] | Digits
[:alnum:] | Letters and digits
[:space:] | Whitespace
[:punct:] | Punctuation
[:graph:] | Printable characters except spaces
[:print:] | Printable characters including spaces
[:xdigit:] | Hexadecimal digits
[:cntrl:] | Control characters
These classes only work inside a bracket expression, e.g. [[:upper:]].
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
(...) | Capturing group
(?:...) | Non-capturing group
(?<name>...) | Named capturing group
[abc] | Character set matching a, b, or c
[^abc] | Negated set matching everything except a, b, or c
[a-q] | Range from a to q
[A-Q] | Range from A to Q
[0-7] | Range of digits 0 through 7
Named-group syntax varies: (?<name>...) works in PCRE2, JavaScript, Java, .NET, Ruby, and Rust; Python's re uses (?P<name>...); Go uses (?P<name>...) and, since Go 1.22, also accepts (?<name>...); PCRE2 accepts both forms.
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
(?=...) | Positive lookahead
(?!...) | Negative lookahead
(?<=...) | Positive lookbehind
(?<!...) | Negative lookbehind
(?>...) | Atomic group (once-only subexpression)
(?#...) | Inline comment ignored by engine
Pattern | Description | PCRE2 | ECMAScript 2024 | Python >=3.9 | Golang | Java 17 | .NET 8.0 | Rust | Ruby 3.0+ | POSIX
\p{L} | Any letter from any language
\p{M} | Marks (accents, diacritics)
\p{N} | Any numeric character
\p{Z} | Separator characters (spaces, etc.)
\p{Han} | Han ideographs (Chinese characters, also used as Japanese kanji)
\p{Devanagari} | Devanagari script (Hindi, Marathi, Nepali, Sanskrit, etc.)
\p{Cyrillic} | Cyrillic script (e.g., Russian)
\p{Arabic} | Arabic script
\p{Tamil} | Tamil script
\p{Greek} | Greek script
\p{Hebrew} | Hebrew script
\p{Thai} | Thai script
\p{Emoji} | Emoji characters (the Emoji property also includes some plain characters, such as the digits 0–9, # and *)

Tools for Testing and Debugging Regex

  • Regex101: Interactive online regex tester.
  • RegExr: Explore and test regular expressions visually.
  • Regex Cheat Sheets: Downloadable PDFs for quick reference.

Step-by-step walkthroughs by task

The cheat sheet above is the syntax reference. For the patterns that come up over and over in real code, there is a focused walkthrough with multi-language examples (JavaScript, Python, PHP) for each.

Validation patterns

The character-class-and-anchors-heavy regexes you reach for in form validation.

Pattern matching for common data types

When you need to recognise a piece of structured text in the middle of unstructured input.

Features (zero-width assertions, groups)

The regex features that turn matching into context-aware extraction.


Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.
