Regex Capturing Groups and Backreferences: Numbered, Named, Non-Capturing (2026)

Parentheses in regex do two things at once. They group a sub-pattern so quantifiers apply to the whole group, and they capture the matched substring so you can pull it out after the match. (\d+) matches one or more digits AND lets you grab those digits via match[1] (JavaScript), m.group(1) (Python), or $1 in a replacement string. From there, the variants are named groups (?<name>...), non-capturing groups (?:...), and backreferences \1 or \k<name>, which match the SAME text the group captured. Below I walk through all four with the most common practical use cases (duplicate-word detection, swapping fields with a replacement, structured parsing), engine notes per language, and the bugs I've shipped.

The reason this feature is everywhere in real regex code: most useful patterns aren't just matching, they're extracting. You match a URL to extract the domain. You match a log line to pull the timestamp. You match a date to capture the month. Capturing groups are how.

Quick reference

Syntax	Purpose
`(pattern)`	Capturing group, numbered left-to-right starting at 1
`(?:pattern)`	Non-capturing group (grouping only)
`(?<name>pattern)`	Named capturing group
`\1`, `\2`	Backreference to a numbered group
`\k<name>`	Backreference to a named group
`$1`, `$2` (replacement)	Insert the captured text into a replacement string

Basic capturing groups

code

(\d{4})-(\d{2})-(\d{2})

Three groups: year, month, day. After a match against 2025-10-29, the groups contain:

Group 0 (the full match): 2025-10-29
Group 1: 2025
Group 2: 10
Group 3: 29

Groups are numbered left-to-right by their opening parenthesis, starting at 1.
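To see the numbering in practice, here is a small Python sketch. The second pattern is a hypothetical variant that wraps year and month in an extra outer group: because numbers follow the opening parentheses, that outer group becomes group 1 and the day moves to group 4.

```python
import re

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Registered on 2025-10-29.")
print(m.group(0))   # 2025-10-29  (the whole match)
print(m.groups())   # ('2025', '10', '29')

# Nested groups: numbering follows the opening parentheses left to right,
# so the outer "year-month" group is 1 and the inner groups become 2 and 3.
nested = re.search(r"((\d{4})-(\d{2}))-(\d{2})", "2025-10-29")
print(nested.group(1))  # 2025-10
print(nested.group(2))  # 2025
print(nested.group(4))  # 29
```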

Named groups

For complex patterns with many groups, names are easier to read than numbers:

code

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

In code, access via match.groups.year (JavaScript), m.group("year") (Python, which needs the (?P<year>...) spelling), or $matches['year'] (PHP). The numbered access (match[1], etc.) still works in parallel.
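A minimal Python sketch of the same date pattern. Python's stdlib re spells named groups (?P<name>...), so that is the form used here; the point is that a name and its number refer to the same captured text, and groupdict() collects all named groups at once.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = pattern.search("Due 2025-10-29")

# Named and numbered access point at the same captured text.
print(m.group("year"), m.group(1))   # 2025 2025
print(m["month"])                    # 10  (Match objects support indexing)
print(m.groupdict())                 # {'year': '2025', 'month': '10', 'day': '29'}

# The compiled pattern maps each group name to its number.
print(pattern.groupindex["day"])     # 3
```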

Different engines use different syntax for named groups:

Engine	Named group syntax	Named backreference
JavaScript	`(?<name>...)`	`\k<name>`
Python	`(?P<name>...)` only (the third-party `regex` package also accepts `(?<name>...)`)	`(?P=name)`
PHP (PCRE)	`(?<name>...)` or `(?P<name>...)`	`\k<name>` or `(?P=name)`
Ruby	`(?<name>...)`	`\k<name>`
.NET	`(?<name>...)`	`\k<name>`

For cross-engine portability, prefer (?<name>...). It's the most widely accepted form; the main exception is Python's stdlib re, which only accepts (?P<name>...).

Non-capturing groups

If you want grouping (for | alternation or to apply a quantifier) without capturing, open the group with (?: instead of a bare (:

code

(?:https?|ftp):\/\/

The group is needed because of the alternation, but you don't care about the captured value separately. Without ?:, this would use up group 1 for https / http / ftp and shift all your subsequent groups by one.

Use non-capturing groups whenever you don't actually need the captured value. It makes intent clearer and avoids polluting the numbered groups.
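The renumbering is easy to see side by side. This Python sketch pulls the host out of a sample URL with a hypothetical pattern, first with a capturing protocol group and then with (?:...).

```python
import re

url = "https://example.com/docs"

# Capturing alternation: the protocol takes group 1, so the host is group 2.
capturing = re.match(r"(https?|ftp)://([^/]+)", url)
print(capturing.groups())      # ('https', 'example.com')

# Non-capturing alternation: the host is now group 1.
non_capturing = re.match(r"(?:https?|ftp)://([^/]+)", url)
print(non_capturing.groups())  # ('example.com',)
```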

Backreferences: match the same text again

A backreference matches the same text that an earlier capturing group matched. Syntax: \1, \2, etc. for numbered groups; \k<name> for named groups.

Duplicate-word detection:

code

\b(\w+)\s+\1\b

This matches the the in a sentence. The (\w+) captures a word, and \s+\1 requires whitespace followed by the SAME word. The trailing \b stops it from matching the start of a longer word, as in the theory. Useful for editorial scanning.
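For editorial scanning you usually want every hit, and a case-insensitive match so that a sentence starting The the is caught too. A minimal Python sketch using re.finditer; the re.IGNORECASE flag is my addition, not part of the pattern above, and with it Python's backreference also compares case-insensitively. The sample sentence is made up.

```python
import re

text = "The the cat sat on on the mat."
duplicate_word = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

for m in duplicate_word.finditer(text):
    # m.start() gives the position, group(1) the repeated word as first written.
    print(m.start(), repr(m.group(0)), m.group(1))
# 0 'The the' The
# 16 'on on' on
```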

Same-tag HTML matching:

code

<(h[1-6])([^>]*)>(.*?)<\/\1>

The \1 ensures the closing tag is the same heading level as the opening tag. Without it, <h1>...</h3> would match (which is invalid HTML).
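A quick Python check that the backreference rejects a mismatched closing tag. The HTML string is made up, the \/ escape is dropped because Python doesn't need it, and like any regex over HTML this only holds up on simple, single-line fragments without nested headings.

```python
import re

heading = re.compile(r"<(h[1-6])([^>]*)>(.*?)</\1>")
html = '<h1 class="title">Intro</h3><h2>Setup</h2><h3 id="x">Usage</h3>'

for m in heading.finditer(html):
    # group 1 = tag name, group 2 = attributes, group 3 = inner text.
    # The <h1>...</h3> pair is skipped because \1 demands </h1>.
    print(m.group(1), repr(m.group(2)), m.group(3))
```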

Backreferences in replacement strings

The same backreferences work in replacement strings for find-and-replace operations:

code

Find: (\w+) (\w+)
Replace: $2 $1

This swaps two space-separated words. Against Hello World, it produces World Hello. The reference syntax in the replacement depends on the engine: JavaScript, Java, and .NET use $1 and $2; Python's re.sub uses \1 and \2; PHP's preg_replace accepts either. Named-group replacements use $<name> (JavaScript), ${name} (Java, .NET), or \g<name> (Python).
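In Python, re.sub takes \2-style numbered references, \g<name> for named groups and \g<0> for the whole match, and it also accepts a function instead of a replacement string. A short sketch of all three, reusing the word swap above:

```python
import re

# Numbered references in a raw replacement string.
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World"))  # World Hello

# Named references use \g<name>; \g<0> inserts the whole match.
swap = re.compile(r"(?P<first>\w+) (?P<second>\w+)")
print(swap.sub(r"\g<second> \g<first>", "Hello World"))   # World Hello
print(re.sub(r"\d+", r"[\g<0>]", "item 42"))              # item [42]

# A function receives each Match object and returns the replacement text.
print(swap.sub(lambda m: f"{m['second'].upper()} {m['first']}", "Hello World"))
# WORLD Hello
```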

Practical use cases

Find duplicate consecutive words:

code

\b(\w+)\s+\1\b

Swap "Lastname, Firstname" to "Firstname Lastname":

code

Find:    (\w+),\s+(\w+)
Replace: $2 $1

Parse "name=value" pairs:

code

(\w+)=("[^"]*"|'[^']*'|[^\s]+)

The two groups give you the key and the value (with quotes still attached if present).
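To turn those captures into a dictionary, iterate over the matches and strip one layer of matching quotes from each value. A Python sketch; parse_pairs is a hypothetical helper name, and it does not handle escaped quotes inside values.

```python
import re

PAIR = re.compile(r"""(\w+)=("[^"]*"|'[^']*'|[^\s]+)""")

def parse_pairs(text: str) -> dict[str, str]:
    """Collect key=value pairs, removing surrounding quotes from values."""
    result = {}
    for key, value in PAIR.findall(text):
        # Strip quotes only when the value starts and ends with the same quote.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        result[key] = value
    return result

print(parse_pairs("""host=db01 port=5432 user="app admin" mode='ro'"""))
# {'host': 'db01', 'port': '5432', 'user': 'app admin', 'mode': 'ro'}
```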

Wrap HTML hex colour values in <code> tags:

code

Find:    #([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\b
Replace: <code>#$1</code>
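The same find-and-replace in Python is one re.sub call with \1 in place of $1. The sample text is made up. Note the trailing \b: it is what stops a four-digit value like #abcd from being half-matched as a three-digit colour.

```python
import re

text = "Primary #1a2B3c, accent #fff, invalid #abcd."
wrapped = re.sub(r"#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\b", r"<code>#\1</code>", text)
print(wrapped)
# Primary <code>#1a2B3c</code>, accent <code>#fff</code>, invalid #abcd.
```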

Convert dates from MM/DD/YYYY to YYYY-MM-DD:

code

Find:    (\d{2})\/(\d{2})\/(\d{4})
Replace: $3-$1-$2
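Both swaps in Python, where re.sub writes the references as \3-\1-\2 and \2 \1. The sample strings are made up. Neither pattern validates anything, so 13/45/2025 would be rearranged just as happily.

```python
import re

us_dates = "Invoices dated 10/29/2025 and 04/15/2010."
print(re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", us_dates))
# Invoices dated 2025-10-29 and 2010-04-15.

print(re.sub(r"(\w+),\s+(\w+)", r"\2 \1", "Smith, Alice"))
# Alice Smith
```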

Examples in JavaScript, Python, and PHP

JavaScript:

javascript

const text = "Born 1985-10-29, registered 2010-04-15";

// Numbered groups
const isoDate = /(\d{4})-(\d{2})-(\d{2})/g;
let m;
while ((m = isoDate.exec(text)) !== null) {
  console.log(`year=${m[1]}, month=${m[2]}, day=${m[3]}`);
}

// Named groups
const named = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
for (const match of text.matchAll(named)) {
  console.log(`year=${match.groups.year}, day=${match.groups.day}`);
}

// Swap two words
"Hello World".replace(/(\w+) (\w+)/, "$2 $1");  // 'World Hello'

Python:

python

import re

text = "Born 1985-10-29, registered 2010-04-15"

# Numbered groups
for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})", text):
    print(f"year={m.group(1)}, month={m.group(2)}, day={m.group(3)}")

# Named groups
for m in re.finditer(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text):
    print(f"year={m.group('year')}, day={m.group('day')}")

# Swap two words
re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World")  # 'World Hello'

PHP:

php

$text = "Born 1985-10-29, registered 2010-04-15";

// Numbered groups
preg_match_all('/(\d{4})-(\d{2})-(\d{2})/', $text, $matches);
// $matches[1] = ['1985', '2010'], $matches[2] = ['10', '04'], $matches[3] = ['29', '15']

// Named groups
preg_match_all('/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/', $text, $matches);
// $matches['year'] = ['1985', '2010']

// Swap two words
preg_replace('/(\w+) (\w+)/', '$2 $1', "Hello World");  // 'World Hello'

Engine compatibility

Numbered groups work everywhere. Named groups and backreferences have meaningful per-engine quirks.

Engine	Numbered groups	Named groups	Backreferences	Replacement syntax
JavaScript	Works	`(?<name>...)` (ES2018+)	`\1` in pattern, `$1` in replacement	`$1`, `$<name>`
Python (`re`)	Works	`(?P<name>...)` only	`\1` or `(?P=name)` in pattern, `\1` in replacement	`\1`, `\g<name>`
Python (`regex` pkg)	Works	All forms	Works	All forms
PHP (PCRE)	Works	`(?<name>...)` or `(?P<name>...)`	`\1` in pattern, `$1` or `\1` in replacement	`$1`, `\1`, `${1}` (no named references; use `preg_replace_callback`)
Java	Works	`(?<name>...)`	`\1` in pattern, `$1` in replacement	`$1`, `${name}`
.NET	Works	`(?<name>...)`	`\1` in pattern, `$1` in replacement	`$1`, `${name}`
Go (RE2)	Works	`(?P<name>...)`, or `(?<name>...)` since Go 1.22	Not supported	`$1`, `${name}`
Rust (`regex` crate)	Works	`(?P<name>...)`, or `(?<name>...)` since 1.9	Not supported	`$1`, `${name}`
Ruby	Works	`(?<name>...)`	`\1` in pattern, `\1` in replacement	`\1`, `\k<name>`
POSIX BRE (`sed`, `grep`)	`\(...\)` only	Not supported	`\1` in pattern, `\1` in replacement	`\1`
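POSIX ERE (`grep -E`, `sed -E`, `awk`)	`(...)`	Not supported	Not defined by POSIX; GNU `grep -E` and `sed -E` support `\1` as an extension, `awk` does not	`\1` (`sed -E`); `awk`'s `sub`/`gsub` only support `&`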

The most important caveat: Go and Rust do not support backreferences in the pattern. So \b(\w+)\s+\1\b does not work in those engines. The workaround is to find all words with a non-backreference pattern, then check adjacent pairs in code.
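A minimal sketch of that workaround. It is written in Python to match the other examples here, but it deliberately uses no backreference, so the same logic should port to Go's regexp or Rust's regex crate; find_duplicate_words is a hypothetical helper name and the sample sentence is made up.

```python
import re

def find_duplicate_words(text: str) -> list[str]:
    """Report words that appear twice in a row, separated only by whitespace.

    The pattern has no backreference, so RE2-style engines can run it too.
    """
    words = list(re.finditer(r"\w+", text))
    duplicates = []
    for prev, curr in zip(words, words[1:]):
        gap = text[prev.end():curr.start()]
        # Mirror \b(\w+)\s+\1\b: identical words with only whitespace between.
        if prev.group() == curr.group() and gap.isspace():
            duplicates.append(curr.group())
    return duplicates

print(find_duplicate_words("Paris in the the spring, not not here."))
# ['the', 'not']
```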

Common mistakes

The bugs I see most often.

Forgetting that adding a new group renumbers everything to its right. If your replacement was $3-$1-$2 and you add a new capturing group earlier in the pattern, $3 now points at something else. Either use non-capturing groups (?:...) for everything you don't need to extract, or switch to named groups so additions don't reorder anything.
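A Python sketch of both the failure and the fixes. The hypothetical change is adding a weekday group in front of the MM/DD/YYYY date from the use cases above; the sample input is made up.

```python
import re

text = "Wed 10/29/2025"

# Hypothetical edit: a weekday group was added in front of the date,
# but the replacement still uses \3-\1-\2 from the original pattern.
broken = re.sub(r"(Mon|Tue|Wed|Thu|Fri) (\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", text)
print(broken)          # 29-Wed-10

# Fix 1: make the new group non-capturing so the numbering is unchanged.
fixed_numbered = re.sub(r"(?:Mon|Tue|Wed|Thu|Fri) (\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", text)
print(fixed_numbered)  # 2025-10-29

# Fix 2: name the groups so later additions cannot reorder the references.
fixed_named = re.sub(
    r"(?P<dow>Mon|Tue|Wed|Thu|Fri) (?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})",
    r"\g<year>-\g<month>-\g<day>",
    text,
)
print(fixed_named)     # 2025-10-29
```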

Backslash escape confusion in the replacement string. Python's re.sub uses \1 in replacements, PHP's preg_replace accepts \1 or $1, and JavaScript uses $1. If you copy a Python re.sub call to JavaScript and forget to convert, JavaScript does not treat \1 as a group reference in the replacement, so the captured text never makes it into the output. Match the syntax to the engine.

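Using a backreference where none is captured. (?:foo)\1 does not work because (?:...) is non-capturing, so \1 has no group to refer to. Python's re rejects the pattern at compile time, while JavaScript without the u flag silently reads \1 as a legacy octal escape instead. Use a capturing group (foo)\1 if you need the backreference.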
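In Python's stdlib re this mistake is at least loud. A small check, assuming only the standard library:

```python
import re

try:
    re.compile(r"(?:foo)\1")
except re.error as exc:
    # No capturing group exists, so group reference 1 is invalid.
    print("re.error:", exc)

print(bool(re.fullmatch(r"(foo)\1", "foofoo")))   # True
```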

Greedy capture across a delimiter. (.*),(.*) against a,b,c captures group 1 as a,b and group 2 as c because .* is greedy. Use ([^,]*),(.*) to capture up to the first comma instead.
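The variants side by side in Python. A lazy (.*?) also stops at the first comma; I'd still reach for the negated class, since it says exactly what may appear in the field.

```python
import re

greedy = re.match(r"(.*),(.*)", "a,b,c")
print(greedy.groups())    # ('a,b', 'c')

bounded = re.match(r"([^,]*),(.*)", "a,b,c")
print(bounded.groups())   # ('a', 'b,c')

lazy = re.match(r"(.*?),(.*)", "a,b,c")
print(lazy.groups())      # ('a', 'b,c')
```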

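Putting variable-width groups or backreferences inside a lookbehind. Python's stdlib re requires lookbehinds to be fixed-width, and Java only allows lookbehinds with a bounded maximum length. A pattern like (?<=(\w+)), or a lookbehind containing a backreference, has a width that depends on the input, so these engines reject it or handle it unreliably. .NET and modern JavaScript (ES2018+) accept variable-width lookbehinds. In Python, use the regex package for variable-width lookbehinds, or restructure the pattern.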
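A quick Python check of the stdlib restriction: a fixed-width lookbehind containing a capturing group compiles and captures, while the variable-width version raises re.error. The sample string is made up.

```python
import re

# Fixed width (exactly three word characters): accepted, and the group captures.
m = re.search(r"(?<=(\w{3}))-", "abc-def")
print(m.start(), m.group(1))   # 3 abc

# Variable width: rejected by the stdlib engine at compile time.
try:
    re.compile(r"(?<=(\w+))-")
except re.error as exc:
    print("re.error:", exc)
```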

Capturing inside a quantifier and expecting to get all matches. (\d)+ against 123 matches the whole string, but group 1 holds only the LAST iteration's text, 3. To get every iteration, run findall / matchAll / preg_match_all with the inner pattern on its own, or split on a delimiter first.
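A short Python demonstration of the last-iteration rule and the two workarounds:

```python
import re

# The group repeats three times, but only the final repetition is kept.
m = re.fullmatch(r"(\d)+", "123")
print(m.group(0), m.group(1))       # 123 3

# To keep every repetition, match the inner pattern on its own.
print(re.findall(r"\d", "123"))     # ['1', '2', '3']

# For a delimited list, splitting first is often simpler.
print("red,green,blue".split(","))  # ['red', 'green', 'blue']
```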

Test cases

Pattern	Input	Groups
`(\d{4})-(\d{2})-(\d{2})`	`2025-10-29`	`2025`, `10`, `29`
`(?<y>\d{4})`	`2025`	`groups.y = 2025`
`(\w+),\s*(\w+)`	`Smith, Alice`	`Smith`, `Alice`
`\b(\w+)\s+\1\b`	`the the cat`	Matches `the the`, group 1 is `the`
`(?:https?|ftp):\/\/`	`https://example.com`	No groups (non-capturing)
`(.)(.)(.)\3\2\1`	`abccba`	`a`, `b`, `c` (palindrome of length 6)
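The table can double as a regression check. This Python sketch runs each row through re.search; the named row is written (?P<y>...) because that is the spelling Python's stdlib re documents, and the \/\/ escapes are dropped since Python doesn't need them.

```python
import re

# (pattern, input, expected groups) rows mirroring the test-case table.
CASES = [
    (r"(\d{4})-(\d{2})-(\d{2})", "2025-10-29", ("2025", "10", "29")),
    (r"(?P<y>\d{4})", "2025", ("2025",)),
    (r"(\w+),\s*(\w+)", "Smith, Alice", ("Smith", "Alice")),
    (r"\b(\w+)\s+\1\b", "the the cat", ("the",)),
    (r"(?:https?|ftp)://", "https://example.com", ()),
    (r"(.)(.)(.)\3\2\1", "abccba", ("a", "b", "c")),
]

for pattern, text, expected in CASES:
    m = re.search(pattern, text)
    assert m is not None, f"no match: {pattern!r} on {text!r}"
    assert m.groups() == expected, (pattern, m.groups(), expected)
print(f"{len(CASES)} cases passed")
```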

FAQ

What is the difference between (...) and (?:...) in regex?

(...) is a capturing group: it groups the sub-pattern AND captures the matched text into a numbered slot you can access later. (?:...) is a non-capturing group: it groups but does not capture.

Use (?:...) when you need grouping (for alternation or quantifier scope) but don't care about the captured value. It makes intent clearer and avoids shifting your numbered groups.

How do I use named groups in regex?

Syntax: (?<name>...). Access via match.groups.name (JavaScript), m.group("name") (Python), or $matches['name'] (PHP).

Python's stdlib re needs the (?P<name>...) spelling instead; the third-party regex package accepts both. Elsewhere, prefer the bare (?<name>...) form for portability: it works in JavaScript, PHP, Ruby, Java, .NET, and recent versions of Go and Rust.

What is a backreference in regex?

A backreference matches the same text that an earlier capturing group matched. \1 matches the same text as group 1, \2 for group 2, and \k<name> for named groups.

Example: \b(\w+)\s+\1\b matches a duplicated word like the the because \1 requires the second word to be identical to the first capture.

How do I reference a captured group in a replacement string?

JavaScript uses $1, $2, etc. in the replacement string ($<name> for named groups). PHP's preg_replace accepts $1 or \1. Python uses \1, \2 (or \g<name> for named).

Example to swap two words in JavaScript: "Hello World".replace(/(\w+) (\w+)/, "$2 $1") produces "World Hello".

Why does my regex with many parentheses behave unexpectedly?

Every set of parentheses is a capturing group by default and gets a number. When you add a new group in the middle of a pattern, every group to its right shifts by one. What was $3 becomes $4, breaking your replacement.

Fix: use non-capturing groups (?:...) wherever you only need grouping, or switch to named groups so additions don't reorder anything.

Why don't backreferences work in Go or Rust?

Because Go's regexp (RE2) and Rust's regex crate guarantee linear-time matching. Backreferences can't be implemented with the automaton-based techniques behind that guarantee (matching with backreferences is NP-hard in general), so they are excluded by design.

Workaround: capture all matches with a non-backreference pattern, then compare adjacent values in code. For \b(\w+)\s+\1\b, that means finding all words and checking consecutive pairs.

Can I use capturing groups across regex alternation?

Yes, but only the branch that actually matched captures anything. (foo)|(bar) against foo captures group 1 as foo and leaves group 2 unset (undefined in JavaScript, None in Python). Against bar, it's the opposite.

If you don't care which branch matched and just want "either", wrap with a single capturing group: (foo|bar). Group 1 then holds whichever matched.
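A quick Python check of that behaviour. lastindex is a Python-specific Match attribute holding the number of the last group that matched, which is one way to tell the branches apart.

```python
import re

two_groups = re.compile(r"(foo)|(bar)")
print(two_groups.fullmatch("foo").groups())   # ('foo', None)
print(two_groups.fullmatch("bar").groups())   # (None, 'bar')
print(two_groups.fullmatch("bar").lastindex)  # 2

# One group around the whole alternation captures whichever branch matched.
one_group = re.compile(r"(foo|bar)")
print(one_group.fullmatch("bar").group(1))    # bar
```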

Recommended books

Groups and backreferences are the gateway from matching into real text processing. For the full treatment:

Regular Expressions Cookbook (Jan Goyvaerts and Steven Levithan, 2nd edition). Problem-then-solution recipes across eight languages (JavaScript, Python, PHP, Java, .NET, Ruby, Perl, VB). The one to keep next to the keyboard.
Mastering Regular Expressions (Jeffrey Friedl, 3rd edition). The definitive deep-dive on how regex engines actually work: backtracking, NFA versus DFA, and the optimisation that makes a pattern fast or catastrophic. Dense, and unmatched once you are past the basics.
Learning Regular Expressions (Ben Forta). The gentlest on-ramp: short, current, and example-driven. A good first book if you are still finding your feet.


Ishan Karunaratne
