TechEarl

How to Use Capturing Groups and Backreferences in Regex

Capturing groups, named groups, non-capturing groups, and backreferences in regex. JavaScript / Python / PHP examples, engine notes, common mistakes, and the duplicate-word and swap-fields use cases.

Ishan Karunaratne · 10 min read

Parentheses in regex do two things at once. They group a sub-pattern so quantifiers apply to the whole group, and they capture the matched substring so you can pull it out after the match. (\d+) matches one or more digits AND lets you grab those digits via match[1] (JavaScript), m.group(1) (Python), or $1 in a replacement string. From there, the variants are named groups (?<name>...), non-capturing groups (?:...), and backreferences \1 or \k<name>, which match the SAME text the group captured. Below I walk through all four with the most common practical use cases (duplicate-word detection, swapping fields with a replacement, structured parsing), engine notes per language, and the bugs I've shipped.

The reason this feature is everywhere in real regex code: most useful patterns aren't just matching, they're extracting. You match a URL to extract the domain. You match a log line to pull the timestamp. You match a date to capture the month. Capturing groups are how.

Quick reference

Syntax | Purpose
(pattern) | Capturing group, numbered left-to-right starting at 1
(?:pattern) | Non-capturing group (grouping only)
(?<name>pattern) | Named capturing group
\1, \2 | Backreference to a numbered group
\k<name> | Backreference to a named group
$1, $2 (replacement) | Insert the captured text into a replacement string

Basic capturing groups

code
(\d{4})-(\d{2})-(\d{2})

Three groups: year, month, day. After a match against 2025-10-29, the groups contain:

  • Group 0 (the full match): 2025-10-29
  • Group 1: 2025
  • Group 2: 10
  • Group 3: 29

Groups are numbered left-to-right by their opening parenthesis, starting at 1.
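A quick way to see the numbering is Python's standard re module. This is a minimal sketch with made-up input strings; the second pattern wraps year and month in an extra outer group to show that nesting follows the opening-parenthesis rule.

```python
import re

# Three capturing groups: year, month, day
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2025-10-29.")
print(m.group(0))  # 2025-10-29 (group 0 is the full match)
print(m.groups())  # ('2025', '10', '29')

# Nested groups: the outer "(" opens first, so it becomes group 1
n = re.search(r"((\d{4})-(\d{2}))-(\d{2})", "2025-10-29")
print(n.groups())  # ('2025-10', '2025', '10', '29')
```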

Named groups

For complex patterns with many groups, names are easier to read than numbers:

code
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

In code, access via match.groups.year (JavaScript), m.group("year") (Python), or $matches['year'] (PHP). The numbered access (match[1], etc.) still works in parallel.
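In Python's built-in re module the named form is spelled (?P<name>...). A minimal sketch on an invented string, showing access by name, subscript access, and that the numbered groups still line up:

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = DATE.search("Deadline: 2025-10-29")

print(m.group("year"))  # 2025
print(m["month"])       # 10 (subscript access)
print(m.group(3))       # 29 (numbered access still works)
print(m.groupdict())    # {'year': '2025', 'month': '10', 'day': '29'}
```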

Different engines use different syntax for named groups:

Engine | Named group syntax | Named backreference
JavaScript | (?<name>...) | \k<name>
Python (re) | (?P<name>...) | (?P=name)
PHP (PCRE) | (?<name>...) or (?P<name>...) | \k<name> or (?P=name)
Ruby | (?<name>...) | \k<name>
.NET | (?<name>...) | \k<name>

For cross-engine portability, prefer (?<name>...); it's the most widely accepted form. The main exception is Python's built-in re module, which only accepts (?P<name>...).

Non-capturing groups

If you want grouping (for | alternation or to apply a quantifier) without capturing, open the group with (?: instead of a bare (:

code
(?:https?|ftp):\/\/

The group is needed because of the alternation, but you don't care about the captured value separately. Without ?:, this would use up group 1 for https / http / ftp and shift all your subsequent groups by one.

Use non-capturing groups whenever you don't actually need the captured value. It makes intent clearer and avoids polluting the numbered groups.
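To make the renumbering concrete, here is a small Python comparison. The URL and the host-capturing ([^/]+) group are illustrative additions, not part of the pattern above.

```python
import re

url = "https://example.com/docs"

# Capturing: the scheme alternation takes group 1, so the host lands in group 2
with_capture = re.match(r"(https?|ftp)://([^/]+)", url)
print(with_capture.groups())     # ('https', 'example.com')

# Non-capturing: the host is now group 1
without_capture = re.match(r"(?:https?|ftp)://([^/]+)", url)
print(without_capture.groups())  # ('example.com',)
```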

Backreferences: match the same text again

A backreference matches the same text that an earlier capturing group matched. Syntax: \1, \2, etc. for numbered groups; \k<name> for named groups.

Duplicate-word detection:

code
\b(\w+)\s+\1\b

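This matches "the the" in a sentence. (\w+) captures a word, and \s+\1 then requires whitespace followed by the SAME word. Useful for editorial scanning. The comparison is case-sensitive, so a sentence-initial "The the" needs a case-insensitive flag (i in JavaScript and PHP, re.IGNORECASE in Python).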
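Here is the duplicate-word pattern run over a made-up sentence with Python's re. The second pass adds re.IGNORECASE, which Python also applies to the backreference, so "The the" is caught too.

```python
import re

text = "The the quick fox jumped over the the lazy dog"

# Case-sensitive: \1 must repeat the captured word exactly
strict = [m.group(1) for m in re.finditer(r"\b(\w+)\s+\1\b", text)]
print(strict)  # ['the']

# Case-insensitive: "The the" now counts as a duplicate as well
loose = [m.group(1) for m in re.finditer(r"\b(\w+)\s+\1\b", text, re.IGNORECASE)]
print(loose)  # ['The', 'the']
```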

Same-tag HTML matching:

code
<(h[1-6])([^>]*)>(.*?)<\/\1>

The \1 ensures the closing tag is the same heading level as the opening tag. Without it, <h1>...</h3> would match (which is invalid HTML).
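A quick Python check of that behaviour on an invented one-line snippet: the h2 that is closed with </h3> is skipped, because \1 only accepts the heading level the first group captured.

```python
import re

HEADING = re.compile(r"<(h[1-6])([^>]*)>(.*?)</\1>")

html = '<h1>Title</h1> <h2 class="sub">Broken</h3> <h3>Details</h3>'

# Each result: (tag name, heading text); the h2 never finds a matching </h2>
found = [(m.group(1), m.group(3)) for m in HEADING.finditer(html)]
print(found)  # [('h1', 'Title'), ('h3', 'Details')]
```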

Backreferences in replacement strings

Captured groups can also be referenced from the replacement string of a find-and-replace operation:

code
Find: (\w+) (\w+)
Replace: $2 $1

This swaps two words separated by a single space. Against Hello World, it produces World Hello. Replacement syntax varies by engine: JavaScript uses $1 and $2, Python's re.sub uses \1 and \2, and PHP's preg_replace accepts either form. Named-group replacements use $<name> in JavaScript, \g<name> in Python, and ${name} in Java and .NET.
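Python's re.sub also accepts named references in the replacement via \g<name>. A minimal sketch; the group names first and second are just illustrative.

```python
import re

swap = re.compile(r"(?P<first>\w+) (?P<second>\w+)")

# \g<name> inserts a named group; \g<2> is the explicit numbered form
by_name = swap.sub(r"\g<second> \g<first>", "Hello World")
by_number = swap.sub(r"\g<2> \g<1>", "Hello World")
print(by_name)    # World Hello
print(by_number)  # World Hello
```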

Practical use cases

Find duplicate consecutive words:

code
\b(\w+)\s+\1\b

Swap "Lastname, Firstname" to "Firstname Lastname":

code
Find:    (\w+),\s+(\w+)
Replace: $2 $1
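The same swap in Python's re.sub, written with \2 \1 because Python uses backslash references in replacements. The sample names are made up; note that \w+ does not cover hyphenated or apostrophe names such as O'Brien.

```python
import re

names = ["Smith, Alice", "Nakamura,  Ken", "Plain Name"]

# \s+ tolerates extra spaces after the comma; strings without a comma are left alone
swapped = [re.sub(r"(\w+),\s+(\w+)", r"\2 \1", name) for name in names]
print(swapped)  # ['Alice Smith', 'Ken Nakamura', 'Plain Name']
```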

Parse "name=value" pairs:

code
(\w+)=("[^"]*"|'[^']*'|[^\s]+)

The two groups give you the key and the value (with quotes still attached if present).
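A Python sketch of that parser over an invented config-style line. Because the value group keeps its quotes, the loop strips one matching pair afterwards; that post-processing step is an addition, not part of the pattern.

```python
import re

# Key, then a double-quoted, single-quoted, or bare (non-whitespace) value
PAIR = re.compile(r"""(\w+)=("[^"]*"|'[^']*'|[^\s]+)""")

line = "host=db1 port=5432 user='app user' note=\"read only\""

pairs = {}
for key, value in PAIR.findall(line):
    # Group 2 keeps any surrounding quotes; strip one matching pair if present
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    pairs[key] = value

print(pairs)  # {'host': 'db1', 'port': '5432', 'user': 'app user', 'note': 'read only'}
```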

Wrap HTML hex colour values in <code> tags:

code
Find:    #([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\b
Replace: <code>#$1</code>
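Running the hex-colour replacement with Python's re.sub (which writes the reference as \1) over a made-up CSS fragment. The trailing \b keeps the pattern from wrapping part of a longer run such as #12345.

```python
import re

HEX = re.compile(r"#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\b")

css = "color: #fff; background: #1a2B3c; border-color: #12345;"
wrapped = HEX.sub(r"<code>#\1</code>", css)
print(wrapped)
# color: <code>#fff</code>; background: <code>#1a2B3c</code>; border-color: #12345;
```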

Convert dates from MM/DD/YYYY to YYYY-MM-DD:

code
Find:    (\d{2})\/(\d{2})\/(\d{4})
Replace: $3-$1-$2
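The date conversion in Python, again with backslash references. A minimal sketch on invented input; the pattern only checks digit counts, so it would also reorder an impossible date like 13/45/2025.

```python
import re

US_DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

# Reorder MM/DD/YYYY as YYYY-MM-DD: group 3 (year), group 1 (month), group 2 (day)
iso = US_DATE.sub(r"\3-\1-\2", "Due 10/29/2025, paid 11/03/2025")
print(iso)  # Due 2025-10-29, paid 2025-11-03
```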

Examples in JavaScript, Python, and PHP

JavaScript:

javascript
const text = "Born 1985-10-29, registered 2010-04-15";

// Numbered groups
const isoDate = /(\d{4})-(\d{2})-(\d{2})/g;
let m;
while ((m = isoDate.exec(text)) !== null) {
  console.log(`year=${m[1]}, month=${m[2]}, day=${m[3]}`);
}

// Named groups
const named = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
for (const match of text.matchAll(named)) {
  console.log(`year=${match.groups.year}, day=${match.groups.day}`);
}

// Swap two words
"Hello World".replace(/(\w+) (\w+)/, "$2 $1");  // 'World Hello'

Python:

python
import re

text = "Born 1985-10-29, registered 2010-04-15"

# Numbered groups
for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})", text):
    print(f"year={m.group(1)}, month={m.group(2)}, day={m.group(3)}")

# Named groups
for m in re.finditer(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text):
    print(f"year={m.group('year')}, day={m.group('day')}")

# Swap two words
re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World")  # 'World Hello'

PHP:

php
<?php

$text = "Born 1985-10-29, registered 2010-04-15";

// Numbered groups
preg_match_all('/(\d{4})-(\d{2})-(\d{2})/', $text, $matches);
// $matches[1] = ['1985', '2010'], $matches[2] = ['10', '04'], $matches[3] = ['29', '15']

// Named groups
preg_match_all('/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/', $text, $matches);
// $matches['year'] = ['1985', '2010']

// Swap two words
preg_replace('/(\w+) (\w+)/', '$2 $1', "Hello World");  // 'World Hello'

Engine compatibility

Numbered groups work everywhere. Named groups and backreferences have meaningful per-engine quirks.

Engine | Numbered groups | Named groups | Backreferences | Replacement syntax
JavaScript | Works | (?<name>...) (ES2018+) | \1 in pattern, $1 in replacement | $1, $<name>
Python (re) | Works | (?P<name>...) | \1 in pattern, \1 in replacement | \1, \g<name>
Python (regex pkg) | Works | All forms | Works | All forms
PHP (PCRE) | Works | (?<name>...) or (?P<name>...) | \1 in pattern, $1 or \1 in replacement | $1, \1, ${1} (numbered only)
Java | Works | (?<name>...) | \1 in pattern, $1 in replacement | $1, ${name}
.NET | Works | (?<name>...) | \1 in pattern, $1 in replacement | $1, ${name}
Go (RE2) | Works | (?P<name>...); (?<name>...) since Go 1.22 | Not supported | $1, ${name}
Rust (regex crate) | Works | (?P<name>...); (?<name>...) since regex 1.9 | Not supported | $1, ${name}
Ruby | Works | (?<name>...) | \1 in pattern, \1 in replacement | \1, \k<name>
POSIX BRE (sed, grep) | \(...\) only | Not supported | \1 in pattern, \1 in replacement | \1
POSIX ERE (grep -E, awk) | (...) | Not supported | Varies (BSD vs GNU) | \1

The most important caveat: Go and Rust do not support backreferences in the pattern, so \b(\w+)\s+\1\b fails to compile in those engines. The workaround is to find all words with a pattern that has no backreference, then compare adjacent pairs in code.
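The two-step workaround can be sketched in Python for readability; the same shape ports to Go's regexp or Rust's regex crate, because the only pattern it needs, \w+, has no backreference. The function name duplicate_words is hypothetical. The last lines cross-check it against the backreference version, which Python can run.

```python
import re

WORD = re.compile(r"\w+")  # no backreference, so RE2-style engines accept it

def duplicate_words(text):
    """Hypothetical helper: return (offset, word) for each word that repeats
    the word just before it, comparing neighbours in code instead of using a
    backreference in the pattern."""
    words = list(WORD.finditer(text))
    found = []
    for prev, cur in zip(words, words[1:]):
        gap = text[prev.end():cur.start()]
        # Only whitespace between the two words, mirroring \s+ in the original pattern
        if prev.group() == cur.group() and gap.isspace():
            found.append((prev.start(), cur.group()))
    return found

sample = "the the cat sat sat, on on"
print(duplicate_words(sample))  # [(0, 'the'), (12, 'sat'), (21, 'on')]

# Cross-check against the backreference pattern
expected = [(m.start(), m.group(1)) for m in re.finditer(r"\b(\w+)\s+\1\b", sample)]
print(duplicate_words(sample) == expected)  # True
```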

Common mistakes

The bugs I see most often.

Forgetting that adding a new group renumbers every group to its right. If your replacement was $3-$1-$2 and you add a new capturing group earlier in the pattern, $3 now points at something else. Either use non-capturing groups (?:...) for everything you don't need to extract, or switch to named groups and reference them by name, so added groups can't shift your references.
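A small Python illustration of the renumbering trap, using the MM/DD/YYYY conversion from earlier plus an invented (Due|Paid) label group in front.

```python
import re

text = "Due 10/29/2025"

# Numbered: adding (Due|Paid) shifts every later group, so \3-\1-\2 now picks the wrong text
numbered = re.sub(r"(Due|Paid) (\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", text)
print(numbered)  # 29-Due-10

# Named: references follow the names, so the extra group cannot disturb them
named = re.sub(
    r"(?P<label>Due|Paid) (?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})",
    r"\g<label> \g<year>-\g<month>-\g<day>",
    text,
)
print(named)  # Due 2025-10-29
```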

Backslash escape confusion in the replacement string. Python's re.sub uses \1 (or \g<1>) in replacements, JavaScript uses $1, and PHP's preg_replace accepts either. If you copy a Python re.sub call to JavaScript and forget to convert, the replacement inserts a literal \1 instead of the captured text; going the other way, Python inserts a literal $1. Match the syntax to the engine.

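Using a backreference where nothing is captured. (?:foo)\1 does not work because (?:...) is non-capturing, so \1 has no group to refer to. In Python and PHP that is a compile error; in JavaScript without the u flag, \1 is silently treated as a legacy octal escape instead, so the pattern compiles but matches the wrong thing. Use a capturing group, (foo)\1, if you need the backreference.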
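In Python the mistake surfaces when the pattern is compiled. A minimal sketch; compiles() is a hypothetical helper that only reports whether re.compile accepts a pattern.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?:foo)\1"))  # False: there is no group 1 to refer back to
print(compiles(r"(foo)\1"))    # True
print(bool(re.fullmatch(r"(foo)\1", "foofoo")))  # True: \1 repeats "foo"
```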

Greedy capture across a delimiter. (.*),(.*) against a,b,c captures group 1 as a,b and group 2 as c because .* is greedy. Use ([^,]*),(.*) to capture up to the first comma instead.
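The same comparison in Python on the a,b,c input, plus a lazy (.*?) variant, which also stops at the first comma here:

```python
import re

s = "a,b,c"
greedy = re.match(r"(.*),(.*)", s).groups()
negated = re.match(r"([^,]*),(.*)", s).groups()
lazy = re.match(r"(.*?),(.*)", s).groups()

print(greedy)   # ('a,b', 'c'): .* runs to the end, then backs off to the last comma
print(negated)  # ('a', 'b,c'): [^,]* cannot cross a comma
print(lazy)     # ('a', 'b,c'): .*? expands only as far as needed
```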

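Using variable-width groups or backreferences inside a lookbehind. Python's stdlib re requires every lookbehind to be fixed-width (a backreference inside one is only allowed if the referenced group has a fixed length), and Java requires a bounded maximum length. A pattern like (?<=(\w+)) is rejected outright by Python's re with "look-behind requires fixed-width pattern". JavaScript (ES2018+) and .NET accept variable-width lookbehinds. In Python, use the third-party regex package for variable-width lookbehinds, or restructure the pattern.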
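In Python's standard re the restriction shows up at compile time. A minimal sketch with invented input: the unbounded \w+ lookbehind is refused, while a fixed-width \w{3} version compiles and its group still captures.

```python
import re

# Variable-width lookbehind: Python's re refuses to compile it
try:
    re.compile(r"(?<=(\w+))x")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print(exc)  # look-behind requires fixed-width pattern

# Fixed-width lookbehind: compiles, and the group inside it still captures
m = re.search(r"(?<=(\w{3}))x", "abcx")
print(m.group(1))  # abc
```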

Capturing inside a quantifier and expecting to get every repetition. (\w+,?)+ against a,b,c matches the whole string, but group 1 holds only c, the text from the LAST iteration. To get every item, run findall / matchAll / preg_match_all with just the repeated part (for example \w+), or split on the delimiter first.
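A short Python check, assuming a comma-separated input such as a,b,c and the quantified group (\w+,?)+:

```python
import re

# The group repeats three times, but only the final iteration is remembered
m = re.fullmatch(r"(\w+,?)+", "a,b,c")
print(m.group(0))  # a,b,c
print(m.group(1))  # c

# Matching the repeated part on its own returns every item
items = re.findall(r"\w+", "a,b,c")
print(items)  # ['a', 'b', 'c']
```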

Test cases

Pattern | Input | Groups
(\d{4})-(\d{2})-(\d{2}) | 2025-10-29 | 2025, 10, 29
(?<y>\d{4}) | 2025 | groups.y = 2025
(\w+),\s*(\w+) | Smith, Alice | Smith, Alice
\b(\w+)\s+\1\b | the the cat | Matches "the the", group 1 is "the"
(?:https?|ftp):\/\/ | https://example.com | No groups (non-capturing)
(.)(.)(.)\3\2\1 | abccba | a, b, c (palindrome of length 6)

See also

External reference: the MDN regex groups reference covers JavaScript specifics; test at regex101.com.

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years across software, Linux systems, DevOps, and infrastructure — and a more recent focus on AI. Currently Chief Technology Officer at a tech startup in the healthcare space.
