Parentheses in regex do two things at once. They group a sub-pattern so quantifiers apply to the whole group, and they capture the matched substring so you can pull it out after the match. (\d+) matches one-or-more digits AND lets you grab those digits via match[1] (JavaScript), m.group(1) (Python), or $1 in a replacement string. The variants from there: named groups (?<name>...), non-capturing (?:...), and backreferences \1 or \k<name> that match the SAME text the group captured. Below I walk through all four, with the most common practical use cases (duplicate-word detection, swapping fields with a replacement, structured parsing), engine notes per language, and the bugs I've shipped.
The reason this feature is everywhere in real regex code: most useful patterns aren't just matching, they're extracting. You match a URL to extract the domain. You match a log line to pull the timestamp. You match a date to capture the month. Capturing groups are how.
Quick reference
| Syntax | Purpose |
|---|---|
| (pattern) | Capturing group, numbered left-to-right starting at 1 |
| (?:pattern) | Non-capturing group (grouping only) |
| (?<name>pattern) | Named capturing group |
| \1, \2 | Backreference to a numbered group |
| \k<name> | Backreference to a named group |
| $1, $2 (replacement) | Insert the captured text into a replacement string |
Basic capturing groups
(\d{4})-(\d{2})-(\d{2})
Three groups: year, month, day. After a match against 2025-10-29, the groups contain:
- Group 0 (the full match): 2025-10-29
- Group 1: 2025
- Group 2: 10
- Group 3: 29
Groups are numbered left-to-right by their opening parenthesis, starting at 1.
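Here is a minimal Python sketch of the same date pattern, just to show where each numbered slot ends up. The sample sentence is made up for illustration.

```python
import re

DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

m = DATE.search("released on 2025-10-29")

full = m.group(0)               # group 0 is always the whole match
year, month, day = m.groups()   # groups 1..3, in opening-parenthesis order

print(full)              # the full matched date
print(year, month, day)  # the three captured pieces
```

m.groups() skips group 0 and returns only the numbered captures, which makes it convenient for tuple unpacking.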
Named groups
For complex patterns with many groups, names are easier to read than numbers:
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
In code, access via match.groups.year (JavaScript), m.group("year") (Python), or $matches['year'] (PHP). The numbered access (match[1], etc.) still works in parallel.
Different engines use different syntax for named groups:
| Engine | Named group syntax | Named backreference |
|---|---|---|
| JavaScript | (?<name>...) | \k<name> |
| Python | (?P<name>...) | (?P=name) |
| PHP (PCRE) | (?<name>...) or (?P<name>...) | \k<name> or (?P=name) |
| Ruby | (?<name>...) | \k<name> |
| .NET | (?<name>...) | \k<name> |
For cross-engine portability, prefer (?<name>...), the most widely accepted form. The main exception is Python's stdlib re, which accepts only (?P<name>...) and (?P=name); the third-party regex package accepts both spellings.
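As a small illustration in Python, the sketch below uses the (?P<name>...) spelling, which every version of the stdlib re module accepts. It shows that named and numbered access point at the same captures.

```python
import re

ISO_DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = ISO_DATE.search("2025-10-29")

parts = m.groupdict()          # all named groups as a dict
by_name = m.group("year")      # access by name
by_number = m.group(1)         # the same capture, by position

print(parts)
print(by_name, by_number)
```

groupdict() is handy when the match feeds straight into a record or a keyword-argument call.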
Non-capturing groups
If you want grouping (for | alternation or to apply a quantifier) without capturing, start the group with ?: right after the opening parenthesis:
(?:https?|ftp):\/\/
The group is needed because of the alternation, but you don't care about the captured value separately. Without ?:, this would use up group 1 for https / http / ftp and shift all your subsequent groups by one.
Use non-capturing groups whenever you don't actually need the captured value. It makes intent clearer and avoids polluting the numbered groups.
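To make the numbering shift concrete, here is a rough Python comparison. The URL and the extra host group ([^/]+) are assumptions added for the demo, not part of the pattern above.

```python
import re

url = "https://example.com/docs"

# Capturing scheme group: the scheme takes slot 1 and pushes the host to slot 2.
capturing = re.match(r"(https?|ftp)://([^/]+)", url)

# Non-capturing scheme group: the host is the only capture, so it sits in slot 1.
non_capturing = re.match(r"(?:https?|ftp)://([^/]+)", url)

print(capturing.groups())
print(non_capturing.groups())
```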
Backreferences: match the same text again
A backreference matches the same text that an earlier capturing group matched. Syntax: \1, \2, etc. for numbered groups; \k<name> for named groups.
Duplicate-word detection:
\b(\w+)\s+\1\b
This matches the the in a sentence. The (\w+) captures a word, \s+\1 requires whitespace then the SAME word. Useful for editorial scanning.
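A quick Python sketch of that scan, using a made-up sentence. The "this thistle" part is there to show why the trailing \b matters: without it, the second "this" would match the start of "thistle".

```python
import re

DUPLICATE = re.compile(r"\b(\w+)\s+\1\b")

text = "Paris in the the spring, and it is is warm; this thistle is fine."

# Words that appear twice in a row.
repeated = [m.group(1) for m in DUPLICATE.finditer(text)]

# Collapse each doubled word to a single copy.
cleaned = DUPLICATE.sub(r"\1", text)

print(repeated)
print(cleaned)
```

The match is case-sensitive as written, so "The the" would slip through unless you add re.IGNORECASE.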
Same-tag HTML matching:
<(h[1-6])([^>]*)>(.*?)<\/\1>
The \1 ensures the closing tag is the same heading level as the opening tag. Without it, <h1>...</h3> would match (which is invalid HTML).
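A small Python check of the heading pattern against an invented snippet with one valid heading and one mismatched pair. This is a sketch for simple, non-nested markup only; for real HTML, a parser is the safer tool.

```python
import re

HEADING = re.compile(r"<(h[1-6])([^>]*)>(.*?)</\1>")

html = '<h2 class="title">Groups</h2><h1>Broken</h3>'

# Each result is (tag name, inner text); the mismatched <h1>...</h3> is skipped
# because \1 demands a closing </h1>.
headings = [(m.group(1), m.group(3)) for m in HEADING.finditer(html)]

print(headings)
```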
Backreferences in replacement strings
The same backreferences work in replacement strings for find-and-replace operations:
Find: (\w+) (\w+)
Replace: $2 $1
This swaps two whitespace-separated words. Against Hello World, it produces World Hello. Some engines use \1 and \2 in replacements (PHP, Python's re.sub), others use $1 and $2 (JavaScript). Named-group replacements use $<name> or \g<name> depending on engine.
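In Python's re.sub the replacement side uses \1 for numbered groups and \g<name> for named ones. The sketch below shows both, plus the \g<1> form, which is useful when a literal digit follows the reference. The group names first and second are illustrative.

```python
import re

# Numbered references in the replacement.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World")

# Named references in the replacement.
named = re.sub(
    r"(?P<first>\w+) (?P<second>\w+)",
    r"\g<second> \g<first>",
    "Hello World",
)

# \g<1>0 means "group 1, then a literal 0"; \10 would be read as group 10.
padded = re.sub(r"(\d+)", r"\g<1>0", "7 apples")

print(swapped, "|", named, "|", padded)
```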
Practical use cases
Find duplicate consecutive words:
\b(\w+)\s+\1\b
Swap "Lastname, Firstname" to "Firstname Lastname":
Find: (\w+),\s+(\w+)
Replace: $2 $1
Parse "name=value" pairs:
(\w+)=("[^"]*"|'[^']*'|[^\s]+)
The two groups give you the key and the value (with quotes still attached if present).
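One way to turn those two groups into a dictionary in Python, as a sketch. The input line and the strip_quotes helper are hypothetical, added only to show the quote-stripping step the pattern leaves to you.

```python
import re

PAIR = re.compile(r"""(\w+)=("[^"]*"|'[^']*'|[^\s]+)""")

line = '''user=alice role="site admin" theme='dark mode' retries=3'''

def strip_quotes(value):
    """Drop one pair of matching surrounding quotes, if present (hypothetical helper)."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value

# findall returns one (key, value) tuple per match because the pattern has two groups.
raw_pairs = PAIR.findall(line)
config = {key: strip_quotes(value) for key, value in raw_pairs}

print(raw_pairs)
print(config)
```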
Wrap HTML hex colour values in <code> tags:
Find: #([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\b
Replace: <code>#$1</code>
Convert dates from MM/DD/YYYY to YYYY-MM-DD:
Find: (\d{2})\/(\d{2})\/(\d{4})
Replace: $3-$1-$2
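The find-and-replace recipes above translate to Python's re.sub roughly like this. Python writes the replacement references as \1, \2, \3 rather than $1, $2, $3, and needs no escaping for /. The sample strings are invented.

```python
import re

# "Lastname, Firstname" -> "Firstname Lastname"
name = re.sub(r"(\w+),\s+(\w+)", r"\2 \1", "Smith, Alice")

# Wrap hex colours in <code> tags; the trailing \b makes "#1A2" fail inside
# "#1A2B3C", so the six-digit branch takes over.
css = "color: #fff; background: #1A2B3C;"
wrapped = re.sub(r"#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\b", r"<code>#\1</code>", css)

# MM/DD/YYYY -> YYYY-MM-DD
us_dates = "Shipped 10/29/2025, returned 11/03/2025"
iso = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", us_dates)

print(name)
print(wrapped)
print(iso)
```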
Examples in JavaScript, Python, and PHP
JavaScript:
const text = "Born 1985-10-29, registered 2010-04-15";
// Numbered groups
const isoDate = /(\d{4})-(\d{2})-(\d{2})/g;
let m;
while ((m = isoDate.exec(text)) !== null) {
  console.log(`year=${m[1]}, month=${m[2]}, day=${m[3]}`);
}
// Named groups
const named = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
for (const match of text.matchAll(named)) {
  console.log(`year=${match.groups.year}, day=${match.groups.day}`);
}
// Swap two words
"Hello World".replace(/(\w+) (\w+)/, "$2 $1"); // 'World Hello'
Python:
import re
text = "Born 1985-10-29, registered 2010-04-15"
# Numbered groups
for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})", text):
    print(f"year={m.group(1)}, month={m.group(2)}, day={m.group(3)}")
# Named groups
for m in re.finditer(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text):
    print(f"year={m.group('year')}, day={m.group('day')}")
# Swap two words
re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World")  # 'World Hello'
PHP:
$text = "Born 1985-10-29, registered 2010-04-15";
// Numbered groups
preg_match_all('/(\d{4})-(\d{2})-(\d{2})/', $text, $matches);
// $matches[1] = ['1985', '2010'], $matches[2] = ['10', '04'], $matches[3] = ['29', '15']
// Named groups
preg_match_all('/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/', $text, $matches);
// $matches['year'] = ['1985', '2010']
// Swap two words
preg_replace('/(\w+) (\w+)/', '$2 $1', "Hello World"); // 'World Hello'
Engine compatibility
Numbered groups work everywhere. Named groups and backreferences have meaningful per-engine quirks.
| Engine | Numbered groups | Named groups | Backreferences | Replacement syntax |
|---|---|---|---|---|
| JavaScript | Works | (?<name>...) (ES2018+) | \1 in pattern, $1 in replacement | $1, $<name> |
| Python (re) | Works | (?P<name>...) | \1 or (?P=name) in pattern, \1 in replacement | \1, \g<name> |
| Python (regex pkg) | Works | All forms | Works | All forms |
| PHP (PCRE) | Works | (?<name>...) or (?P<name>...) | \1 in pattern, $1 or \1 in replacement | $1, ${1}, \1 (numbered only; use preg_replace_callback for names) |
| Java | Works | (?<name>...) | \1 in pattern, $1 in replacement | $1, ${name} |
| .NET | Works | (?<name>...) | \1 in pattern, $1 in replacement | $1, ${name} |
| Go (RE2) | Works | (?P<name>...); newer releases also accept (?<name>...) | Not supported | $1, ${name} |
| Rust (regex crate) | Works | (?P<name>...); newer releases also accept (?<name>...) | Not supported | $1, ${name} |
| Ruby | Works | (?<name>...) | \1 in pattern, \1 in replacement | \1, \k<name> |
| POSIX BRE (sed, grep) | \(...\) only | Not supported | \1 in pattern, \1 in replacement | \1 |
| POSIX ERE (grep -E, awk) | (...) | Not supported | Varies (BSD vs GNU) | \1 |
The most important caveat: Go and Rust do not support backreferences in the pattern. So \b(\w+)\s+\1\b does not work in those engines. The workaround is to find all words with a non-backreference pattern, then check adjacent pairs in code.
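The workaround has the same shape in any language. Here is a sketch in Python (standing in for Go or Rust, to keep one language across the examples) that uses only a plain \w+ pattern and compares neighbours in code. The function name duplicate_words is hypothetical.

```python
import re

WORD = re.compile(r"\w+")  # no backreference needed

def duplicate_words(text):
    """Return words that repeat back to back, separated only by whitespace."""
    tokens = list(WORD.finditer(text))
    dupes = []
    for prev, cur in zip(tokens, tokens[1:]):
        gap = text[prev.end():cur.start()]
        # Same word, and nothing but whitespace in between (mirrors \s+).
        if prev.group() == cur.group() and gap.isspace():
            dupes.append(cur.group())
    return dupes

found = duplicate_words("the the cat sat sat. sat")
print(found)
```

One difference from the backreference version: a triple like "a a a" yields two adjacent pairs here, while the regex reports one non-overlapping match.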
Common mistakes
The bugs I see most often.
Forgetting that adding a new group renumbers everything to its right. If your replacement was $3-$1-$2 and you add a new capturing group earlier in the pattern, $3 now points at something else. Either use non-capturing groups (?:...) for everything you don't need to extract, or switch to named groups so additions don't reorder anything.
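A small Python demonstration of the renumbering trap, with an invented input. Adding a label group in front silently changes what \1 and \3 mean, while the named version keeps working. The group names are illustrative.

```python
import re

text = "due: 10/29/2025"

# Original pattern: \3-\1-\2 is year-month-day.
before = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", text)

# A new group is added in front; the old replacement now grabs the wrong slots.
broken = re.sub(r"(\w+): (\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", text)

# Named groups are unaffected by the extra group.
named = re.sub(
    r"(?P<label>\w+): (?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})",
    r"\g<year>-\g<month>-\g<day>",
    text,
)

print(before)
print(broken)
print(named)
```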
Backslash escape confusion in the replacement string. PHP and Python use \1 in replacements; JavaScript uses $1. If you copy a Python re.sub call to JavaScript and forget to convert, the replacement becomes the literal string \1. Match the syntax to the engine.
Using a backreference where none is captured. (?:foo)\1 does not work because (?:...) is non-capturing, so \1 has no group to refer to. Use a capturing group (foo)\1 if you need the backreference.
Greedy capture across a delimiter. (.*),(.*) against a,b,c captures group 1 as a,b and group 2 as c because .* is greedy. Use ([^,]*),(.*) to capture up to the first comma instead.
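The same comparison in Python, as a quick check. The lazy variant (.*?) is added here as a third option and is not part of the text above.

```python
import re

greedy = re.match(r"(.*),(.*)", "a,b,c").groups()         # splits at the LAST comma
negated = re.match(r"([^,]*),(.*)", "a,b,c").groups()     # splits at the FIRST comma
lazy = re.match(r"(.*?),(.*)", "a,b,c").groups()          # also the first comma

print(greedy, negated, lazy)
```

The negated class is usually the better habit: it cannot backtrack across the delimiter, so it stays predictable on long inputs.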
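Trying to use captures or backreferences in lookbehinds in fixed-width engines. Python's stdlib re requires every lookbehind to be fixed-width, and Java expects a bounded width; .NET and modern JavaScript accept variable-width lookbehinds. A pattern like (?<=(\w+)) fails to compile in Python's re ("look-behind requires fixed-width pattern") because \w+ has no fixed length, and a backreference to a variable-width group inside a lookbehind is rejected for the same reason. Use the regex package in Python for variable-width lookbehinds, or restructure the pattern so the capture sits outside the assertion.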
Capturing inside a quantifier and expecting to get all matches. (\w+)+ only captures the LAST iteration's text into group 1. To get all matches, run findall / matchAll / preg_match_all over the unrolled pattern, or split on a delimiter first.
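To see the last-iteration behaviour in Python, the sketch below repeats a group over a comma-separated string. The pattern (?:(\w+),?)+ is a variation chosen for the demo so that each iteration captures a distinct word.

```python
import re

# The inner group runs three times, but slot 1 keeps only the final capture.
m = re.fullmatch(r"(?:(\w+),?)+", "a,b,c")
last_only = m.group(1)

# To collect every item, match the inner pattern repeatedly instead.
all_items = re.findall(r"\w+", "a,b,c")

print(last_only)
print(all_items)
```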
Test cases
| Pattern | Input | Groups |
|---|---|---|
| (\d{4})-(\d{2})-(\d{2}) | 2025-10-29 | 2025, 10, 29 |
| (?<y>\d{4}) | 2025 | groups.y = 2025 |
| (\w+),\s*(\w+) | Smith, Alice | Smith, Alice |
| \b(\w+)\s+\1\b | the the cat | Matches the the, group 1 is the |
| (?:https?\|ftp):\/\/ | https://example.com | No groups (non-capturing) |
| (.)(.)(.)\3\2\1 | abccba | a, b, c (palindrome of length 6) |
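If you want to run the table yourself, a table-driven check in Python could look like the sketch below. Two adaptations are assumptions for Python's re: the named group is spelled (?P<y>...), and the slashes are left unescaped.

```python
import re

# (pattern, input, expected captured groups)
CASES = [
    (r"(\d{4})-(\d{2})-(\d{2})", "2025-10-29", ("2025", "10", "29")),
    (r"(?P<y>\d{4})", "2025", ("2025",)),
    (r"(\w+),\s*(\w+)", "Smith, Alice", ("Smith", "Alice")),
    (r"\b(\w+)\s+\1\b", "the the cat", ("the",)),
    (r"(?:https?|ftp)://", "https://example.com", ()),
    (r"(.)(.)(.)\3\2\1", "abccba", ("a", "b", "c")),
]

results = []
for pattern, text, expected in CASES:
    match = re.search(pattern, text)
    actual = match.groups() if match else None
    results.append(actual == expected)
    print(f"{pattern!r:32} {'ok' if actual == expected else 'FAIL'}")
```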
FAQ
What is the difference between (...) and (?:...)?
(...) is a capturing group: it groups the sub-pattern AND captures the matched text into a numbered slot you can access later. (?:...) is a non-capturing group: it groups but does not capture.
Use (?:...) when you need grouping (for alternation or quantifier scope) but don't care about the captured value. It makes intent clearer and avoids shifting your numbered groups.
How do I name a capturing group?
Syntax: (?<name>...). Access via match.groups.name (JavaScript), m.group("name") (Python), or $matches['name'] (PHP).
Python's stdlib re uses the older (?P<name>...) syntax instead. For cross-engine portability prefer the bare (?<name>...) form, which works in JavaScript, PHP, Ruby, Java, and .NET.
What is a backreference?
A backreference matches the same text that an earlier capturing group matched. \1 matches the same text as group 1, \2 for group 2, and \k<name> for named groups.
Example: \b(\w+)\s+\1\b matches a duplicated word like the the because \1 requires the second word to be identical to the first capture.
How do I use a captured group in a replacement string?
JavaScript and PHP use $1, $2, etc. in the replacement string. Python uses \1, \2 (or \g<name> for named).
Example to swap two words in JavaScript: "Hello World".replace(/(\w+) (\w+)/, "$2 $1") produces "World Hello".
Why did my group numbers change after I edited the pattern?
Every set of parentheses is a capturing group by default and gets a number. When you add a new group in the middle of a pattern, every group to its right shifts by one. What was $3 becomes $4, breaking your replacement.
Fix: use non-capturing groups (?:...) wherever you only need grouping, or switch to named groups so additions don't reorder anything.
Why don't Go and Rust support backreferences?
Because Go's regexp (RE2) and Rust's regex crate guarantee linear-time matching, and backreferences can make matching exponentially slow. They are excluded by design.
Workaround: capture all matches with a non-backreference pattern, then compare adjacent values in code. For \b(\w+)\s+\1\b, that means finding all words and checking consecutive pairs.
Can I use capturing groups with alternation?
Yes, but the group only captures the branch that actually matched. (foo)|(bar) against foo captures group 1 as foo and group 2 as undefined/None. Against bar, it's the opposite.
If you don't care which branch matched and just want "either", wrap with a single capturing group: (foo|bar). Group 1 then holds whichever matched.
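In Python, the unmatched branch shows up as None. A short sketch of both layouts:

```python
import re

# One group per branch: only the branch that matched is filled in.
m_bar = re.match(r"(foo)|(bar)", "bar")
m_foo = re.match(r"(foo)|(bar)", "foo")

per_branch_bar = m_bar.groups()
per_branch_foo = m_foo.groups()
which = m_bar.lastindex          # number of the last group that participated

# One group around the alternation: slot 1 holds whichever branch matched.
either = re.match(r"(foo|bar)", "bar").group(1)

print(per_branch_bar, per_branch_foo, which, either)
```

Checking lastindex (or testing each group for None) is how you tell the branches apart when you do need to know which one fired.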
See also
- Regex Lookaheads and Lookbehinds: the zero-width assertions that constrain where captures can happen
- How to Match HTML Tags with Regex: the backreference pattern in action for matching opening/closing tag pairs
- How to Match a Date with Regex: the \1 separator trick for consistent date separators
- How to Match an Email Address with Regex: extracting the local part and domain into separate groups
- Regex Anchors: the partner concept for tokenizing input
- Regex Word Boundaries: often paired with \b(\w+)\b for capturing whole words
- Validate Password Strength with Regex: backreferences are how the strong-password pattern rejects repeated runs like aaa and 111
- Regex Cheat Sheet: the wider syntax and engine compatibility reference
- The interview question that cost me the job: a war story where Go's RE2 engine dropping backreferences (the feature above) shaped a take-home test, and the bigger lesson about asking what a question really means
External reference: the MDN regex groups reference covers JavaScript specifics; test at regex101.com.
Recommended books
Groups and backreferences are the gateway from matching into real text processing. For the full treatment:
- Regular Expressions Cookbook (Jan Goyvaerts and Steven Levithan, 2nd edition). Problem-then-solution recipes across eight languages (JavaScript, Python, PHP, Java, .NET, Ruby, Perl, VB). The one to keep next to the keyboard.
- Mastering Regular Expressions (Jeffrey Friedl, 3rd edition). The definitive deep-dive on how regex engines actually work: backtracking, NFA versus DFA, and the optimisation that makes a pattern fast or catastrophic. Dense, and unmatched once you are past the basics.
- Learning Regular Expressions (Ben Forta). The gentlest on-ramp: short, current, and example-driven. A good first book if you are still finding your feet.





