Parentheses in regex do two things at once. They group a sub-pattern so quantifiers apply to the whole group, and they capture the matched substring so you can pull it out after the match. (\d+) matches one-or-more digits AND lets you grab those digits via match[1] (JavaScript), m.group(1) (Python), or $1 in a replacement string. The variants from there: named groups (?<name>...), non-capturing (?:...), and backreferences \1 or \k<name> that match the SAME text the group captured. Below I walk through all four with the most common practical use cases (duplicate-word detection, swapping fields with a replacement, structured parsing), engine notes per language, and the bugs I've shipped.
The reason this feature is everywhere in real regex code: most useful patterns aren't just matching, they're extracting. You match a URL to extract the domain. You match a log line to pull the timestamp. You match a date to capture the month. Capturing groups are how.
Quick reference
| Syntax | Purpose |
|---|---|
| (pattern) | Capturing group, numbered left-to-right starting at 1 |
| (?:pattern) | Non-capturing group (grouping only) |
| (?<name>pattern) | Named capturing group |
| \1, \2 | Backreference to a numbered group |
| \k<name> | Backreference to a named group |
| $1, $2 (replacement) | Insert the captured text into a replacement string |
Basic capturing groups
(\d{4})-(\d{2})-(\d{2})
Three groups: year, month, day. After a match against 2025-10-29, the groups contain:
- Group 0 (the full match): 2025-10-29
- Group 1: 2025
- Group 2: 10
- Group 3: 29
Groups are numbered left-to-right by their opening parenthesis, starting at 1.
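Here's a quick way to check the numbering in Python's re module. This is a minimal sketch, but any engine that gives you a match object behaves the same way. The second pattern wraps the year and month in an extra outer group, a variation I've added purely for illustration. Because that group's opening parenthesis comes first, it becomes group 1 and the original three groups shift to 2, 3 and 4.

```python
import re

flat = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2025-10-29")
print(flat.group(0))   # 2025-10-29 (the full match)
print(flat.groups())   # ('2025', '10', '29')

# Nested variation: the outer "(" opens first, so it is numbered 1.
nested = re.match(r"((\d{4})-(\d{2}))-(\d{2})", "2025-10-29")
for i in range(nested.re.groups + 1):   # .re.groups = number of capturing groups
    print(i, nested.group(i))
# prints: 0 2025-10-29 / 1 2025-10 / 2 2025 / 3 10 / 4 29
```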
Named groups
For complex patterns with many groups, names are easier to read than numbers:
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
In code, access via match.groups.year (JavaScript), m.group("year") (Python), or $matches['year'] (PHP). The numbered access (match[1], etc.) still works in parallel.
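The following is a small Python sketch of both access styles. Python's built-in re module spells named groups (?P<name>...), so that is the form used here. groupdict() and groupindex are standard re APIs. The first returns each group name with its captured text, and the second maps each name to its group number.

```python
import re

date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date.search("Released on 2025-10-29.")

print(m.group("year"))          # 2025
print(m.group(1))               # 2025 again: the same group, by number
print(m.groupdict())            # {'year': '2025', 'month': '10', 'day': '29'}
print(date.groupindex["day"])   # 3: the name-to-number mapping
```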
Different engines use different syntax for named groups:
| Engine | Named group syntax | Named backreference |
|---|---|---|
| JavaScript | (?<name>...) | \k<name> |
| Python (re) | (?P<name>...) only | (?P=name) |
| PHP (PCRE) | (?<name>...) or (?P<name>...) | \k<name> or (?P=name) |
| Ruby | (?<name>...) | \k<name> |
| .NET | (?<name>...) | \k<name> |
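For cross-engine portability, prefer (?<name>...); it's the most widely accepted form. The notable holdout is Python's built-in re module, which only accepts (?P<name>...). PHP/PCRE accepts both spellings.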
Non-capturing groups
If you want grouping (for | alternation or to apply a quantifier) without capturing, put ?: right after the opening parenthesis:
(?:https?|ftp):\/\/
The group is needed because of the alternation, but you don't care about the captured value separately. Without ?:, this would use up group 1 for https / http / ftp and shift all your subsequent groups by one.
Use non-capturing groups whenever you don't actually need the captured value. It makes intent clearer and avoids polluting the numbered groups.
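The renumbering effect is easier to see in code. This Python sketch extends the scheme pattern with a host capture, ([^/]+), which is my addition for illustration. With a capturing scheme group, the host lands in group 2. Switch to (?:...) and the host becomes group 1. Python doesn't need the \/ escape, which exists for JavaScript's /.../ literal syntax.

```python
import re

url = "https://example.com/docs"

capturing = re.match(r"(https?|ftp)://([^/]+)", url)
non_capturing = re.match(r"(?:https?|ftp)://([^/]+)", url)

print(capturing.groups())      # ('https', 'example.com'): the scheme uses up group 1
print(non_capturing.groups())  # ('example.com',): the host is group 1
```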
Backreferences: match the same text again
A backreference matches the same text that an earlier capturing group matched. Syntax: \1, \2, etc. for numbered groups; \k<name> for named groups.
Duplicate-word detection:
\b(\w+)\s+\1\b
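This matches "the the" in a sentence. (\w+) captures a word, and \s+\1 requires whitespace followed by the SAME word. The comparison is case-sensitive, so "The the" slips through unless you turn on case-insensitive matching (the i flag, or re.IGNORECASE in Python). Useful for editorial scanning.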
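The next block is a Python sketch of the editorial scan, using an invented sample sentence. It shows the case point: \1 must repeat the captured text exactly, so "The the" is only caught once re.IGNORECASE is on. The last part uses a named group with Python's named backreference (?P=word); the name word is arbitrary.

```python
import re

text = "Paris in the the spring. The the cat sat."
dupes = r"\b(\w+)\s+\1\b"

strict = [m.group(0) for m in re.finditer(dupes, text)]
relaxed = [m.group(0) for m in re.finditer(dupes, text, re.IGNORECASE)]
print(strict)    # ['the the']
print(relaxed)   # ['the the', 'The the']

# Named-group spelling of the same idea in Python's re.
named = [m.group("word") for m in re.finditer(r"\b(?P<word>\w+)\s+(?P=word)\b", text)]
print(named)     # ['the']
```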
Same-tag HTML matching:
<(h[1-6])([^>]*)>(.*?)<\/\1>
The \1 ensures the closing tag is the same heading level as the opening tag. Without it, <h1>...</h3> would match (which is invalid HTML).
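The snippet below applies that pattern in Python to a made-up string with one mismatched pair. The <h2>...</h3> pair is skipped because \1 demands </h2>. As with any regex over HTML, treat this as a sketch: . doesn't cross newlines by default, and nested or malformed markup needs a real parser.

```python
import re

heading = re.compile(r"<(h[1-6])([^>]*)>(.*?)<\/\1>")
html = '<h1 class="title">Intro</h1> <h2>Broken</h3> <h3>Done</h3>'

found = [(m.group(1), m.group(3)) for m in heading.finditer(html)]
print(found)   # [('h1', 'Intro'), ('h3', 'Done')]
```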
Backreferences in replacement strings
The same backreferences work in replacement strings for find-and-replace operations:
Find: (\w+) (\w+)
Replace: $2 $1
This swaps two space-separated words: against Hello World, it produces World Hello. The replacement syntax varies by engine. Python's re.sub uses \1 and \2, JavaScript uses $1 and $2, and PHP's preg_replace accepts both. Named-group replacements are $<name> in JavaScript, \g<name> in Python, and ${name} in Java and .NET.
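In Python's re.sub, the numbered form is \1 and the named form is \g<name>. There's also \g<2>, which is worth using whenever a literal digit follows the reference, since \20 would be read as group 20. The group names first and second are just for this sketch.

```python
import re

pair = re.compile(r"(?P<first>\w+) (?P<second>\w+)")

print(pair.sub(r"\2 \1", "Hello World"))                 # World Hello
print(pair.sub(r"\g<second> \g<first>", "Hello World"))  # World Hello
print(pair.sub(r"\g<2>0", "Hello World"))                # World0
```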
Practical use cases
Find duplicate consecutive words:
\b(\w+)\s+\1\b
Swap "Lastname, Firstname" to "Firstname Lastname":
Find: (\w+),\s+(\w+)
Replace: $2 $1
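Here is the same swap in Python, using \2 \1 because re.sub doesn't understand $1. The sample names are invented. The last line shows a caveat rather than a feature: \w doesn't match hyphens or apostrophes, so only the second half of a hyphenated surname gets moved.

```python
import re

swap = re.compile(r"(\w+),\s+(\w+)")

print(swap.sub(r"\2 \1", "Smith, Alice; Doe, John"))  # Alice Smith; John Doe
print(swap.sub(r"\2 \1", "Smith-Jones, Alice"))       # Smith-Alice Jones
```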
Parse "name=value" pairs:
(\w+)=("[^"]*"|'[^']*'|[^\s]+)
The two groups give you the key and the value (with quotes still attached if present).
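The pair parser looks like this in Python, as a minimal sketch. re.findall returns one tuple per match when the pattern has several groups. Stripping the quotes afterwards is my addition, and unquote is a hypothetical helper, not part of re.

```python
import re

pair = re.compile(r"""(\w+)=("[^"]*"|'[^']*'|[^\s]+)""")
line = """host=example.com name="Jane Doe" mode='fast' retries=3"""

raw = pair.findall(line)
print(raw)   # values still carry their quotes, e.g. ('name', '"Jane Doe"')

def unquote(value):
    """Hypothetical helper: strip one layer of matching quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value

parsed = {key: unquote(value) for key, value in raw}
print(parsed)   # {'host': 'example.com', 'name': 'Jane Doe', 'mode': 'fast', 'retries': '3'}
```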
Wrap HTML hex colour values in <code> tags:
Find: #([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\b
Replace: <code>#$1</code>
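The next block runs that replacement in Python (with \1 in place of $1) over an invented CSS string. A four-digit value like #abcd is left alone because neither alternative fits. The 3-digit branch fails at \b, and the 6-digit branch runs out of hex digits.

```python
import re

hex_colour = re.compile(r"#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})\b")
css = "color: #fff; background: #1a2B3c; border: #abcd;"

wrapped = hex_colour.sub(r"<code>#\1</code>", css)
print(wrapped)
# color: <code>#fff</code>; background: <code>#1a2B3c</code>; border: #abcd;
```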
Convert dates from MM/DD/YYYY to YYYY-MM-DD:
Find: (\d{2})\/(\d{2})\/(\d{4})
Replace: $3-$1-$2
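This is the date conversion in Python. Note that \d{2} only checks the shape, so an impossible date such as 13/45/2025 gets reordered just as happily. If that matters, validate the numbers separately.

```python
import re

us_date = re.compile(r"(\d{2})\/(\d{2})\/(\d{4})")

print(us_date.sub(r"\3-\1-\2", "Due 10/29/2025, paid 04/15/2026"))
# Due 2025-10-29, paid 2026-04-15
print(us_date.sub(r"\3-\1-\2", "13/45/2025"))   # 2025-13-45
```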
Examples in JavaScript, Python, and PHP
JavaScript:
const text = "Born 1985-10-29, registered 2010-04-15";
// Numbered groups
const isoDate = /(\d{4})-(\d{2})-(\d{2})/g;
let m;
while ((m = isoDate.exec(text)) !== null) {
  console.log(`year=${m[1]}, month=${m[2]}, day=${m[3]}`);
}
// Named groups
const named = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
for (const match of text.matchAll(named)) {
  console.log(`year=${match.groups.year}, day=${match.groups.day}`);
}
// Swap two words
"Hello World".replace(/(\w+) (\w+)/, "$2 $1"); // 'World Hello'
Python:
import re
text = "Born 1985-10-29, registered 2010-04-15"
# Numbered groups
for m in re.finditer(r"(\d{4})-(\d{2})-(\d{2})", text):
    print(f"year={m.group(1)}, month={m.group(2)}, day={m.group(3)}")
# Named groups
for m in re.finditer(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text):
    print(f"year={m.group('year')}, day={m.group('day')}")
# Swap two words
re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World")  # 'World Hello'
PHP:
$text = "Born 1985-10-29, registered 2010-04-15";
// Numbered groups
preg_match_all('/(\d{4})-(\d{2})-(\d{2})/', $text, $matches);
// $matches[1] = ['1985', '2010'], $matches[2] = ['10', '04'], $matches[3] = ['29', '15']
// Named groups
preg_match_all('/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/', $text, $matches);
// $matches['year'] = ['1985', '2010']
// Swap two words
preg_replace('/(\w+) (\w+)/', '$2 $1', "Hello World"); // 'World Hello'
Engine compatibility
Numbered groups work everywhere. Named groups and backreferences have meaningful per-engine quirks.
| Engine | Numbered groups | Named groups | Backreferences | Replacement syntax |
|---|---|---|---|---|
| JavaScript | Works | (?<name>...) (ES2018+) | \1 in pattern, $1 in replacement | $1, $<name> |
| Python (re) | Works | (?P<name>...) only | \1 in pattern, \1 in replacement | \1, \g<name> |
| Python (regex pkg) | Works | All forms | Works | All forms |
| PHP (PCRE) | Works | (?<name>...) or (?P<name>...) | \1 in pattern, $1 or \1 in replacement | $1, \1, ${1} (numbered only; preg_replace has no named references) |
| Java | Works | (?<name>...) | \1 in pattern, $1 in replacement | $1, ${name} |
| .NET | Works | (?<name>...) | \1 in pattern, $1 in replacement | $1, ${name} |
| Go (RE2) | Works | (?P<name>...); recent releases also accept (?<name>...) | Not supported | $1, ${name} |
| Rust (regex crate) | Works | (?P<name>...); recent releases also accept (?<name>...) | Not supported | $1, ${name} |
| Ruby | Works | (?<name>...) | \1 in pattern, \1 in replacement | \1, \k<name> |
| POSIX BRE (sed, grep) | \(...\) only | Not supported | \1 in pattern, \1 in replacement | \1 |
| POSIX ERE (grep -E, awk) | (...) | Not supported | Varies (BSD vs GNU) | \1 |
The most important caveat: Go and Rust do not support backreferences in the pattern, so \b(\w+)\s+\1\b is rejected at compile time in those engines. The workaround is to find all words with a non-backreference pattern, then check adjacent pairs in code.
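Here's a sketch of that workaround. I've written it in Python to match the other examples, but it only needs a plain \w+ pattern, so the same structure should port to Go's regexp or Rust's regex crate. repeated_words is a hypothetical helper name. It treats the text between two consecutive word matches as the \s+ part of the original pattern.

```python
import re

def repeated_words(text):
    """Return (offset, word) for each word immediately repeated after whitespace."""
    words = list(re.finditer(r"\w+", text))
    hits = []
    for prev, cur in zip(words, words[1:]):
        gap = text[prev.end():cur.start()]
        # Same word, separated only by whitespace: the job \s+\1 did.
        if prev.group() == cur.group() and gap.isspace():
            hits.append((prev.start(), prev.group()))
    return hits

print(repeated_words("It was the the best of of times."))  # [(7, 'the'), (20, 'of')]
```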
Common mistakes
The bugs I see most often.
Forgetting that adding a new group renumbers everything to its right. If your replacement was $3-$1-$2 and you add a new capturing group earlier in the pattern, $3 now points at something else. Either use non-capturing groups (?:...) for everything you don't need to extract, or switch to named groups so additions don't reorder anything.
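The next block is a Python sketch of the trap, reusing the MM/DD/YYYY pattern. The optional weekday prefix is an invented change standing in for "someone added a group". Once it's a capturing group, \3-\1-\2 silently points at the wrong groups. In Python 3.5+, the unmatched group 1 is replaced by an empty string. The named version keeps working.

```python
import re

text = "Due 10/29/2025"

numbered = r"(\d{2})/(\d{2})/(\d{4})"
numbered_v2 = r"(Mon |Tue |Wed )?" + numbered     # new group 1 shifts the rest
print(re.sub(numbered, r"\3-\1-\2", text))       # Due 2025-10-29
print(re.sub(numbered_v2, r"\3-\1-\2", text))    # Due 29--10 (wrong groups)

named = r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})"
named_v2 = r"(Mon |Tue |Wed )?" + named
print(re.sub(named_v2, r"\g<year>-\g<month>-\g<day>", text))  # Due 2025-10-29
```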
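Backslash escape confusion in the replacement string. Python's re.sub uses \1 in replacements, JavaScript uses $1, and PHP's preg_replace accepts both. If you copy a JavaScript replacement like $2 $1 into re.sub, Python inserts the literal text $2 $1 instead of the groups. Going the other way, JavaScript doesn't treat \1 as a group reference in a replacement either. Match the syntax to the engine.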
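The failure is quiet, which is what makes it nasty. In Python, $ has no special meaning in a re.sub replacement, so a JavaScript-style template comes back as literal text:

```python
import re

print(re.sub(r"(\w+) (\w+)", "$2 $1", "Hello World"))   # $2 $1 (literal text)
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World"))  # World Hello
```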
Using a backreference where none is captured. (?:foo)\1 does not work because (?:...) is non-capturing, so \1 has no group to refer to. Use a capturing group (foo)\1 if you need the backreference.
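Python's re at least fails loudly here: it rejects the pattern at compile time with re.error. compiles() is a throwaway helper defined just for this sketch.

```python
import re

def compiles(pattern):
    """True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?:foo)\1"))                     # False: there is no group 1
print(bool(re.fullmatch(r"(foo)\1", "foofoo")))   # True
```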
Greedy capture across a delimiter. (.*),(.*) against a,b,c captures group 1 as a,b and group 2 as c because .* is greedy. Use ([^,]*),(.*) to capture up to the first comma instead.
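The variants below are shown side by side in Python. The lazy (.*?) version is an alternative I've added. It also stops at the first comma, though the negated class says what you mean more directly.

```python
import re

print(re.match(r"(.*),(.*)", "a,b,c").groups())     # ('a,b', 'c'): greedy
print(re.match(r"([^,]*),(.*)", "a,b,c").groups())  # ('a', 'b,c')
print(re.match(r"(.*?),(.*)", "a,b,c").groups())    # ('a', 'b,c'): lazy also works
```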
Putting a variable-width capture or backreference inside a lookbehind. Python's stdlib re requires lookbehinds to be fixed-width, and Java requires a bounded maximum length, so Python rejects a pattern like (?<=(\w+)) at compile time. A backreference inside a lookbehind has the same problem, because its width depends on the input. .NET, JavaScript (ES2018+) and Python's third-party regex package allow variable-width lookbehinds. Elsewhere, restructure the pattern.
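In Python's stdlib re, the width check happens at compile time. This sketch uses a small hypothetical compiles() helper, defined here. It shows that a variable-width capture inside a lookbehind is refused, while a fixed-width one is accepted and still fills its group.

```python
import re

def compiles(pattern):
    """True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<=(\w+))x"))     # False: \w+ has no fixed width
print(compiles(r"(?<=(\w{3}))x"))   # True: exactly three characters

m = re.search(r"(?<=(\w{3}))x", "abcx")
print(m.group(0), m.group(1))       # x abc
```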
Capturing inside a quantifier and expecting to get every iteration. A repeated group like (\d)+ against 2025 leaves group 1 holding only 5, the LAST iteration's text. To get every item, run findall / matchAll / preg_match_all with the single-item pattern (\d), or split on a delimiter first. (.NET is the exception: Group.Captures keeps every iteration.)
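The next block is a Python sketch of the last-iteration behaviour, with the findall and split alternatives next to it. The comma-list pattern (\w+,?)+ is an extra example I've added.

```python
import re

print(re.fullmatch(r"(\d)+", "2025").group(1))        # 5: only the last iteration survives
print(re.findall(r"\d", "2025"))                      # ['2', '0', '2', '5']

print(re.fullmatch(r"(\w+,?)+", "a,b,c").group(1))    # c
print(re.findall(r"\w+", "a,b,c"))                    # ['a', 'b', 'c']
print("a,b,c".split(","))                             # ['a', 'b', 'c']
```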
Test cases
| Pattern | Input | Groups |
|---|---|---|
| (\d{4})-(\d{2})-(\d{2}) | 2025-10-29 | 2025, 10, 29 |
| (?<y>\d{4}) | 2025 | groups.y = 2025 |
| (\w+),\s*(\w+) | Smith, Alice | Smith, Alice |
| \b(\w+)\s+\1\b | the the cat | Matches "the the"; group 1 is "the" |
| (?:https?\|ftp):\/\/ | https://example.com | No groups (non-capturing) |
| (.)(.)(.)\3\2\1 | abccba | a, b, c (palindrome of length 6) |
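If you want the test-case table as a regression check, the following is a minimal Python version. The one change is (?P<y>...) in place of (?<y>...), because Python's re only accepts the P form.

```python
import re

# Each row: (pattern, input, expected groups), mirroring the table.
cases = [
    (r"(\d{4})-(\d{2})-(\d{2})", "2025-10-29", ("2025", "10", "29")),
    (r"(?P<y>\d{4})", "2025", ("2025",)),
    (r"(\w+),\s*(\w+)", "Smith, Alice", ("Smith", "Alice")),
    (r"\b(\w+)\s+\1\b", "the the cat", ("the",)),
    (r"(?:https?|ftp):\/\/", "https://example.com", ()),
    (r"(.)(.)(.)\3\2\1", "abccba", ("a", "b", "c")),
]

for pattern, text, expected in cases:
    m = re.search(pattern, text)
    assert m is not None, pattern
    assert m.groups() == expected, (pattern, m.groups())
print("all", len(cases), "cases pass")
```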
See also
- Regex Lookaheads and Lookbehinds: the zero-width assertions that constrain where captures can happen
- How to Match HTML Tags with Regex: the backreference pattern in action for matching opening/closing tag pairs
- How to Match a Date with Regex: the \1 separator trick for consistent date separators
- How to Match an Email Address with Regex: extracting the local part and domain into separate groups
- Regex Anchors: the partner concept for tokenizing input
- Regex Word Boundaries: often paired with \b(\w+)\b for capturing whole words
- Validate Password Strength with Regex: backreferences are how the strong-password pattern rejects repeated runs like aaa and 111
- Regex Cheat Sheet: the wider syntax and engine compatibility reference
External reference: the MDN regex groups reference covers JavaScript specifics; test at regex101.com.





