TechEarl

How to Use Regex Lookaheads and Lookbehinds

Regex lookaheads and lookbehinds assert what comes before or after a match without consuming characters. Full reference with syntax, password validation, variable-width vs fixed-width support per engine, and examples in JavaScript, Python, PHP, Go, Java, .NET.

Ishan Karunaratne · 5 min read

Lookahead (?=...) asserts that what follows the current position matches without consuming any characters. Lookbehind (?<=...) asserts that what precedes the current position matches, also without consuming. Their negatives (?!...) and (?<!...) assert the opposite: that what follows or precedes does NOT match. These four "zero-width assertions" turn regex from a search tool into a context-aware matcher: "find every digit followed by kg, but don't include the kg in the match", or "match a password that contains at least one digit and one uppercase letter, without saying where they are".

The reason most regex tutorials introduce these last is that they sit on top of basic matching as a layer. Once you have them, you can write patterns that would otherwise need multiple regex passes or post-processing.

Lookahead: match X only if followed by Y

Syntax: X(?=Y) matches X only if Y follows immediately after, but the match returned is X (the lookahead consumes nothing).

Example: extract numeric weights followed by kg:

code
\d+(?=kg)

Against input 42kg, 7lb, 150kg, this matches 42 and 150 (the 7 is rejected because lb follows, not kg). The kg itself is not in the matched string.

The classic use case: extracting a substring that's adjacent to a marker without capturing the marker. Without lookahead, you'd capture the marker and then strip it in post-processing.

Lookbehind: match X only if preceded by Y

Syntax: (?<=Y)X matches X only if Y immediately precedes it. Again, Y is not in the returned match.

Example: extract prices written with a leading dollar sign:

code
(?<=\$)\d+(\.\d{2})?

Against $42, €15, $99.50, this matches 42 and 99.50 but not 15 (it's preceded by €, not $). The $ is not in the result.

Negative lookahead and lookbehind

(?!...) and (?<!...) invert the assertion. They succeed when the pattern does NOT match.

Example: match digits NOT followed by a percent sign:

code
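\d+(?![\d%])

Against 25%, 30, 40%, this matches only 30. The lookahead rejects a following digit as well as %, and that matters: a bare \d+(?!%) would backtrack from 25 to 2 (which is followed by 5, not %) and return 2, 30, and 4.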
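Backtracking is worth checking whenever a negative lookahead follows a greedy quantifier. A minimal Python sketch (the variable names are illustrative, not from the article) comparing the bare \d+(?!%) with a variant whose lookahead also rejects a following digit:

```python
import re

text = "25%, 30, 40%"

# Bare negative lookahead: \d+ can give back a digit, so "2" (followed by
# "5", not "%") and "4" (followed by "0") slip through as partial matches.
naive = re.findall(r"\d+(?!%)", text)

# Also forbidding a following digit stops the engine from splitting a number.
strict = re.findall(r"\d+(?![\d%])", text)

print(naive)   # ['2', '30', '4']
print(strict)  # ['30']
```

The same behaviour should be expected in any backtracking engine, since it follows from how greedy quantifiers give characters back, not from anything Python-specific.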

Negative lookbehind example: capitalised words NOT preceded by Mr.:

code
(?<!Mr\. )[A-Z][a-z]+

Against Mr. Smith and Alice met John, this matches Alice and John, but not Smith (because Mr. precedes it). Note that Mr itself also matches: it is a capitalised word and nothing precedes it.
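A quick way to confirm what a negative lookbehind really excludes is to run it. A small Python sketch using the sentence above (Python's re accepts this lookbehind because Mr\. followed by a space is a fixed four characters; the second pattern is an illustrative variation, not part of the original example):

```python
import re

text = "Mr. Smith and Alice met John"

# Smith is skipped because the four characters before it are "Mr. ".
# "Mr" itself is a capitalised word with nothing before it, so it matches too.
words = re.findall(r"(?<!Mr\. )[A-Z][a-z]+", text)
print(words)  # ['Mr', 'Alice', 'John']

# One way to drop the title as well: refuse words that are followed by a dot.
names = re.findall(r"(?<!Mr\. )[A-Z][a-z]+(?!\.)", text)
print(names)  # ['Alice', 'John']
```

The extra (?!\.) is a simplification: a name directly before a full stop would be truncated by backtracking, so treat it as a sketch and not a general-purpose name matcher.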

The password-validation use case

The classic real-world use case for lookahead: enforce that a password contains at least one digit AND at least one uppercase letter AND at least one special character AND is at least 8 characters. You can't easily do this with a positional regex because the order of those character types is unconstrained.

With lookaheads:

code
^(?=.*[0-9])(?=.*[A-Z])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$

Each (?=...) is a zero-width assertion evaluated at the start of the string (because of the ^), so all three scan the same text independently of one another. The final part, [A-Za-z0-9!@#$%^&*]{8,}$, is what actually consumes the characters and enforces the minimum length and the allowed character set.
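Because each lookahead is independent, the combined pattern can also be split into separate checks when you want to report which requirement failed. A hedged Python sketch, assuming the same character sets as the pattern above (the rule names and the failed_rules helper are hypothetical, introduced only for illustration):

```python
import re

# Each rule is one piece of the combined pattern, kept separate so the caller
# can see which requirement failed. Rule names are illustrative.
RULES = {
    "digit": r"(?=.*[0-9])",
    "uppercase": r"(?=.*[A-Z])",
    "special": r"(?=.*[!@#$%^&*])",
    "length and charset": r"[A-Za-z0-9!@#$%^&*]{8,}$",
}

def failed_rules(password: str) -> list[str]:
    """Return the names of the rules the password does not satisfy."""
    # re.match anchors at the start of the string, like ^ in the full pattern.
    return [name for name, rule in RULES.items() if not re.match(rule, password)]

print(failed_rules("Hello123!"))  # []
print(failed_rules("hello123!"))  # ['uppercase']
print(failed_rules("Ab1!"))       # ['length and charset']
```

Splitting the rules costs a few extra passes over a short string, which is usually a fair trade for a more useful error message.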

This pattern is the foundation of most "strong password" regexes. The password strength validation walkthrough covers tighter variants and the cases where regex stops being the right tool.

Examples in JavaScript, Python, and PHP

JavaScript:

javascript
// Match numbers followed by "kg" (without including kg)
const weights = "42kg, 7lb, 150kg".match(/\d+(?=kg)/g);
// ['42', '150']

// Match prices preceded by $ (without including $)
const prices = "$42, €15, $99.50".match(/(?<=\$)\d+(?:\.\d{2})?/g);
// ['42', '99.50']

// Password validation
const strongPassword = /^(?=.*[0-9])(?=.*[A-Z])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$/;
strongPassword.test("Hello123!");   // true
strongPassword.test("hello123!");   // false (no uppercase)

Python:

python
import re

# Numbers followed by kg
weights = re.findall(r"\d+(?=kg)", "42kg, 7lb, 150kg")
# ['42', '150']

# Prices preceded by $
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "$42, €15, $99.50")
# ['42', '99.50']

# Password validation
strong = re.compile(r"^(?=.*[0-9])(?=.*[A-Z])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$")
bool(strong.match("Hello123!"))  # True

PHP:

php
// Numbers followed by kg
preg_match_all('/\d+(?=kg)/', "42kg, 7lb, 150kg", $matches);
// $matches[0] = ['42', '150']

// Prices preceded by $
preg_match_all('/(?<=\$)\d+(?:\.\d{2})?/', "\$42, €15, \$99.50", $matches);
// $matches[0] = ['42', '99.50']

// Password validation
$strong = '/^(?=.*[0-9])(?=.*[A-Z])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$/';
preg_match($strong, "Hello123!");  // 1 (match)

Engine compatibility (especially JavaScript lookbehind)

Lookahead (?=...) and negative lookahead (?!...) are supported in every mainstream backtracking regex engine: JavaScript, Python, PHP/PCRE, Java, .NET, and Ruby. Go's standard-library regexp and Rust's regex crate do NOT support lookarounds at all: both follow the RE2-style design, which omits them to guarantee linear-time matching.

Lookbehind (?<=...) and negative lookbehind (?<!...) had patchier historical support:

| Engine | Lookahead | Lookbehind | Variable-width lookbehind |
|---|---|---|---|
| JavaScript (V8) | All versions | ES2018+ (Chrome 62, Firefox 78, Safari 16.4, Node 10) | Yes, since ES2018 |
| Python (re) | All versions | All versions | No, fixed-width only at every Python version including 3.12. Use the third-party regex package for variable width |
| Python (regex package) | All versions | All versions | Yes |
| PCRE | All versions | All versions | PCRE1 and PCRE2 before 10.43: each top-level alternative must be fixed-width; PCRE2 10.43+ (bundled with PHP 8.4) supports bounded variable width |
| Java | All versions | All versions | Limited (alternation of fixed widths) since Java 9 |
| .NET | All versions | All versions | Yes, always (the only engine that's had variable-width since v1) |
| Ruby (Onigmo) | All versions | 1.9+ | Limited: top-level alternatives may differ in length, but each must be fixed-width |
| Rust (regex crate) | No | No | n/a (no lookaround at all) |
| Go (regexp, RE2) | No | No | n/a (RE2 omits all lookaround) |

In Go specifically, if you need lookarounds, use the third-party github.com/dlclark/regexp2 package, which implements the .NET regex flavor and supports the full feature set. The same advice applies in Rust: the regex crate is linear-time and RE2-style, so use fancy-regex or regress if you need lookarounds.
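When pulling in another package is not an option, the usual restructuring is to consume the context and read a capture group instead. This is sketched in Python for consistency with the examples above. The pattern (\d+)kg uses only syntax that RE2-style engines also accept, though the extraction call differs per language:

```python
import re

text = "42kg, 7lb, 150kg"

# Lookahead version: the match itself is just the digits.
with_lookahead = re.findall(r"\d+(?=kg)", text)

# Lookaround-free version: consume "kg" but keep only capture group 1.
# With a single group, re.findall returns that group's text.
with_group = re.findall(r"(\d+)kg", text)

print(with_lookahead)  # ['42', '150']
print(with_group)      # ['42', '150']
```

The two are not always interchangeable: the capture-group form consumes the marker, so it cannot produce matches that overlap on it.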

Variable-width vs fixed-width lookbehind

The most common cross-engine surprise. Fixed-width lookbehind means the assertion must match a known, constant number of characters. (?<=abc) is fixed-width 3. (?<=ab|cd) is fixed-width 2. (?<=a*) and (?<=https?:\/\/) are variable-width because the matched length can vary.

The breakdown:

  • JavaScript V8 (since ES2018): variable-width lookbehind is supported. (?<=https?:\/\/)\w+ works. This was a deliberate spec choice and one of the headline ES2018 regex features.
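  • Python re: fixed-width only. (?<=https?:\/\/)\w+ raises error: look-behind requires fixed-width pattern. Alternation inside the lookbehind only helps when every branch has the same length: (?<=ab|cd) compiles, but (?<=http:\/\/|https:\/\/) is rejected because the branches are 7 and 8 characters long. The workaround is one lookbehind per prefix, alternated outside: (?:(?<=http:\/\/)|(?<=https:\/\/))\w+.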
  • Python regex package: variable-width supported. Drop-in replacement for re when you need this.
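  • PCRE (PHP): PCRE1 and PCRE2 before 10.43 accept top-level alternatives of different lengths, but each alternative must itself be fixed-width. PCRE2 10.43 and later (bundled with PHP 8.4) support variable-width lookbehind with a bounded maximum length.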
  • Java: limited variable-width support since Java 9. The engine accepts alternation of fixed-widths and capped quantifiers ({0,5}), but rejects unbounded ones (*, +, {0,}).
  • .NET: variable-width lookbehind has always worked.

If you're writing cross-engine regex, prefer strictly fixed-width lookbehinds, and if you alternate inside one, keep every branch the same length. That is the only form accepted by every engine that supports lookbehind at all.
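The fixed-width rule is easy to probe directly. A minimal Python sketch against the stdlib re module (the compiles helper and the sample URLs are hypothetical, introduced for illustration):

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's stdlib re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<=https?://)\w+"))         # False: optional s makes the width 7 or 8
print(compiles(r"(?<=http://|https://)\w+"))  # False: branches differ in length
print(compiles(r"(?<=ab|cd)\w+"))             # True: both branches are 2 characters

# Rewrite that stdlib re accepts: one fixed-width lookbehind per prefix,
# alternated outside the assertions.
portable = re.compile(r"(?:(?<=http://)|(?<=https://))\w+")
print(portable.findall("http://alpha.example and https://beta.example"))
# ['alpha', 'beta']
```

Wrapping the compile call like this is also a cheap way to test a pattern against a target engine before shipping it.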

Common mistakes

Mistake 1: capturing inside a lookaround and expecting it in the result. A lookahead like (?=(\d+)) does capture the digits into group 1, but the lookaround itself contributes zero characters to the main match. If you only check match[0], you won't see them. Use the explicit capture group reference (match[1]) or restructure the pattern.
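A short Python sketch of the difference between the overall match and the group captured inside the lookahead (the input string is an arbitrary example):

```python
import re

m = re.search(r"q(?=(\d+))", "q123")
assert m is not None

print(m.group(0))  # q    -> the overall match is only the consumed text
print(m.group(1))  # 123  -> the digits captured inside the lookahead
print(m.span())    # (0, 1): the lookahead added no length to the match
```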

Mistake 2: writing impossible lookarounds. q(?=u)i can never match because it asks the engine to match i at the same position where u was asserted to follow. The lookahead consumed no characters, so after the assertion succeeds the engine is still trying to match i at the u position. Always write the assertion at the position you actually want.
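The impossible pattern is easy to reproduce. A small Python sketch, with one possible repair (the replacement pattern q(?=u)\w+ is an illustrative choice, not the only fix):

```python
import re

# The lookahead requires "u" next, then the pattern demands "i" at that same
# position, so this can never succeed on any input.
impossible = re.search(r"q(?=u)i", "quit")
print(impossible)  # None

# Asserting "u" and then consuming word characters from that same position
# is consistent, so this matches.
fixed = re.search(r"q(?=u)\w+", "quit")
print(fixed.group(0))  # quit

# The assertion still filters: a q that is not followed by u is rejected.
print(re.search(r"q(?=u)\w+", "qat"))  # None
```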

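Mistake 3: variable-width lookbehind in Python re. Python's stdlib re will refuse to compile (?<=https?:\/\/)\w+ and throw error: look-behind requires fixed-width pattern. Alternation inside the lookbehind does not help here, because (?<=http:\/\/|https:\/\/) has branches of different lengths and is rejected too. Either use one lookbehind per prefix, alternated outside ((?:(?<=http:\/\/)|(?<=https:\/\/))), or switch to the third-party regex package.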

Mistake 4: using lookarounds in Go. Go's regexp package will not even compile patterns with (?=, (?<=, (?!, or (?<!. The compile call returns an error rather than failing silently. Reach for the third-party regexp2 package or restructure the pattern to avoid lookarounds.

Mistake 5: assuming the password pattern enforces character order. ^(?=.*[0-9])(?=.*[A-Z])... says "the string must contain at least one digit AND at least one uppercase letter somewhere". It does NOT enforce that the digit comes before the uppercase letter, or any other order. Each lookahead is an independent zero-width assertion anchored at the start.

Mistake 6: forgetting that lookarounds are zero-width. Stacking lookaheads at the same position is fine and is the foundation of the password-validation pattern. Stacking lookbehinds at the same position is also fine. Mixing them with consuming patterns at the same position requires the engine to back up and re-try, which is the right behavior but worth knowing for performance.

What to do next

For the regex features that pair most naturally with lookarounds:

  • Regex Anchors: ^, $, \A, \z, the line-and-string boundary assertions you'll use alongside lookarounds for full-input validation.
  • Regex Word Boundaries: \b and \B, the other zero-width assertion family. Lookarounds are the more flexible alternative when \b isn't precise enough.
  • Regex Capturing Groups and Backreferences: parentheses you use to capture and reference parts of a match, which combine with lookarounds to write surprisingly capable patterns.
  • Validate Password Strength with Regex: the production application of the lookahead-stacking pattern shown above.

For the wider regex syntax reference, see the Regex Cheat Sheet.

And for a war story about exactly this trade-off (RE2-style engines like Go's dropping lookarounds in exchange for linear-time matching), see the interview question that cost me the job, where a Go log-parsing take-home turned on knowing which patterns Go's regexp will and won't compile.

External reference: the MDN regex assertions page covers JavaScript's implementation; test interactively at regex101.com (set the flavor to PCRE2 or JavaScript depending on your target).

FAQ

What is the difference between lookahead and lookbehind?

Lookahead (?=...) asserts that a pattern follows the current position; lookbehind (?<=...) asserts that a pattern precedes it. Neither consumes characters; they only assert.

Use lookahead when you want to match X but only when Y comes after, without capturing Y. Use lookbehind for the opposite: match X only when Y comes before.

What is the difference between positive and negative lookahead?

Positive lookahead (?=Y) succeeds when Y matches at the current position. Negative lookahead (?!Y) succeeds when Y does NOT match. Same idea applies to lookbehind: (?<=Y) requires Y before, (?<!Y) requires Y to NOT be before.

Common use: positive lookahead to find tokens adjacent to a marker without capturing the marker. Negative lookahead to find tokens that explicitly avoid a context (e.g., \d+(?![\d%]) for numbers not followed by a percent sign; the \d inside the lookahead stops backtracking from returning part of a number).

Does JavaScript support lookbehind?

Yes, since ES2018. Browser support: Chrome 62+, Firefox 78+, Safari 16.4+, Node.js 10+. Older runtimes throw a SyntaxError at regex-compile time.

JavaScript's lookbehind also supports variable-width patterns since ES2018, which is unusual: most other engines require fixed-width or limited alternation. If you need to support pre-2018 browsers, restructure the pattern using capturing groups and post-processing, or build the regex with the RegExp constructor inside try/catch (a regex literal would fail at parse time, before any catch can run).

Why does Python reject my lookbehind?

Python's stdlib re module requires fixed-width lookbehind at every version including 3.12. Patterns like (?<=https?:\/\/) are rejected because the optional s means the assertion can match 7 or 8 characters.

Two fixes. Use one fixed-width lookbehind per prefix, alternated outside the assertions: (?:(?<=http:\/\/)|(?<=https:\/\/))\w+. Putting the alternation inside a single lookbehind, as in (?<=http:\/\/|https:\/\/), does not work in re because the branches differ in length. Or install the third-party regex package, which is a drop-in replacement for re with full variable-width lookbehind support.

Does Go support lookarounds?

Not in the standard library's regexp package, which uses RE2 syntax and omits all backreferences and lookarounds to guarantee linear-time matching.

If you need lookarounds in Go, use the third-party regexp2 package, which implements the .NET regex flavor and supports the full feature set. Same situation in Rust: the widely used regex crate is RE2-style and has no lookarounds; use fancy-regex when you need lookarounds and backreferences.

What is a zero-width assertion?

An assertion that matches a position rather than characters. Anchors (^, $, \b) and lookarounds are all zero-width: they constrain where a match can happen but contribute no characters to the matched text.

The practical implication is that you can stack them without affecting the match's length or position. Multiple lookaheads at the start of a pattern (like in password validation) all check the same range from the same anchor point.

How do I match a word only when it is not preceded by another word?

Use negative lookbehind: (?<!preceding\s)word. In engines that require fixed-width lookbehind (Python stdlib re, PCRE before PCRE2 10.43), the inside of the lookbehind has to match an exact length. In engines that support variable-width (JavaScript V8, .NET, PCRE2 10.43+, the Python regex package), the lookbehind can match a varying width.

If your engine has fixed-width-only lookbehind and you need variable width, restructure as a positive match plus filtering in code rather than fighting the regex.
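One way to read "positive match plus filtering" is to capture the optional context and discard the hits that have it. A hedged Python sketch using only stdlib re (the titles Mr/Mrs and the sample sentence are assumptions chosen for illustration):

```python
import re

text = "Mr. Smith and Mrs.  Jones met Alice"

# Optionally consume a title plus any amount of whitespace, then the word.
# Group 1 is None when no title preceded the word.
pattern = re.compile(r"(?:\b(Mrs?)\.\s+)?\b([A-Z][a-z]+)")

# Keep only the words that were matched without a preceding title.
untitled = [m.group(2) for m in pattern.finditer(text) if m.group(1) is None]
print(untitled)  # ['Alice']
```

The context here is variable-width (Mr or Mrs, one or more spaces), which a fixed-width lookbehind could not express, yet the pattern needs no lookbehind at all.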

Can I use capturing groups inside a lookaround?

Yes. Capturing groups inside a lookaround do capture text into numbered groups, even though the lookaround itself contributes nothing to the main match. A pattern like q(?=(\d+)) matches only the literal q (length 1), but group 1 captures the digits that follow.

The common bug is checking match[0] and expecting the captured digits there. They're in match[1]. This is the technique people use to "match" a length-zero context and pull out a substring that wasn't consumed.

Lookarounds are the most powerful and most misunderstood part of regex. The books that explain them properly:

  • Mastering Regular Expressions (Jeffrey Friedl, 3rd edition). The definitive deep-dive on how regex engines actually work: backtracking, NFA versus DFA, and the optimisation that makes a pattern fast or catastrophic. Dense, and unmatched once you are past the basics.
  • Regular Expressions Cookbook (Jan Goyvaerts and Steven Levithan, 2nd edition). Problem-then-solution recipes across eight languages (JavaScript, Python, PHP, Java, .NET, Ruby, Perl, VB). The one to keep next to the keyboard.

