TechEarl

How to Use Regex Lookaheads and Lookbehinds

Regex lookaheads and lookbehinds assert what comes before or after a match without consuming characters. Full reference with syntax, password validation, variable-width vs fixed-width support per engine, and examples in JavaScript, Python, PHP, Go, Java, .NET.

Ishan KarunaratneIshan Karunaratne⏱️ 4 min readUpdated
Regex lookaheads and lookbehinds assert what comes before or after a match without consuming characters. Full reference with syntax, password validation, variable-width vs fixed-width support per engine, and examples in JavaScript, Python, PHP, Go, Java, .NET.

Lookahead (?=...) asserts that what follows the current position matches without consuming any characters. Lookbehind (?<=...) asserts that what precedes the current position matches, also without consuming. Their negatives (?!...) and (?<!...) assert the opposite: that what follows or precedes does NOT match. These four "zero-width assertions" turn regex from a search tool into a context-aware matcher: "find every digit followed by kg, but don't include the kg in the match", or "match a password that contains at least one digit and one uppercase letter, without saying where they are".

The reason most regex tutorials introduce these last is that they sit on top of basic matching as a layer. Once you have them, you can write patterns that would otherwise need multiple regex passes or post-processing.

Jump to:

Lookahead: match X only if followed by Y

Syntax: X(?=Y) matches X only if Y follows immediately after, but the match returned is X (the lookahead consumes nothing).

Example: extract numeric weights followed by kg:

code
\d+(?=kg)

Against input 42kg, 7lb, 150kg, this matches 42 and 150 (the 7 is rejected because lb follows, not kg). The kg itself is not in the matched string.

The classic where-this-is-useful case: extract a substring that's adjacent to a marker without capturing the marker. Without lookahead, you'd capture the marker and then strip it in post-processing.

Lookbehind: match X only if preceded by Y

Syntax: (?<=Y)X matches X only if Y immediately precedes it. Again, Y is not in the returned match.

Example: extract prices written with a leading dollar sign:

code
(?<=\$)\d+(\.\d{2})?

Against $42, €15, $99.50, this matches 42 and 99.50 but not 15 (it's preceded by , not $). The $ is not in the result.

Negative lookahead and lookbehind

(?!...) and (?<!...) invert the assertion. They succeed when the pattern does NOT match.

Example: match digits NOT followed by a percent sign:

code
\d+(?!%)

Against 25%, 30, 40%, this matches 30 (25 is followed by %, so the lookahead fails; 40 is followed by %, same).

Negative lookbehind example: capitalised words NOT preceded by Mr.:

code
(?<!Mr\. )[A-Z][a-z]+

Against Mr. Smith and Alice met John, this matches Alice and John, but not Smith (because Mr. precedes it).

The password-validation use case

The classic real-world use case for lookahead: enforce that a password contains at least one digit AND at least one uppercase letter AND at least one special character AND is at least 8 characters. You can't easily do this with a positional regex because the order of those character types is unconstrained.

With lookaheads:

code
^(?=.*[0-9])(?=.*[A-Z])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$

Each (?=...) is a zero-width assertion that anchors back to the start of the string (because of the ^). They all check the same range. The actual match ([A-Za-z0-9!@#$%^&*]{8,}$) is what consumes the characters and ensures the right length and character set.

This pattern is the foundation of every "strong password regex". The password strength validation walkthrough covers tighter variants and the cases where regex stops being the right tool.

Examples in JavaScript, Python, and PHP

JavaScript:

javascript
// Match numbers followed by "kg" (without including kg)
const weights = "42kg, 7lb, 150kg".match(/\d+(?=kg)/g);
// ['42', '150']

// Match prices preceded by $ (without including $)
const prices = "$42, €15, $99.50".match(/(?<=\$)\d+(?:\.\d{2})?/g);
// ['42', '99.50']

// Password validation
const strongPassword = /^(?=.*[0-9])(?=.*[A-Z])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$/;
strongPassword.test("Hello123!");   // true
strongPassword.test("hello123!");   // false (no uppercase)

Python:

python
import re

# Numbers followed by kg
weights = re.findall(r"\d+(?=kg)", "42kg, 7lb, 150kg")
# ['42', '150']

# Prices preceded by $
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "$42, €15, $99.50")
# ['42', '99.50']

# Password validation
strong = re.compile(r"^(?=.*[0-9])(?=.*[A-Z])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$")
bool(strong.match("Hello123!"))  # True

PHP:

php
// Numbers followed by kg
preg_match_all('/\d+(?=kg)/', "42kg, 7lb, 150kg", $matches);
// $matches[0] = ['42', '150']

// Prices preceded by $
preg_match_all('/(?<=\$)\d+(?:\.\d{2})?/', "\$42, €15, \$99.50", $matches);
// $matches[0] = ['42', '99.50']

// Password validation
$strong = '/^(?=.*[0-9])(?=.*[A-Z])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$/';
preg_match($strong, "Hello123!");  // 1 (match)

Engine compatibility (especially JavaScript lookbehind)

Lookahead (?=...) and negative lookahead (?!...) are supported in every modern regex engine that supports assertions: JavaScript, Python, PHP/PCRE, Java, .NET, Ruby, and Rust. Go's standard-library regexp does NOT support lookarounds at all because RE2 omits them to guarantee linear-time matching.

Lookbehind (?<=...) and negative lookbehind (?<!...) had patchier historical support:

EngineLookaheadLookbehindVariable-width lookbehind
JavaScript (V8)All versionsES2018+ (Chrome 62, Firefox 78, Safari 16.4, Node 10)Yes, since ES2018
Python (re)All versionsAll versionsNo, fixed-width only at every Python version including 3.12. Use the third-party regex package for variable width
Python (regex package)All versionsAll versionsYes
PCREAll versionsAll versionsPCRE1 fixed-width only; PCRE2 (PHP 7.3+) supports variable width
JavaAll versionsAll versionsLimited (alternation of fixed widths) since Java 9+
.NETAll versionsAll versionsYes, always (the only engine that's had variable-width since v1)
Ruby (Onigmo)All versions1.9+Yes
Rust (regex crate)All versionsNon/a (no lookbehind at all)
Go (regexp, RE2)NoNon/a (RE2 omits all lookaround)

In Go specifically, if you need lookbehind, use the third-party github.com/dlclark/regexp2 package which implements the .NET regex flavor and supports the full feature set. Same advice applies in Rust: the stdlib-style regex crate is linear-time RE2-style; use fancy-regex or regress if you need lookarounds.

Variable-width vs fixed-width lookbehind

The most common cross-engine surprise. Fixed-width lookbehind means the assertion has to match a specific number of characters. (?<=abc) is fixed-width 3. (?<=ab|cd) is fixed-width 2. (?<=a*) or (?<=https?:\/\/) is variable-width because the matched length can vary.

The breakdown:

  • JavaScript V8 (since ES2018): variable-width lookbehind is supported. (?<=https?:\/\/)\w+ works. This was a deliberate spec choice and one of the headline ES2018 regex features.
  • Python re: fixed-width only. (?<=https?:\/\/)\w+ raises error: look-behind requires fixed-width pattern. The workaround is to use alternation of fixed-width options, (?<=http:\/\/|https:\/\/)\w+, which Python accepts because each alternative is itself fixed-width.
  • Python regex package: variable-width supported. Drop-in replacement for re when you need this.
  • PCRE2 (PHP 7.3+): supports variable-width lookbehind. PCRE1 (older PHP) is fixed-width.
  • Java: limited variable-width support since Java 9. The engine accepts alternation of fixed-widths and capped quantifiers ({0,5}), but rejects unbounded ones (*, +, {0,}).
  • .NET: variable-width lookbehind has always worked.

If you're writing cross-engine regex, prefer fixed-width or alternation-of-fixed-width lookbehinds. That's the only form that compiles everywhere.

Common mistakes

Mistake 1: capturing inside a lookaround and expecting it in the result. A lookahead like (?=(\d+)) does capture the digits into group 1, but the lookaround itself contributes zero characters to the main match. If you only check match[0], you won't see them. Use the explicit capture group reference (match[1]) or restructure the pattern.

Mistake 2: writing impossible lookarounds. q(?=u)i can never match because it asks the engine to match i at the same position where u was asserted to follow. The lookahead consumed no characters, so after the assertion succeeds the engine is still trying to match i at the u position. Always write the assertion at the position you actually want.

Mistake 3: variable-width lookbehind in Python re. Python's stdlib re will refuse to compile (?<=https?:\/\/)\w+ and throw error: look-behind requires fixed-width pattern. Either rewrite as alternation of fixed-widths ((?<=http:\/\/|https:\/\/)) or switch to the third-party regex package.

Mistake 4: using lookarounds in Go. Go's regexp package will not even compile patterns with (?=, (?<=, (?!, or (?<!. The compile call returns an error rather than failing silently. Reach for the third-party regexp2 package or restructure the pattern to avoid lookarounds.

Mistake 5: assuming the password pattern enforces character order. ^(?=.*[0-9])(?=.*[A-Z])... says "the string must contain at least one digit AND at least one uppercase letter somewhere". It does NOT enforce that the digit comes before the uppercase letter, or any other order. Each lookahead is an independent zero-width assertion anchored at the start.

Mistake 6: forgetting that lookarounds are zero-width. Stacking lookaheads at the same position is fine and is the foundation of the password-validation pattern. Stacking lookbehinds at the same position is also fine. Mixing them with consuming patterns at the same position requires the engine to back up and re-try, which is the right behavior but worth knowing for performance.

What to do next

For the regex features that pair most naturally with lookarounds:

  • Regex Anchors: ^, $, \A, \z, the line-and-string boundary assertions you'll use alongside lookarounds for full-input validation.
  • Regex Word Boundaries: \b and \B, the other zero-width assertion family. Lookarounds are the variable-width alternative when \b isn't precise enough.
  • Regex Capturing Groups and Backreferences: parentheses you use to capture and reference parts of a match, which combine with lookarounds to write surprisingly capable patterns.
  • Validate Password Strength with Regex: the production application of the lookahead-stacking pattern shown above.

For specific real-world patterns that lean heavily on lookarounds:

For the wider regex syntax reference, see the Regex Cheat Sheet.

External reference: the MDN regex assertions page covers JavaScript's implementation; test interactively at regex101.com (set the flavor to PCRE2 or JavaScript depending on your target).

FAQ

TagsRegexLookaheadLookbehindAssertionsZero-Width AssertionsRegular ExpressionsJavaScriptPythonPHPPCRE
Share
Ishan Karunaratne

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years across software, Linux systems, DevOps, and infrastructure — and a more recent focus on AI. Currently Chief Technology Officer at a tech startup in the healthcare space.

Keep reading

Related posts

Complete reference for regex word boundaries: \b and \B zero-width assertions, engine-by-engine support (JS, Python, Java, PCRE, POSIX), Unicode handling, and lookaround alternatives. Worked examples for whole-word replace and search highlighting.

Regex Word Boundaries: \b, \B, and Lookaround Equivalents

Regex word boundaries (\b and \B) match positions between word and non-word characters with zero width. The full reference with engine differences, Unicode handling, lookaround alternatives, and worked examples for whole-word replace, search highlighting, and log parsing.