Regex Anchors

Ishan Karunaratne · 15 min read
Regex anchors explained for production use: how ^, $, \A, \Z, \z, and \G assert positions without matching characters, with examples, multiline-mode gotchas, and language support across JavaScript, Python, Ruby, PCRE, .NET, Go, Java, Rust, and POSIX.

Regex anchors are zero-width tokens that assert a position in the input string without consuming a character. They are the pieces that let ^cat match "cat" only at the start of a string, or ^\d{5}$ accept "12345" but reject "12345 abc". Knowing exactly which anchor your regex engine supports, and how it behaves under multiline mode, is the difference between a pattern that ships and one that quietly accepts garbage in production.

This post covers all six anchors I reach for in practice (^, $, \A, \Z, \z, \G), the gotchas that differ across regex flavors, and the patterns I use them in.

Anchor quick reference

| Anchor | What it matches | Affected by multiline mode? |
| --- | --- | --- |
| ^ | Start of string (or start of each line in multiline mode) | Yes |
| $ | End of string (or end of each line in multiline mode); some flavors also match before a final newline | Yes |
| \A | Absolute start of string. Never matches at a line break | No |
| \Z | End of string, or just before a final newline (Perl, PCRE, Ruby, .NET, Java semantics) | No |
| \z | Absolute end of string. Never matches before a trailing newline | No |
| \G | Position where the previous match ended (or start of string for the first match). Used for tokenizing | No |

All six are zero-width: they match a position, not a character. ^cat is two pattern elements but only "cat" appears in the matched text.
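One way to see the zero-width behavior is to look at match spans. The following is a minimal sketch using Python's re module; the sample strings are made up for illustration.

```python
import re

# ^ contributes a position, not a character: only "cat" is in the match.
m = re.search(r"^cat", "catalog")
matched_text = m.group()   # the text that was consumed
matched_span = m.span()    # (start, end) offsets of that text

# An anchor on its own matches an empty string at a position.
empty = re.search(r"$", "abc")
empty_span = empty.span()  # start == end: zero characters consumed
```

The second match has identical start and end offsets, which is what "asserts a position without consuming a character" means in practice.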

Why anchors matter in real patterns

The classic bug they fix: validation that looks like it's working but is actually substring-matching. The pattern \d+ accepts "abc123def" because \d+ matched the 123 somewhere in the middle of the string, not because the whole string was digits. This bug shows up in every language that has regex: JavaScript's /\d+/.test(input), Python's re.search(r"\d+", input), even a careless grep -E '[0-9]+' (POSIX ERE has no \d shorthand). Anchor both ends and it disappears:

code
^\d+$

Same idea for ZIP codes (^\d{5}$), hex colors (^#[0-9a-fA-F]{6}$), and any other fixed-format field. Without anchors, a regex says "this thing appears somewhere in your string." With anchors, it says "this thing is the whole string." For input validation that's almost always what you actually want.
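The difference between "appears somewhere" and "is the whole string" can be sketched in a few lines of Python. The function names here are hypothetical, chosen only for this illustration.

```python
import re

def looks_numeric_unanchored(value: str) -> bool:
    # Substring match: succeeds if digits appear anywhere in the value.
    return re.search(r"\d+", value) is not None

def is_all_digits(value: str) -> bool:
    # Anchored at both ends: the value must consist of digits only.
    return re.search(r"^\d+$", value) is not None
```

With these, looks_numeric_unanchored("abc123def") is true while is_all_digits("abc123def") is false, which is the bug and its fix side by side.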

^ and $: start and end of the string

^ matches at the position before the first character of the string. $ matches at the position after the last character.

code
^hello

Matches "hello" only when "hello" sits at the start.

code
world$

Matches "world" only at the end.

code
^hello world$

Matches only the exact string "hello world" with nothing before or after.

Multiline mode changes both

When multiline mode is on (/m flag in JavaScript, Perl, PCRE; re.MULTILINE in Python; RegexOptions.Multiline in .NET), ^ and $ match at every line break inside the string, not just the global start and end.

In JavaScript:

code
/^hello/m

Against the string "hi\nhello", this matches the "hello" on the second line. Without the m flag, it would not.

In JavaScript specifically, you also need the g (global) flag if you want all matches across all lines, not just the first.
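For comparison, here is a rough Python equivalent of the same behavior, assuming the stdlib re module. re.findall plays the role that the g flag plays in JavaScript; the sample text is invented.

```python
import re

text = "hi\nhello"

# Without multiline mode, ^ only matches at the very start of the string.
without_m = re.search(r"^hello", text)

# With re.MULTILINE, ^ also matches right after each newline.
with_m = re.search(r"^hello", text, re.MULTILINE)

# findall returns every match, so this collects the first word of each line.
log = "alpha one\nbeta two\ngamma three"
first_words = re.findall(r"^\w+", log, re.MULTILINE)
```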

\A and \z: absolute string boundaries

\A and \z always anchor to the absolute start and end of the string, ignoring multiline mode entirely. This matters when you're validating a multi-line block of input but only want the pattern to match against the whole block, not any internal line.

code
\Astart.*end\z

This matches only if the entire input starts with "start" and ends with "end", whether or not multiline mode is on. If the input spans several lines, .* also needs the s/dotall flag so it can cross the line breaks.

If you're hand-rolling a parser for a config file or a multi-line user submission, prefer \A and \z over ^ and $ when "start" and "end" mean the start and end of the entire input.

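JavaScript does not support \A or \z. There, you have to be careful to keep multiline mode off when you mean "whole string". Go's RE2-based regexp does support \A and \z (though not \Z), and Python spells its strict end anchor \Z rather than \z.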
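A small Python sketch of the difference between line anchors and absolute anchors. Python's re spells the strict end anchor \Z, so the pattern below uses \A...\Z where other flavors would write \A...\z. The sample input is invented.

```python
import re

block = "start\nmiddle\nend"
padded = "preamble\n" + block          # same block with an extra first line
flags = re.MULTILINE | re.DOTALL       # line-mode ^/$ and a dot that crosses newlines

# The absolute anchors accept the block only when it is the entire input.
whole = re.search(r"\Astart.*end\Z", block, flags)
abs_on_padded = re.search(r"\Astart.*end\Z", padded, flags)

# With multiline mode on, ^ fires at the second line and the pattern still matches.
line_on_padded = re.search(r"^start.*end$", padded, flags)
```

Here whole and line_on_padded are matches, while abs_on_padded is None, because \A ignores the line break that ^ accepts.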

\Z: end of string, or just before a final newline

\Z is the messy one because different flavors define it differently.

In Perl, PCRE (PHP, R), .NET, Java, and Ruby, \Z matches either at the end of the string, OR at the position just before a single trailing newline. So end\Z matches both "end" and "end\n".

In Python, \Z is more like other flavors' \z: it matches only at the absolute end of the string. Python does not have \z at all.

When you read a file or a user-submitted form value, a trailing newline is common and harmless. If you write something\z you reject the trailing-newline case; if you write something\Z (in most flavors) you accept it. For most "is this string exactly X" validation against possibly-newline-terminated input, \Z is the friendlier choice, except in Python, where there's no \z so \Z does the strict thing and you have to chomp the newline yourself.
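The trailing-newline behavior is easy to check in Python. The helper below is a hypothetical sketch of the "chomp it yourself" approach, not a standard function: it drops at most one trailing newline and then requires the absolute end, which approximates other flavors' \Z.

```python
import re

value = "end\n"

# Python's $ tolerates a single trailing newline; its \Z does not.
dollar = re.search(r"end$", value)
strict = re.search(r"end\Z", value)

def matches_allowing_newline(pattern: str, text: str) -> bool:
    # Remove at most one trailing "\n", then anchor at the absolute end.
    if text.endswith("\n"):
        text = text[:-1]
    return re.search("(?:" + pattern + r")\Z", text) is not None
```

Wrapping the pattern in a non-capturing group keeps the \Z attached to the whole expression even when the pattern contains alternation.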

\G: where the last match ended

\G is the tokenizer anchor. It matches at the position where the previous match ended, or at the very start of the string for the first match.

code
\G\w+

Run against "one two three" with global matching, this returns "one" and then fails on the space. That's the whole point: \G lets you walk a string contiguously and stop as soon as the structure breaks, which is how you write strict token-by-token parsers without falling back to a hand-rolled loop.

Support: Perl, PCRE (PHP), .NET, Java, Ruby support \G. JavaScript, Python's stdlib re, Go, and Rust do not. If you need \G in Python, the third-party regex package is a drop-in replacement for re that supports \G along with branch reset groups, atomic groups, and full Unicode properties.
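In engines without \G, the same contiguous walk can be approximated by hand. This is a sketch for Python's stdlib re, relying on the fact that a compiled pattern's match(string, pos) only succeeds if the match begins exactly at pos. The function name contiguous_words is made up for this example.

```python
import re

WORD = re.compile(r"\w+")

def contiguous_words(text: str) -> list:
    """Collect tokens from the start of text, stopping where the structure breaks."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = WORD.match(text, pos)  # must match at pos, like \G would
        if m is None:
            break                  # first non-word character ends the walk
        tokens.append(m.group())
        pos = m.end()
    return tokens
```

Called on "one two three" this returns only the first word, mirroring the \G\w+ behavior described above, because the space at offset 3 cannot start a \w+ match.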

Common gotchas across regex engines

Anchors look interchangeable until you cross a language boundary. The traps I've actually hit in production:

Multiline mode defaults differ. Ruby's ^ and $ always match at line boundaries; there's no separate multiline mode to enable. In Perl, PCRE, JavaScript, Python, and .NET, you have to opt in with a flag. POSIX tools such as grep, sed, and awk process their input line by line, so in practice they behave more like Ruby. If you copy a Ruby regex into Python without thinking, your ^ will silently become "start of whole string only" and your matches disappear.

\Z vs \z. Same letter, different case, completely different behavior in some flavors. \Z may allow a trailing newline; \z never does. Python has only \Z and it behaves like \z. Mix these up and your validator will quietly accept or reject newline-terminated strings.

CRLF on Windows. Some engines treat \r\n as a single line break (Delphi, Java, Boost). Others treat it as two (JavaScript, XPath). .NET happily matches $ between the \r and the \n. If you're parsing CRLF-formatted input and getting weird off-by-one matches, this is almost always why.

JavaScript has no \A or \z. People reach for these, and without the u flag they just match the literal letters A and z. There is no error, just a wrong pattern. (With the u or v flag the unknown escape is a SyntaxError instead.) Use ^ and $ and keep multiline mode off when you mean "whole string", or pre-trim newlines.

Go's regexp is RE2, not PCRE. It supports \A and \z but does not support \Z, \G, lookarounds, or backreferences. If you're porting a Perl or Ruby regex to Go and it stops working, those last four are usually the reason. The same constraints apply to Rust's regex crate.

RE2 keeps the multiline opt-in but drops backtracking. In Go's regexp, the (?m) flag flips ^ and $ to multi-line behavior, the same opt-in as Perl. But the underlying engine guarantees linear time, which means PCRE patterns that depend on backtracking features such as backreferences or lookarounds simply won't compile.
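The CRLF gotcha is worth reproducing before trusting a pattern on Windows-formatted input. Python's re is not in the list above, but it makes a convenient test bed because it recognizes only \n as a line break for $. The sample text and the \r? workaround are illustrative assumptions, not the only way to handle it.

```python
import re

unix_text = "alpha\nbeta\n"
windows_text = "alpha\r\nbeta\r\n"

# On LF input, \w+$ finds the last word of every line.
unix_hits = re.findall(r"\w+$", unix_text, re.MULTILINE)

# On CRLF input the \r sits between the word and the line end, so nothing matches.
windows_hits = re.findall(r"\w+$", windows_text, re.MULTILINE)

# Allowing an optional \r before the line end restores the matches.
crlf_aware = re.findall(r"\w+(?=\r?$)", windows_text, re.MULTILINE)
```

Normalizing line endings before matching is often simpler than teaching every pattern about \r.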

Anchors in lookarounds

You can compose anchors with lookaheads and lookbehinds to build conditional position assertions.

Negative lookbehind with ^. Match "word" only if it is NOT at the start of a string:

code
(?<!^)word

Positive lookahead with $. Match "word" only if it is at the end of a string:

code
word(?=$)

Compound boundary. With multiline mode on, match "word" only if it sits at the start of a line that itself follows a newline (i.e., not the first line of the input):

code
(?<=\n)^word

For the conditional-position primitives in detail, see my regex lookaheads and lookbehinds guide. Anchors compose with them naturally because both are zero-width.
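The three lookaround patterns above can be tried directly in Python, whose re module accepts zero-width anchors inside lookbehinds. The sample text is invented; each list holds the start offsets of the matches.

```python
import re

text = "word one\nword two word"

# "word" anywhere except the very start of the string.
not_at_start = [m.start() for m in re.finditer(r"(?<!^)word", text)]

# "word" only when it is immediately followed by the end of the string.
at_end = [m.start() for m in re.finditer(r"word(?=$)", text)]

# "word" at the start of a line that is not the first line (needs multiline mode).
after_newline = [
    m.start() for m in re.finditer(r"(?<=\n)^word", text, re.MULTILINE)
]
```

The occurrence at offset 0 is excluded by the first and third patterns, and only the final occurrence satisfies the second.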

Practical patterns

Things I've actually written or shipped, organized by what they validate.

Match the entire string

code
^hello world$

Matches only if the string is exactly "hello world", nothing before, nothing after.

Validate a US phone number

code
^\(\d{3}\) \d{3}-\d{4}$

Accepts (123) 456-7890. Rejects xx(123) 456-7890yy. More variants and the "actually use a library for this" caveat are in my regex for US phone numbers.

Validate a ZIP code

code
^\d{5}(-\d{4})?$

Five-digit ZIP, optionally followed by the four-digit ZIP+4 extension.

Validate a hex color

code
^#[0-9a-fA-F]{6}$

Six hex digits prefixed with #. Full ruleset (including 3-digit shorthand, alpha channel, HSL alternatives) in my regex for hex color codes.

Validate a simple password

code
^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$

At least one uppercase, at least one digit, 8 or more characters from the alphanumeric set. Real-world password validation gets thornier; see the full password-strength regex breakdown.
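The validation patterns so far can be collected into one small lookup. This is a sketch only: the VALIDATORS table and the is_valid helper are hypothetical names, and the patterns are copied unchanged from the sections above.

```python
import re

# Anchored patterns from the sections above, keyed by the field they validate.
VALIDATORS = {
    "phone": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),
    "zip": re.compile(r"^\d{5}(-\d{4})?$"),
    "hex_color": re.compile(r"^#[0-9a-fA-F]{6}$"),
    "password": re.compile(r"^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$"),
}

def is_valid(kind: str, value: str) -> bool:
    # Both anchors are in the pattern, so search() behaves as a whole-string test.
    return VALIDATORS[kind].search(value) is not None
```

For example, is_valid("phone", "(123) 456-7890") is true and is_valid("phone", "xx(123) 456-7890yy") is false. Because Python's $ tolerates one trailing newline, stripping input before validation (or using fullmatch) is the stricter choice.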

Tokenize a string contiguously

code
\G\w+

Walks a string word by word. Stops at the first non-word character. Useful for strict parsers in engines that support \G.

Match lines starting with a digit

code
^\d+.*$

With multiline mode on, every line whose first character is a digit.

Detect trailing whitespace

code
\s+$

Whitespace at the end of a line. Pair with multiline mode to lint a whole file in one pass.

Find empty lines

code
^\s*$

Lines with nothing but whitespace, or no characters at all. Multiline mode required.
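The three line-oriented patterns above can be run over a whole block of text at once. The sketch below uses Python with re.MULTILINE and an invented sample. One deliberate deviation: it uses [ \t] instead of \s, because \s also matches newlines and can make a match spill across lines.

```python
import re

TEXT = "1 apple\nbanana  \n\n22 cherries\t"

# Lines whose first character is a digit.
digit_lines = re.findall(r"^\d+.*$", TEXT, re.MULTILINE)

# Runs of spaces or tabs sitting right before a line end.
trailing = re.findall(r"[ \t]+$", TEXT, re.MULTILINE)

# Lines that are empty or contain only spaces and tabs.
blank_count = len(re.findall(r"^[ \t]*$", TEXT, re.MULTILINE))
```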

Match a file extension

code
^.*\.txt$

Filenames ending in .txt. The .* is greedy by default, which is fine here because $ pins the end.

Extract a domain from a URL

code
^https?://([^/]+)/

Captures the host portion (group 1). For the more rigorous version with all the URL gotchas, see my regex for matching URLs.
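A short Python sketch of the host extraction. The extract_host helper and the example URLs are hypothetical.

```python
import re

HOST = re.compile(r"^https?://([^/]+)/")

def extract_host(url: str):
    """Return the text between the scheme and the first slash, or None."""
    m = HOST.search(url)
    return m.group(1) if m else None
```

Two properties of the pattern show up quickly when you try it. It requires a slash after the host, so "https://example.com" with no path returns None. And group 1 is everything up to that slash, so a port such as ":8080" is captured along with the host name.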

Validate a hex literal

code
^0x[0-9a-fA-F]+$

Numbers like 0xff, 0xABCD. No upper bound on length, which you usually want for hex.

Find consecutive duplicate words

code
\b(\w+) \1\b

Uses backreferences to match a word that's immediately repeated. Not an anchor per se but a frequent companion of ^ and $ in text-linting patterns.
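As a quick illustration in Python, the pattern can drive a tiny duplicate-word check. The find_duplicates helper is a hypothetical name, and the sample sentences are invented.

```python
import re

DUPLICATE = re.compile(r"\b(\w+) \1\b")

def find_duplicates(text: str) -> list:
    # group(1) is the word that appears twice in a row.
    return [m.group(1) for m in DUPLICATE.finditer(text)]
```

The word boundaries matter on both sides: the leading \b stops "this is" from matching on the shared "is", and the trailing \b stops "the theory" from counting as a repeated "the".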

Anchor support by language

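| Engine | ^ $ | \A | \Z | \z | \G | Multiline default |
| --- | --- | --- | --- | --- | --- | --- |
| JavaScript | ✅ | ❌ | ❌ | ❌ | ❌ | Off; opt in with m flag |
| Python (re) | ✅ | ✅ | ✅ (strict, like other flavors' \z) | ❌ | ❌ | Off; opt in with re.MULTILINE |
| Python (regex pkg) | ✅ | ✅ | ✅ | | ✅ | Off; opt in with regex.MULTILINE |
| Ruby | ✅ (always multiline) | ✅ | ✅ | ✅ | ✅ | On by default |
| PCRE / PCRE2 (PHP 7.3+, R, Swift) | ✅ | ✅ | ✅ | ✅ | ✅ | Off; opt in with (?m) |
| .NET | ✅ | ✅ | ✅ | ✅ | ✅ | Off; opt in with RegexOptions.Multiline |
| Java | ✅ | ✅ | ✅ | ✅ | ✅ | Off; opt in with Pattern.MULTILINE |
| Go (RE2) | ✅ | ✅ | ❌ | ✅ | ❌ | Off; opt in with (?m) |
| Rust (regex crate) | ✅ | ✅ | ❌ | ✅ | ❌ | Off; opt in with (?m) |
| POSIX (BRE/ERE) | ✅ | ❌ | ❌ | ❌ | ❌ | Per-line by default in grep/sed/awk |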

The takeaway: if your regex must run cross-language (e.g., the same validation pattern on the frontend in JavaScript and on the backend in Go or Python), restrict yourself to ^ and $ and pre-trim newlines. Anything fancier risks silent disagreement between the two implementations.

Common mistakes

The bugs I see most often, and the fix for each.

Forgetting the second anchor. ^\d+ matches a string that starts with digits, but says nothing about what comes after. ^\d+$ matches a string that is entirely digits. For validation, almost always use both.

Confusing ^ with "negation" in character classes. ^ inside [...] (like [^abc]) means "not these characters." ^ outside a character class means "start of string." Same symbol, different jobs.

Using ^ to match the start of a substring after extracting. If you've already split the input on commas and you're matching the second field, you don't need ^ because the substring is now the whole input. Adding ^ doesn't hurt, but the bug is usually in the other direction: someone forgets the substring is now a whole string and writes (?<=,)pattern instead.

Assuming $ includes the newline. In most flavors, $ matches just before a trailing newline, but the newline itself is not in the match. If you're trimming output and want the newline gone too, use \s*$ or strip the newline before matching.

Multiline mode bleed. Turning on multiline mode (the m flag in JavaScript, for example) affects every ^ and $ in the pattern, not just the one you wanted. If you need "whole string" and "line boundary" in the same pattern, use \A and \z for the whole-string ones and ^/$ for the line ones, in any flavor that supports them. JavaScript does not, so there the whole-string positions need lookarounds such as (?<![\s\S]) and (?![\s\S]).

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.