Regex anchors are zero-width tokens that assert a position in the input string without consuming a character. They are the pieces that let ^cat match "cat" only at the start of a string, or ^\d{5}$ accept "12345" but reject "12345 abc". Knowing exactly which anchors your regex engine supports, and how they behave under multiline mode, is the difference between a pattern that ships and one that quietly accepts garbage in production.
This post covers all six anchors I reach for in practice (^, $, \A, \Z, \z, \G), the gotchas that differ across regex flavors, and the patterns I use them in.
Anchor quick reference
| Anchor | What it matches | Affected by multiline mode? |
|---|---|---|
| ^ | Start of string (or start of each line in multiline mode) | Yes |
| $ | End of string (or end of each line in multiline mode); some flavors also match before a final newline | Yes |
| \A | Absolute start of string. Never matches at a line break | No |
| \Z | End of string, or just before a final newline (Perl, PCRE, Ruby, .NET, Java semantics) | No |
| \z | Absolute end of string. Never matches before a trailing newline | No |
| \G | Position where the previous match ended (or start of string for the first match). Used for tokenizing | No |
All six are zero-width: they match a position, not a character. ^cat is an anchor followed by three literal characters, but only "cat" appears in the matched text.
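A quick way to see the zero-width behavior, sketched here with Python's re module (the sample strings are my own): the anchor constrains where the match may start but contributes nothing to the matched text.

```python
import re

# The anchor pins the match to offset 0, yet only "cat" is consumed.
m = re.search(r"^cat", "catalog")
matched_text = m.group()   # the text the pattern consumed
matched_span = m.span()    # (start, end) offsets of that text

# A bare anchor matches a position and consumes nothing at all.
empty_span = re.search(r"^", "abc").span()

print(matched_text, matched_span, empty_span)
```

The empty span is the giveaway: start and end offsets are equal because no character was consumed.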
Why anchors matter in real patterns
The classic bug they fix: validation that looks like it's working but is actually substring-matching. The pattern \d+ accepts "abc123def" because \d+ matches the 123 in the middle of the string, not because the whole string is digits. This bug shows up in every language that has regex: JavaScript's /\d+/.test(input), Python's re.search(r"\d+", input), even a careless grep -E '[0-9]+' (POSIX ERE has no \d). Anchor both ends and it disappears:
^\d+$
Same idea for ZIP codes (^\d{5}$), hex colors (^#[0-9a-fA-F]{6}$), and any other fixed-format field. Without anchors, a regex says "this thing appears somewhere in your string." With anchors, it says "this thing is the whole string." For input validation that's almost always what you actually want.
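Here is the substring-matching bug next to its anchored fix, as a minimal Python sketch. The function names are hypothetical, chosen only for illustration.

```python
import re

def contains_digits(value: str) -> bool:
    """Unanchored: true if a run of digits appears anywhere in the string."""
    return re.search(r"\d+", value) is not None

def is_all_digits(value: str) -> bool:
    """Anchored at both ends: digits must span from start to end.

    Note that Python's $ also tolerates a single trailing newline.
    """
    return re.search(r"^\d+$", value) is not None

print(contains_digits("abc123def"))  # the buggy "validator" says yes
print(is_all_digits("abc123def"))    # the anchored one says no
print(is_all_digits("123"))
```

The only difference between the two functions is the pair of anchors, which is the whole point.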
^ and $: start and end of the string
^ matches at the position before the first character of the string. $ matches at the position after the last character.
^hello
Matches "hello" only when "hello" sits at the start.
world$
Matches "world" only at the end.
^hello world$
Matches only the exact string "hello world" with nothing before or after.
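The three patterns above can be checked in a few lines of Python. This is a sketch with made-up sample strings, and found is a hypothetical helper of mine.

```python
import re

def found(pattern: str, text: str) -> bool:
    """True if the pattern matches anywhere the anchors allow."""
    return re.search(pattern, text) is not None

at_start = found(r"^hello", "hello there")          # "hello" opens the string
mid_string = found(r"^hello", "oh hello")           # "hello" is present, but not at the start
at_end = found(r"world$", "brave new world")        # "world" closes the string
exact = found(r"^hello world$", "hello world")      # nothing before or after
padded = found(r"^hello world$", "hello world!")    # one extra character breaks it

print(at_start, mid_string, at_end, exact, padded)
```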
Multiline mode changes both
When multiline mode is on (/m flag in JavaScript, Perl, PCRE; re.MULTILINE in Python; RegexOptions.Multiline in .NET), ^ also matches right after every line break inside the string and $ right before one, not just at the global start and end.
In JavaScript:
/^hello/m
Against the string "hi\nhello", this matches the "hello" on the second line. Without the m flag, it would not.
In JavaScript specifically, you also need the g (global) flag if you want all matches across all lines, not just the first.
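The same behavior in Python, as a small sketch: re.MULTILINE plays the role of the m flag, and re.findall returns every match without a separate global flag.

```python
import re

text = "hi\nhello"

# Without the flag, ^ only matches at offset 0, where the text is "hi".
no_flag = re.search(r"^hello", text)

# With the flag, ^ also matches right after the newline.
with_flag = re.search(r"^hello", text, re.MULTILINE)

# The first word of every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

print(no_flag, with_flag.start(), line_starts)
```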
\A and \z: absolute string boundaries
\A and \z always anchor to the absolute start and end of the string, ignoring multiline mode entirely. This matters when you're validating a multi-line block of input but only want the pattern to match against the whole block, not any internal line.
\Astart.*end\z
This matches only if the entire input starts with "start" and ends with "end", even if .* would have to cross line breaks (with the s/dotall flag) or even if multiline mode is on.
If you're hand-rolling a parser for a config file or a multi-line user submission, prefer \A and \z over ^ and $ when "start" and "end" mean the start and end of the entire input.
JavaScript does not support \A or \z. There, you have to be careful to keep multiline mode off when you mean "whole string".
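To see the difference between line anchors and absolute anchors, here is a sketch in Python. One assumption to flag: Python spells the strict end anchor \Z, so the pattern below uses \Z where other flavors would use \z. The sample inputs are my own.

```python
import re

block = "intro\nstart to end\noutro"   # "start ... end" is only an inner line
whole = "start\nmiddle\nend"           # the entire input starts and ends correctly

flags = re.MULTILINE | re.DOTALL

# Under MULTILINE, ^ and $ are line anchors, so an inner line satisfies them.
line_hit = re.search(r"^start.*end$", block, flags) is not None

# \A and \Z ignore MULTILINE: the whole input must start and end correctly.
block_hit = re.search(r"\Astart.*end\Z", block, flags) is not None
whole_hit = re.search(r"\Astart.*end\Z", whole, flags) is not None

print(line_hit, block_hit, whole_hit)
```

With both flags on, only the absolute anchors still mean "the entire input".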
\Z: end of string, or just before a final newline
\Z is the messy one because different flavors define it differently.
In Perl, PCRE (PHP, R), .NET, Java, and Ruby, \Z matches either at the end of the string, OR at the position just before a single trailing newline. So end\Z matches both "end" and "end\n".
In Python, \Z is more like other flavors' \z: it matches only at the absolute end of the string. Before Python 3.14, the re module had no \z at all; 3.14 added it as a synonym for \Z.
When you read a file or a user-submitted form value, a trailing newline is common and harmless. If you write something\z you reject the trailing-newline case; if you write something\Z (in most flavors) you accept it. For most "is this string exactly X" validation against possibly-newline-terminated input, \Z is the friendlier choice. The exception is Python, where \Z does the strict thing and you have to strip the newline yourself.
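In Python the lenient and strict behaviors map onto $ and \Z respectively. A short sketch, with a sample value I made up to stand in for a line read from a file:

```python
import re

value = "end\n"   # newline still attached, as after reading a line from a file

# $ (without MULTILINE) tolerates one trailing newline.
dollar_hit = re.search(r"end$", value) is not None

# Python's \Z is the absolute end, so the trailing newline makes it fail.
strict_hit = re.search(r"end\Z", value) is not None

# Stripping the newline first makes the strict anchor usable again.
stripped_hit = re.search(r"end\Z", value.rstrip("\n")) is not None

print(dollar_hit, strict_hit, stripped_hit)
```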
\G: where the last match ended
\G is the tokenizer anchor. It matches at the position where the previous match ended, or at the very start of the string for the first match.
\G\w+
Run against "one two three" with global matching, this returns "one" and then fails on the space. That's the whole point: \G lets you walk a string contiguously and stop as soon as the structure breaks, which is how you write strict token-by-token parsers without falling back to a hand-rolled loop.
Support: Perl, PCRE (PHP), .NET, Java, Ruby support \G. JavaScript, Python's stdlib re, Go, and Rust do not. If you need \G in Python, the third-party regex package is a drop-in replacement for re that supports \G along with branch reset groups, atomic groups, and full Unicode properties.
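If you're on Python's stdlib re and can't add a dependency, a compiled pattern's match(string, pos) method gives a rough stand-in: it only succeeds exactly at pos, so feeding it the previous match's end offset walks the string contiguously. This is an emulation I'm substituting, not \G itself, and walk is a hypothetical helper name.

```python
import re

def walk(pattern: str, text: str) -> list[str]:
    """Collect contiguous matches from the start of text, stopping at the first gap."""
    compiled = re.compile(pattern)
    tokens = []
    pos = 0
    while pos < len(text):
        m = compiled.match(text, pos)    # succeeds only if the match begins at pos
        if m is None or m.end() == pos:  # a gap, or an empty match that would loop forever
            break
        tokens.append(m.group())
        pos = m.end()
    return tokens

print(walk(r"\w+", "one two three"))     # stops at the first space
print(walk(r"\w+\s*", "one two three"))  # lets each token absorb its trailing space
print(walk(r"\d+,?", "1,2,3;4"))         # stops at the semicolon
```

As with \G, nothing after the first structural break is ever looked at.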
Common gotchas across regex engines
Anchors look interchangeable until you cross a language boundary. The traps I've actually hit in production:
Multiline mode defaults differ. Ruby's ^ and $ always match at line boundaries; there's no separate multiline mode to enable. In Perl, PCRE, JavaScript, Python, and .NET, you have to opt in with a flag. POSIX tools such as grep, sed, and awk process input one line at a time, so in practice they behave more like Ruby. If you copy a Ruby regex into Python without thinking, your ^ will silently become "start of whole string only" and your matches disappear.
\Z vs \z. Same letter, different case, and completely different behavior in some flavors. \Z may allow a trailing newline; \z never does. Python's \Z behaves like other flavors' \z. Mix these up and your validator will quietly accept or reject newline-terminated strings.
CRLF on Windows. Some engines treat \r\n as a single line break (Delphi, Java, Boost). Others treat it as two (JavaScript, XPath). .NET happily matches $ between the \r and the \n. If you're parsing CRLF-formatted input and getting weird off-by-one matches, this is almost always why.
JavaScript has no \A or \z. People reach for these and, without the u flag, they just match the literal letters A and z. There is no error, just a wrong pattern (with the u or v flag, the pattern throws a SyntaxError instead). Use ^ and $ and keep multiline mode off when you mean "whole string", or pre-trim newlines.
Go's regexp is RE2, not PCRE. It supports \A and \z but does not support \Z, \G, lookarounds, or backreferences. If you're porting a Perl or Ruby regex to Go and it stops working, those last four are usually the reason. The same constraints apply to Rust's regex crate.
RE2 keeps Perl's multiline opt-in but drops backtracking. In Go's regexp, the (?m) flag switches ^ and $ to multi-line behavior, the same opt-in as Perl. But the underlying engine guarantees linear time, which means PCRE patterns that depend on backtracking-only features such as backreferences simply won't compile.
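The CRLF trap is easy to reproduce. The sketch below uses Python, which is my own addition to the engine list above: its re module treats only \n as a line break for ^ and $, so the \r ends up stranded between the text and the anchor.

```python
import re

crlf_text = "alpha\r\nbeta\r\n"

# $ sits just before \n, so the \r is still between the word and the anchor.
naive = re.findall(r"^\w+$", crlf_text, re.MULTILINE)

# Allowing an optional \r before $ recovers both lines.
tolerant = re.findall(r"^(\w+)\r?$", crlf_text, re.MULTILINE)

print(naive, tolerant)
```

Normalizing line endings before matching is the other common fix. The optional \r is handy when you can't touch the input.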
Anchors in lookarounds
You can compose anchors with lookaheads and lookbehinds to build conditional position assertions.
Negative lookbehind with ^. Match "word" only if it is NOT at the start of a string:
(?<!^)word
Positive lookahead with $. Match "word" only if it is at the end of a string:
word(?=$)
Compound boundary. With multiline mode on, match "word" only if it sits at the start of a line that itself follows a newline (i.e., not the first line of the input):
(?<=\n)^word
For the conditional-position primitives in detail, see my regex lookaheads and lookbehinds guide. Anchors compose with them naturally because both are zero-width.
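The three lookaround patterns above can be exercised in Python, whose re module accepts an anchor inside a lookbehind because the anchor has a fixed width of zero. A sketch with sample strings of my own:

```python
import re

# (?<!^)word : "word" anywhere except at the very start.
rejected_at_start = re.search(r"(?<!^)word", "word") is None
later_offset = re.search(r"(?<!^)word", "a word").start()

# word(?=$) : "word" only at the end.
accepted_at_end = re.search(r"word(?=$)", "a word") is not None
rejected_mid = re.search(r"word(?=$)", "word up") is None

# (?<=\n)^word needs MULTILINE so that ^ can match after the newline.
second_line_offset = re.search(r"(?<=\n)^word", "word\nword", re.MULTILINE).start()
rejected_without_flag = re.search(r"(?<=\n)^word", "word\nword") is None

print(later_offset, second_line_offset)
```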
Practical patterns
Things I've actually written or shipped, organized by what they validate.
Match the entire string
^hello world$
Matches only if the string is exactly "hello world", nothing before, nothing after.
Validate a US phone number
^\(\d{3}\) \d{3}-\d{4}$
Accepts (123) 456-7890. Rejects xx(123) 456-7890yy. More variants and the "actually use a library for this" caveat are in my regex for US phone numbers.
Validate a ZIP code
^\d{5}(-\d{4})?$
Five-digit ZIP, optionally followed by the four-digit ZIP+4 extension.
Validate a hex color
^#[0-9a-fA-F]{6}$
Six hex digits prefixed with #. Full ruleset (including 3-digit shorthand, alpha channel, HSL alternatives) in my regex for hex color codes.
Validate a simple password
^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$
At least one uppercase, at least one digit, 8 or more characters from the alphanumeric set. Real-world password validation gets thornier; see the full password-strength regex breakdown.
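The four validation patterns so far (phone, ZIP, hex color, password) drop into a small lookup table. This is a sketch in Python: the dictionary keys and the is_valid helper are hypothetical names of mine, and the patterns are copied unchanged.

```python
import re

# Patterns copied from the sections above; the keys are illustrative labels.
VALIDATORS = {
    "us_phone": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),
    "zip": re.compile(r"^\d{5}(-\d{4})?$"),
    "hex_color": re.compile(r"^#[0-9a-fA-F]{6}$"),
    "password": re.compile(r"^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$"),
}

def is_valid(kind: str, value: str) -> bool:
    """True if value matches the anchored pattern registered under kind."""
    return VALIDATORS[kind].search(value) is not None

print(is_valid("us_phone", "(123) 456-7890"))      # accepted
print(is_valid("us_phone", "xx(123) 456-7890yy"))  # rejected by the anchors
print(is_valid("password", "password1"))           # rejected: no uppercase letter
```

Because every pattern carries its own anchors, search behaves like a whole-string check here. In Python specifically, $ still accepts one trailing newline, so strip input first if that matters.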
Tokenize a string contiguously
\G\w+
Walks a string word by word. Stops at the first non-word character. Useful for strict parsers in engines that support \G.
Match lines starting with a digit
^\d+.*$
With multiline mode on, every line whose first character is a digit.
Detect trailing whitespace
\s+$
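Whitespace at the end of a line. Pair with multiline mode to lint a whole file in one pass. Because \s also matches newlines, a match can run across blank lines in that mode; use [ \t]+$ to stay within a single line.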
Find empty lines
^\s*$
Lines with nothing but whitespace, or no characters at all. Multiline mode required.
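The three line-oriented patterns above, run in one pass over a small sample in Python. One substitution to flag: I use [ \t] in place of \s in the last two so that matches can't spill across newlines. The sample text is made up.

```python
import re

text = "1 apple\nbanana  \n\n2 cherry"

# Lines whose first character is a digit.
digit_lines = re.findall(r"^\d+.*$", text, re.MULTILINE)

# Runs of spaces or tabs sitting right before a line end.
trailing_ws = re.findall(r"[ \t]+$", text, re.MULTILINE)

# Offsets of lines that are empty or hold only spaces and tabs.
blank_line_offsets = [m.start() for m in re.finditer(r"^[ \t]*$", text, re.MULTILINE)]

print(digit_lines, trailing_ws, blank_line_offsets)
```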
Match a file extension
^.*\.txt$
Filenames ending in .txt. The .* is greedy by default, which is fine here because $ pins the end.
Extract a domain from a URL
^https?://([^/]+)/
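Captures the host portion (group 1). Note that the pattern requires a slash after the host, so a bare https://example.com does not match. For the more rigorous version with all the URL gotchas, see my regex for matching URLs.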
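A minimal Python sketch of the host extraction, using example.com URLs as placeholder input and a hypothetical extract_host helper:

```python
import re
from typing import Optional

HOST_RE = re.compile(r"^https?://([^/]+)/")

def extract_host(url: str) -> Optional[str]:
    """Return the host captured in group 1, or None if the URL doesn't fit the pattern."""
    m = HOST_RE.search(url)
    return m.group(1) if m else None

print(extract_host("https://example.com/path"))
print(extract_host("https://example.com"))       # no slash after the host, so no match
print(extract_host("see https://example.com/"))  # ^ rejects the leading text
```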
Validate a hex literal
^0x[0-9a-fA-F]+$
Numbers like 0xff, 0xABCD. No upper bound on length, which you usually want for hex.
Find consecutive duplicate words
\b(\w+) \1\b
Uses backreferences to match a word that's immediately repeated. Not an anchor per se but a frequent companion of ^ and $ in text-linting patterns.
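As a rough Python sketch of the duplicate-word check (repeated_words is a hypothetical helper name, and the sample sentence is mine):

```python
import re

DUPLICATE_RE = re.compile(r"\b(\w+) \1\b")

def repeated_words(text: str) -> list[str]:
    """Return each word that is immediately followed by itself."""
    return [m.group(1) for m in DUPLICATE_RE.finditer(text)]

print(repeated_words("this is is a test test"))
print(repeated_words("this is fine"))
```

The word boundaries keep the "is" inside "this" from pairing with the "is" that follows it.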
Anchor support by language
| Engine | ^ $ | \A | \Z | \z | \G | Multiline default |
|---|---|---|---|---|---|---|
| JavaScript | ✅ | ❌ | ❌ | ❌ | ❌ | Off; opt in with m flag |
| Python (re) | ✅ | ✅ | ✅ (strict, like other flavors' \z) | ❌ (added in 3.14 as a synonym for \Z) | ❌ | Off; opt in with re.MULTILINE |
| Python (regex pkg) | ✅ | ✅ | ✅ | ✅ | ✅ | Off; opt in with regex.MULTILINE |
| Ruby | ✅ (always multiline) | ✅ | ✅ | ✅ | ✅ | On by default |
| PCRE / PCRE2 (PHP 7.3+, R, Swift) | ✅ | ✅ | ✅ | ✅ | ✅ | Off; opt in with (?m) |
| .NET | ✅ | ✅ | ✅ | ✅ | ✅ | Off; opt in with RegexOptions.Multiline |
| Java | ✅ | ✅ | ✅ | ✅ | ✅ | Off; opt in with Pattern.MULTILINE |
| Go (RE2) | ✅ | ✅ | ❌ | ✅ | ❌ | Off; opt in with (?m) |
| Rust (regex crate) | ✅ | ✅ | ❌ | ✅ | ❌ | Off; opt in with (?m) |
| POSIX (BRE/ERE) | ✅ | ❌ | ❌ | ❌ | ❌ | Per-line by default in grep/sed/awk |
The takeaway: if your regex must run cross-language (e.g., the same validation pattern on the frontend in JavaScript and on the backend in Go or Python), restrict yourself to ^ and $ and pre-trim newlines. Anything fancier risks silent disagreement between the two implementations.
Common mistakes
The bugs I see most often, and the fix for each.
Forgetting the second anchor. ^\d+ matches a string that starts with digits, but says nothing about what comes after. ^\d+$ matches a string that is entirely digits. For validation, almost always use both.
Confusing ^ with "negation" in character classes. ^ inside [...] (like [^abc]) means "not these characters." ^ outside a character class means "start of string." Same symbol, different jobs.
Anchoring a field after extracting it. If you've already split the input on commas and you're matching the second field, that field is now the whole input, so a plain ^ anchors to its start. The bug is usually in the other direction: someone forgets the field is now a whole string and writes (?<=,)pattern, which can never match because the comma is no longer there.
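Assuming $ includes the newline. In many flavors (Perl, PCRE, Python, Java, .NET), $ matches just before a trailing newline, but the newline itself is not in the match. JavaScript, Go, and Rust are stricter: without multiline mode, their $ matches only at the very end. If you're trimming output and want the newline gone too, use \s*$ or strip the newline before matching.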
Multiline mode bleed. Turning on m affects every ^ and $ in the pattern, not just the one you wanted. If you need "whole string" and "line boundary" in the same pattern, use \A and \z for the whole-string ones and ^/$ for the line ones, in any flavor that supports them. JavaScript supports neither, but the lookarounds (?<![\s\S]) and (?![\s\S]) assert the same two positions.
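A sketch of mixing whole-string and line anchors in one pattern, in Python. The HEADER format is a made-up example, not anything from a real file type.

```python
import re

doc_ok = "HEADER\nbody line"
doc_bad = "junk\nHEADER\nbody line"

# Under MULTILINE, ^ accepts a HEADER line anywhere in the input.
line_anchored = re.compile(r"^HEADER$", re.MULTILINE)

# \A pins the start of the whole input; $ still means end of line.
mixed = re.compile(r"\AHEADER$", re.MULTILINE)

bleed_hit = line_anchored.search(doc_bad) is not None
mixed_rejects_bad = mixed.search(doc_bad) is None
mixed_accepts_ok = mixed.search(doc_ok) is not None

print(bleed_hit, mixed_rejects_bad, mixed_accepts_ok)
```

Swapping only the first anchor keeps the line-level $ working while making "first line" mean the first line of the input.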
See also
- Regex Word Boundaries: the other zero-width assertion family
- Regex Lookaheads and Lookbehinds: composable position assertions
- Regex Capturing Groups and Backreferences: the partner concept for tokenizing
- Regex Cheat Sheet: anchors, character classes, quantifiers, groups, and flags in one searchable reference





