TechEarl

How to Match a Whole Word with grep -w

grep cat also matches category, concatenate, and scatter; grep -w cat matches only the standalone word. This guide covers the whole-word flag, what grep counts as a word boundary, the regex equivalents with \b and \< \>, the stricter -x whole-line cousin, and the BSD vs GNU differences that bite on macOS.

Ishan Karunaratne · 12 min read

grep cat file matches more than you want. It hits category, concatenate, scatter, wildcat: anything with the three letters cat somewhere inside it. By default grep matches substrings, and a short search word turns into a wall of false positives.

grep -w 'cat' file fixes it. The -w flag tells grep to match only when the pattern stands alone as a whole word, with a word boundary on each side. cat matches; category does not. This is the flag I reach for whenever the search term is a real word that also lives inside longer words, and it is the difference between a clean result set and a manual scroll-and-skim.
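A quick way to see the difference is to run both forms against a small throwaway file. This is a minimal sketch; the temp file and its four lines are invented for the demo.

```shell
# Hypothetical sample data, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'the cat sat' 'category list' 'concatenate strings' 'a wildcat appeared' > "$sample"

# Substring search: all four lines contain the letters "cat" somewhere.
plain_hits=$(grep -c 'cat' "$sample")

# Whole-word search: only the line where "cat" stands alone.
word_hits=$(grep -w 'cat' "$sample")

echo "plain grep matched $plain_hits lines"
echo "grep -w matched: $word_hits"

rm -f "$sample"
```

Plain grep reports four matching lines; grep -w keeps only the cat sat.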

In the commands below, :pattern stands for the word you want to match and :search_path for the file or directory to search. Substitute your own values.

The one-liner

bash· Linux (GNU)
grep -w ':pattern' :search_path

That returns every line where :pattern appears as a standalone word. For example, grep -w 'id' matches the literal word id but skips width, valid, idea, android, and kid. Without -w, a search for id in any real codebase is close to useless.

PowerShell's Select-String has no direct -w equivalent, so on Windows wrap the pattern in explicit word-boundary anchors (\b), which the .NET regex engine supports.

What counts as a word boundary

-w is not magic. It is shorthand for wrapping your pattern in word boundaries, and a word boundary is defined by what grep considers a word character:

  • Letters: a-z, A-Z
  • Digits: 0-9
  • Underscore: _

Everything else (spaces, punctuation, slashes, dots, the start and end of the line) is a non-word character, and the transition between a word character and a non-word character is a boundary.

So -w requires that the character immediately before the match is a non-word character or the line start, and the character immediately after is a non-word character or the line end. That is the whole rule.

The underscore inclusion is the part people forget. grep -w 'user' will not match user_id or _user, because _ is a word character, so there is no boundary between user and _. If you are searching code where identifiers use snake_case, -w on a single segment of the name silently misses every compound identifier. More on that in the mistakes section.
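The rule is easy to check against a handful of identifiers. The sketch below uses made-up sample lines; only the lines where user has a non-word character or a line edge on both sides should come back.

```shell
# Hypothetical identifiers, one per line.
sample=$(mktemp)
printf '%s\n' 'user' 'user_id' '_user' 'user-id' 'user.name' 'user2' '(user)' > "$sample"

# Underscore and digits are word characters, so user_id, _user, and user2 are skipped.
# Hyphen, dot, and parentheses are non-word characters, so those lines match.
matches=$(grep -w 'user' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

The output is user, user-id, user.name, and (user), in that order.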

The manual equivalent with regex word boundaries

-w is a convenience flag. You can write the same thing by hand with regex word-boundary metacharacters, which is useful when you need a boundary on only one side, or when you are composing a larger pattern.

GNU grep supports the Perl-style \b boundary in both BRE and ERE mode. The ERE form:

bash· Linux (GNU)
grep -E '\b:pattern\b' :search_path

The classic BRE (basic regex) form uses \< for "start of word" and \> for "end of word":

bash· Linux (GNU)
grep '\<:pattern\>' :search_path

The \< and \> anchors are zero-width: they match a position, not a character, just like ^ and $. Both are supported on GNU and BSD grep, which makes them the most portable choice when you need an explicit boundary. \b matches at either the start or the end of a word, so it is more flexible, but BSD grep does not honor it.
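The one-sided case is where the explicit anchors earn their keep. A small sketch with invented sample words: \<cat finds words that start with cat, and cat\> finds words that end with it.

```shell
# Hypothetical word list for the demo.
sample=$(mktemp)
printf '%s\n' 'cat' 'category' 'concatenate' 'wildcat' > "$sample"

# Boundary on the left only: words beginning with "cat".
starts_with=$(grep '\<cat' "$sample")

# Boundary on the right only: words ending with "cat".
ends_with=$(grep 'cat\>' "$sample")

printf 'starts with cat:\n%s\n' "$starts_with"
printf 'ends with cat:\n%s\n' "$ends_with"

rm -f "$sample"
```

The first search returns cat and category; the second returns cat and wildcat. concatenate matches neither, because its cat has word characters on both sides.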

For the full tour of which metacharacters belong to which regex mode, see the BRE vs ERE vs PCRE guide.

-w with a multi-word or regex pattern

When the pattern is more than a single literal word, -w applies the boundary to the whole pattern, not to each piece inside it. This is the rule that surprises people most.

grep -w 'hot dog' file requires a boundary before hot and after dog. It matches a hot dog please but not hot dogs (the s makes dog not end on a boundary). The space between hot and dog is internal to the pattern; -w does not touch it.

With an alternation, the boundary still wraps the entire expression:

bash· Linux (GNU)
grep -wE '(cat|dog|fish)' pets.txt

GNU's documentation describes -w precisely: the matched substring must either be at the start of the line or preceded by a non-word character, and must either be at the end of the line or followed by a non-word character. For an alternation, "the matched substring" is whichever branch matched, so each of cat, dog, fish is independently required to sit on boundaries. That behavior is what you usually want, but it is worth knowing it is the match, not the pattern source, that gets the boundary.
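Both behaviors can be checked in a few lines. This is a sketch against invented sample text, run with GNU grep; the file contents are assumptions made for the demo.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'a hot dog please' 'two hot dogs' 'my catfish' 'one fish' 'dogma' > "$sample"

# Multi-word pattern: the boundary applies before "hot" and after "dog".
phrase=$(grep -w 'hot dog' "$sample")

# Alternation: whichever branch matched must sit on boundaries.
# "catfish" fails because neither cat nor fish stands alone in it.
alt=$(grep -wE '(cat|dog|fish)' "$sample")

printf 'phrase:\n%s\n' "$phrase"
printf 'alternation:\n%s\n' "$alt"

rm -f "$sample"
```

The phrase search returns only a hot dog please. The alternation returns a hot dog please and one fish.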

-w combined with -i, -r, -v, -c

-w composes cleanly with the rest of the grep flag set.

Case-insensitive whole-word match:

bash· Linux (GNU)
grep -wi ':pattern' :search_path

Recursive whole-word search through a directory tree:

bash· Linux (GNU)
grep -rwn ':pattern' :search_path

Invert the match: lines that do not contain the word as a standalone token:

bash· Linux (GNU)
grep -wv ':pattern' :search_path

One subtlety with -wv: a line containing width (but never the bare word id) counts as a non-match for grep -w 'id', so -wv keeps it. That is correct, but if you mentally model -w as "lines mentioning id", the inverted result can look wrong. It is not; -v inverts the whole-word test, not a substring test.

Count whole-word matches:

bash· Linux (GNU)
grep -wc ':pattern' :search_path

-c counts matching lines, not matches. A line with the word twice counts once. For a true occurrence count use grep -ow ':pattern' file | wc -l.
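To see how these combinations differ on the same input, here is a minimal sketch. The four sample lines are invented, and id is used as the search word.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'id = 7' 'width = 3' 'id, id, valid' 'ID card' > "$sample"

# -wc: lines containing the bare word id (case-sensitive).
line_count=$(grep -wc 'id' "$sample")

# -wic: same, ignoring case, so "ID card" now counts too.
ci_line_count=$(grep -wic 'id' "$sample")

# -wv: lines with no standalone id. "width = 3" is kept.
inverted=$(grep -wv 'id' "$sample")

# -ow piped to wc -l: one output line per occurrence, not per input line.
occurrences=$(grep -ow 'id' "$sample" | wc -l | tr -d ' ')

echo "lines: $line_count, lines ignoring case: $ci_line_count, occurrences: $occurrences"
printf 'inverted:\n%s\n' "$inverted"

rm -f "$sample"
```

Here -wc reports 2 lines while the -ow count reports 3 occurrences, because the third line contains id twice.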

-x for whole-LINE match (the stricter cousin)

-w has a stricter sibling: -x matches only when the pattern equals the entire line, not just a whole word inside it.

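| Flag   | Matches                              | cat matches the line...      |
|--------|--------------------------------------|------------------------------|
| (none) | substring anywhere                   | concatenate, the cat sat     |
| -w     | whole word, boundaries on both sides | the cat sat, not concatenate |
| -x     | the whole line, nothing else on it   | cat, not the cat sat         |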

grep -x 'cat' file matches a line that is exactly cat and nothing more: no leading spaces, no trailing text. It is the right tool when you are checking config files or allow-lists where a line must be an exact value.

bash· Linux (GNU)
grep -x ':pattern' :search_path

A common pairing is grep -Fx to check membership: grep -Fxq 'value' allowlist.txt exits 0 if value is a whole line in the file, treating the pattern as a fixed string. That is the canonical "is this entry in the list" test in shell scripts.

Think of it as a scale: no flag matches anywhere on the line, -w matches a token on the line, -x matches the whole line. Pick the tightest one that still catches what you need.
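The membership test might look like this in a script. It is a sketch only: the allow-list contents and the is_allowed helper name are hypothetical, not part of any standard tool.

```shell
# Hypothetical allow-list, one entry per line.
allowlist=$(mktemp)
printf '%s\n' 'alice' 'bob.smith' 'carol' > "$allowlist"

# is_allowed is an illustrative helper name.
# -F: treat the value as a fixed string, -x: whole line, -q: exit status only.
# The -- stops option parsing in case the value starts with a dash.
is_allowed() {
  grep -Fxq -- "$1" "$allowlist"
}

for name in 'alice' 'ali' 'bobXsmith' 'bob.smith'; do
  if is_allowed "$name"; then
    echo "$name: allowed"
  else
    echo "$name: denied"
  fi
done
```

ali is denied because -x rejects partial lines, and bobXsmith is denied because -F stops the dot in bob.smith from acting as a wildcard.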

macOS BSD grep vs GNU grep

-w and -x themselves are portable: both work identically on GNU grep (Linux) and BSD grep (the macOS default). The divergence is entirely in the regex boundary metacharacters.

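| Feature                   | GNU grep                 | BSD grep (macOS default)        |
|---------------------------|--------------------------|---------------------------------|
| -w (whole word)           | Supported                | Supported                       |
| -x (whole line)           | Supported                | Supported                       |
| \< \> (BRE word anchors)  | Supported                | Supported                       |
| \b (Perl-style boundary)  | Supported in BRE and ERE | Treated as a literal backspace  |
| \w \W (word char classes) | Supported                | Not supported (use [[:alnum:]_]) |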

The practical takeaway: if a script has to run on both Linux and macOS, use -w for whole-word matching, or use \< \> when you need an explicit one-sided boundary. Avoid \b and \w in portable scripts. On macOS, brew install grep gives you GNU grep as ggrep if you genuinely need \b. The grep cheat sheet has the full BSD vs GNU divergence table.
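If you want a boundary that relies on nothing beyond POSIX ERE, one option is to spell it out with bracket expressions. The sketch below is roughly -w written by hand; the sample lines and the word variable are assumptions for the demo, and the search term must not contain regex metacharacters.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'id = 7' 'width = 3' 'user_id' '(id)' > "$sample"

word='id'   # illustrative search term; plain letters only

# Line start or a non-word character, then the word,
# then a non-word character or line end.
portable=$(grep -E "(^|[^[:alnum:]_])${word}([^[:alnum:]_]|\$)" "$sample")

# The same selection using the flag.
with_w=$(grep -w "$word" "$sample")

printf 'bracket form:\n%s\n' "$portable"
printf '%s\n' '-w form:' "$with_w"

rm -f "$sample"
```

Both forms select id = 7 and (id) here. The bracket form consumes the neighboring characters as part of the match, so it is fine for picking lines but gives different output from -w when combined with -o.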

Common grep -w mistakes

1. Assuming -w adds boundaries inside the pattern. -w wraps the whole pattern (or whichever alternation branch matched), not each word in it. grep -w 'foo|bar' without -E matches the literal seven-character string foo|bar as a whole word, which is almost never the intent. Use grep -wE '(foo|bar)' so the alternation is parsed and each branch gets the boundary.

2. Forgetting underscore is a word character. grep -w 'user' does not match user_id, user_name, or _user, because _ is a word character and there is no boundary between user and _. In snake_case codebases this silently drops every compound identifier. If you want user and user_*, drop -w and use an explicit pattern like grep -E '\buser\b|\buser_' (GNU), or just search the substring and accept the noise.

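3. A pattern that starts or ends with a non-word character. Two things go wrong with grep -w '.config'. First, the unescaped . is a regex wildcard, so the pattern also matches xconfig as a whole word; write '\.config' or add -F. Second, -w does not require the match itself to begin or end with a word character. GNU grep only tests the characters around the match: a non-word character or the line edge before it, and a non-word character or the line edge after it. So grep -Fw '.config' matches ~/.config and a bare .config, but not app.config, because the p in front of the dot is a word character. That is not the same as '\b\.config\b', where the leading \b needs a word character before the dot. If the results look wrong, drop -w and anchor manually.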

4. Using -w when the word is glued to punctuation you care about. Searching for error with -w matches both "error," and "error:" because the comma and colon are non-word characters. It also matches the error in error-code, since - is a non-word character that -w treats as a word edge. To match error-code as a hyphenated unit, put the hyphen inside the pattern: grep -w 'error-code'.

5. Confusing -w with -x. -w matches a word on the line; -x matches the whole line. If grep -w 'cat' returns lines with extra text and you wanted exact lines only, you wanted -x.

6. Expecting -w to do language-aware tokenization. -w only knows the [A-Za-z0-9_] rule. It has no concept of word stems, contractions, or non-ASCII word characters in some locales. don't is two grep-words (don and t) split by the apostrophe.
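Mistakes 1, 2, and 6 are easy to reproduce. A short sketch against invented sample lines, assuming GNU grep:

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'foo' 'bar' 'foo|bar' 'user_id' "don't" > "$sample"

# Mistake 1: in basic regex the | is a literal character,
# so only the line containing the text foo|bar matches.
literal_pipe=$(grep -w 'foo|bar' "$sample")

# With -E the alternation is parsed and each branch gets the boundary.
alternation=$(grep -wE '(foo|bar)' "$sample")

# Mistake 2: the underscore glues user to id, so -w finds nothing.
if grep -wq 'user' "$sample"; then snake='matched'; else snake='missed'; fi

# Mistake 6: the apostrophe is a boundary, so "don" is a word by itself.
contraction=$(grep -w 'don' "$sample")

printf 'literal pipe: %s\n' "$literal_pipe"
printf 'alternation:\n%s\n' "$alternation"
printf 'user in user_id: %s\n' "$snake"
printf 'don matches: %s\n' "$contraction"

rm -f "$sample"
```

The BRE search returns only the foo|bar line, while the -E search returns foo, bar, and foo|bar.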

When NOT to use this

-w is the wrong tool when:

  • You genuinely want substring matches. Searching for a partial identifier, a filename fragment, or a prefix is a legitimate use of plain grep. Forcing -w there just makes you miss results. grep 'config' to find configure, config.json, and reconfigured is correct as-is.
  • You need language-aware tokenization. Splitting prose into real words (handling contractions, hyphenated compounds, Unicode letters, stemming) is a job for a text-processing library, not grep's [A-Za-z0-9_] boundary rule. If don't or co-operate must count as single words, grep cannot do it.
  • The "word" contains boundary characters. A version string like 1.2.3, a path like /etc/hosts, or an email address are not single grep-words; the dots, slashes, and @ are all boundaries. Match these with -F (fixed string) or an anchored regex instead.
  • You are matching a whole line. Use -x. -w still allows other text on the line.
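For the version-string case, the difference between a plain regex, -F, and a hand-anchored regex shows up quickly. This is a sketch with invented sample lines; the anchored pattern is one possible choice, not the only one.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'version 1.2.3' 'build 1x2y3' 'version 11.2.30' > "$sample"

# Plain regex: each dot matches any character, so all three lines hit.
regex_hits=$(grep -c '1.2.3' "$sample")

# -F: dots are literal, but it is still a substring match (11.2.30 hits).
fixed_hits=$(grep -cF '1.2.3' "$sample")

# Anchored regex: escaped dots, and no digit or dot allowed on either side.
anchored=$(grep -E '(^|[^0-9.])1\.2\.3([^0-9.]|$)' "$sample")

echo "regex: $regex_hits, fixed: $fixed_hits"
echo "anchored: $anchored"

rm -f "$sample"
```

Only the anchored form narrows the result to version 1.2.3.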
