TechEarl

grep Regex: BRE vs ERE vs PCRE Explained

grep has three regex engines and the default one surprises everyone: in basic regex (BRE) the characters + ? | ( ) { } are literal text until you backslash-escape them. -E switches to extended regex (ERE) where they work bare, and -P unlocks Perl-compatible regex with lookaround and \d. The full BRE vs ERE vs PCRE comparison, the same pattern in all three, and why -P does not exist on macOS.

Ishan Karunaratne · ⏱️ 7 min read

grep does not have one regex engine. It has three, and which one you get depends on the flag. The default is basic regex (BRE). Add -E and you get extended regex (ERE). Add -P and you get Perl-compatible regex (PCRE). The same pattern string can match different text, or fail to compile, depending on which mode is active.

The mode that surprises everyone is the default. In BRE, the characters +, ?, |, (, ), {, and } are literal text. They match themselves. To use them as metacharacters you have to backslash-escape them: \+, \?, \|, \(, \), \{, \}. That inversion (escape the metacharacter to make it special, leave it bare to make it literal) is the single biggest reason people give up on plain grep and reach for grep -E.

This article is the deep dive on the three modes. For the full flag reference, see the grep cheat sheet.

Some examples below, labeled Linux (GNU), use the placeholders :pattern and :search_path. Substitute your own search pattern and directory.

The three modes at a glance

Mode                    Flag            Metacharacters                                          Best for
Basic (BRE)             none (default)  . * ^ $ [...] bare; \( \) \{ \} \+ \? \| when escaped    Simple literal-ish searches; portable scripts
Extended (ERE)          -E (or egrep)   adds + ? | ( ) { } bare                                 Everyday interactive use
Perl-compatible (PCRE)  -P              adds lookaround, \d \w \s, non-greedy, backreferences   Anything BRE and ERE cannot express

The practical advice: reach for -E by default. Use plain grep only when the pattern is genuinely basic, and use -P only when you need something PCRE-exclusive and you are on GNU grep.

BRE: the default, where metacharacters are literal

In basic regex, this list of characters means themselves, not their regex function:

code
+   matches a literal plus sign
?   matches a literal question mark
|   matches a literal pipe character
(   matches a literal open paren
)   matches a literal close paren
{   matches a literal open brace
}   matches a literal close brace

To get the regex behavior, you escape them. So in BRE, "one or more digits" is written with an escaped plus:

bash
grep '[0-9]\+' app.log

That \+ is "one or more of the preceding". Without the backslash, [0-9]+ would match a digit followed by a literal + character. Grouping and alternation work the same way, escaped:

bash
grep '\(error\|warn\)' app.log

The escaped \( and \) form a group; the escaped \| is alternation. Interval quantifiers also need escaping. To match "between 2 and 4 of the preceding", you write the braces escaped:

bash
grep 'a\{2,4\}' app.log

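What does work bare in BRE: . (any character), * (zero or more of the preceding), ^ (start of line), $ (end of line), [...] (character class), and [^...] (negated class). That is the whole BRE toolkit; everything else is escape-to-activate. One portability caveat: POSIX BRE only defines \( \), \{ \}, and backreferences. The \+, \?, and \| operators are GNU extensions, so greps other than GNU grep may not honor them.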
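A quick way to see the inversion for yourself is to run both spellings against the same input. This is a minimal sketch that assumes bash (for the $'...' strings) and GNU grep, since \+ in BRE is a GNU extension; the sample lines are invented for illustration.

```bash
#!/usr/bin/env bash
# Invented sample: one line has a literal "+", one has only digits, one has neither.
sample=$'took 250ms\nversion 3+ only\nno digits here'

# Bare + is literal in BRE: this means "a digit followed by a + character".
printf '%s\n' "$sample" | grep '[0-9]+'
# → version 3+ only

# Escaped \+ is "one or more of the preceding": any line containing a digit.
printf '%s\n' "$sample" | grep '[0-9]\+'
# → took 250ms
# → version 3+ only
```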

BRE exists because it is the original 1970s grep behavior, frozen by POSIX for backward compatibility. The default stays, and -E is the opt-in to sanity.

ERE: extended regex, the one you actually want

Extended regex flips the rule. In ERE, + ? | ( ) { } are metacharacters directly, no backslash needed. To match them literally you escape them, which is what most modern regex flavors do and what your instincts expect.
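For example, to find a literal (v2) in ERE you escape the parentheses, while the bare form becomes a group that simply matches v2. A small sketch with made-up input, assuming bash:

```bash
#!/usr/bin/env bash
sample=$'build (v2) ok\nbuild v2 ok'

# Escaped: \( and \) are literal parentheses in ERE.
printf '%s\n' "$sample" | grep -E '\(v2\)'
# → build (v2) ok

# Bare: ( ) form a group, so the pattern is just "v2" and both lines match.
printf '%s\n' "$sample" | grep -E '(v2)'
# → build (v2) ok
# → build v2 ok
```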

The same three patterns from above, rewritten for ERE:

bash
grep -E '[0-9]+' app.log
grep -E '(error|warn)' app.log
grep -E 'a{2,4}' app.log

Cleaner, and it matches how regex works in Python, JavaScript, Perl, and most editors' find dialogs. This is why ERE is the right default for interactive use.
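If you are converting an existing BRE pattern to ERE, it is worth checking that both forms select the same lines. The sketch below does that for the three patterns above; it assumes bash and GNU grep (because \| in BRE is a GNU extension), and compare is a hypothetical helper name, not a grep feature.

```bash
#!/usr/bin/env bash
# Invented sample log lines.
sample=$'error: disk full\nwarn: low memory\ninfo: 42 requests\nbaaad input'

# compare (hypothetical helper): $1 is the BRE form, $2 the ERE form.
compare() {
  local bre_out ere_out
  bre_out=$(printf '%s\n' "$sample" | grep -- "$1")
  ere_out=$(printf '%s\n' "$sample" | grep -E -- "$2")
  if [ "$bre_out" = "$ere_out" ]; then
    echo "same:      $1  vs  $2"
  else
    echo "different: $1  vs  $2"
  fi
}

compare '[0-9]\+'         '[0-9]+'
compare '\(error\|warn\)' '(error|warn)'
compare 'a\{2,4\}'        'a{2,4}'
```

Each line should report same; a different result usually means a metacharacter was escaped in the wrong direction for its mode.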

bash· Linux (GNU)
grep -E ':pattern' :search_path/*.log

egrep is the historical shorthand for grep -E. It still works on most systems, but GNU grep 3.8 and later prints a warning that egrep is obsolescent and tells you to use grep -E. Treat egrep as legacy; write grep -E in anything you commit.

One thing ERE does not add: the Perl shorthand classes. \d, \w, and \s are not part of ERE. More on that below.

PCRE: the full Perl engine

grep -P switches to PCRE, a C library that implements Perl-compatible regular expressions (recent GNU grep releases use its successor, PCRE2). This is a genuinely different and far larger engine. It adds everything ERE has, plus:

  • Lookahead (?=...) and negative lookahead (?!...)
  • Lookbehind (?<=...) and negative lookbehind (?<!...)
  • Non-greedy quantifiers: *?, +?, ??, {n,m}?
  • Shorthand classes: \d (digit), \w (word char), \s (whitespace), and their negations \D, \W, \S
  • Named groups: (?<name>...)
  • Backreferences by number \1 and by name \k<name>
  • Word boundaries \b as a standard feature (GNU grep also accepts \b in BRE and ERE, but only as a non-POSIX extension)

Lookbehind is the headline feature. To extract the value after user= without including user= itself in the match, you anchor with a lookbehind:

bash
grep -oP '(?<=user=)\w+' app.log

The (?<=user=) lookbehind asserts "preceded by user=" without consuming those characters, so -o prints just the username. There is no way to write that in BRE or ERE. The closest you get is a capture group plus sed or awk to pull the group out.
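Here is the lookbehind next to the capture-group fallback just described. This sketch assumes bash, a GNU grep built with PCRE for the first command, and a sed that supports -E (GNU and BSD sed both do) for the second; the log lines are invented.

```bash
#!/usr/bin/env bash
sample=$'login user=alice ok\nlogin user=bob_2 failed\nheartbeat'

# PCRE: the lookbehind asserts that "user=" precedes the match without consuming it.
printf '%s\n' "$sample" | grep -oP '(?<=user=)\w+'
# → alice
# → bob_2

# Portable fallback: capture the name in an ERE group and print only group 1.
# -n suppresses default output; the p flag prints lines where the substitution happened.
printf '%s\n' "$sample" | sed -nE 's/.*user=([A-Za-z0-9_]+).*/\1/p'
# → alice
# → bob_2
```

The two are not fully equivalent: grep -o prints every match on a line, while this sed expression keeps only the last user= value per line because the leading .* is greedy.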

Non-greedy matching is the other one ERE cannot do. .* is greedy and grabs as much as possible; .*? stops at the first opportunity:

bash
grep -oP '".*?"' data.json
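To see the difference, run greedy, non-greedy, and a negated-class alternative over the same line. A minimal sketch, assuming bash and a GNU grep with -P for the middle command; the JSON line is made up.

```bash
#!/usr/bin/env bash
json='{"name": "alice", "role": "admin"}'

# Greedy: .* runs to the LAST quote on the line, producing one long match.
printf '%s\n' "$json" | grep -oE '".*"'
# → "name": "alice", "role": "admin"

# Non-greedy (PCRE only): .*? stops at the first closing quote,
# so each quoted string is printed on its own line.
printf '%s\n' "$json" | grep -oP '".*?"'

# ERE workaround: [^"]* cannot cross a quote, so no -P is needed.
printf '%s\n' "$json" | grep -oE '"[^"]*"'
# → "name"
# → "alice"
# → "role"
# → "admin"
```

Neither pattern understands escaped quotes (\") inside JSON strings, which is one more reason to use jq on real JSON.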

-P is GNU only, and even there it is a build-time option: some minimal Linux builds omit it, in which case grep -P exits with an error saying Perl matching is not supported (the exact wording varies by version). It does not exist at all in BSD grep, which is what macOS ships. That platform gap is covered in the macOS section below.

The same match in all three flavors

Here is one task (find lines with one or more digits followed by ms) written three ways:

code
BRE:  grep    '[0-9]\+ms'  app.log
ERE:  grep -E '[0-9]+ms'   app.log
PCRE: grep -P '\d+ms'      app.log

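On GNU grep, all three match the same lines. BRE escapes the +; ERE uses it bare; PCRE uses it bare and swaps [0-9] for the \d shorthand. Because \+ is a GNU extension, strictly portable BRE spells one-or-more as [0-9][0-9]*ms. ERE is the portable choice that still reads cleanly.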
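A runnable version of the comparison, with a line that should not match (the space in "3 ms"). This is a sketch assuming bash and GNU grep; the -P line additionally needs PCRE support, and the last command shows the strictly POSIX BRE spelling.

```bash
#!/usr/bin/env bash
# Invented sample: only the first line has digits immediately followed by "ms".
sample=$'GET /api 250ms\nGET /health 3 ms\ncache hit'

printf '%s\n' "$sample" | grep '[0-9]\+ms'      # BRE with the GNU \+ extension
printf '%s\n' "$sample" | grep -E '[0-9]+ms'    # ERE
printf '%s\n' "$sample" | grep -P '\d+ms'       # PCRE (GNU grep built with -P)
printf '%s\n' "$sample" | grep '[0-9][0-9]*ms'  # strict POSIX BRE: a digit, then zero or more digits
# Each command prints: GET /api 250ms
```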

A second example, the string foo repeated two or three times in a row, shows the brace difference:

code
BRE:  grep    '\(foo\)\{2,3\}'  app.log
ERE:  grep -E '(foo){2,3}'      app.log
PCRE: grep -P '(foo){2,3}'      app.log

ERE and PCRE are identical here; only BRE needs the escaping.
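One subtlety with intervals: grep matches anywhere in the line, so {2,3} does not stop a line with four repetitions from matching. A sketch assuming bash and any grep with -E and -x, on invented input:

```bash
#!/usr/bin/env bash
sample=$'foo\nfoofoo\nfoofoofoo\nfoofoofoofoo'

# Unanchored: {2,3} only needs two or three "foo"s somewhere in the line,
# so the four-"foo" line matches too.
printf '%s\n' "$sample" | grep -E '(foo){2,3}'
# → foofoo
# → foofoofoo
# → foofoofoofoo

# -x requires the whole line to match, which enforces the upper bound.
printf '%s\n' "$sample" | grep -xE '(foo){2,3}'
# → foofoo
# → foofoofoo
```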

Feature comparison

Feature                      BRE              ERE                          PCRE
Anchors ^ $                  Yes              Yes                          Yes
Any char ., star *           Yes              Yes                          Yes
Character class [...]        Yes              Yes                          Yes
Grouping                     \( \)            ( )                          ( )
Alternation                  \| (GNU)         |                            |
One-or-more, zero-or-one     \+ \? (GNU)      + ?                          + ?
Interval quantifier          \{n,m\}          {n,m}                        {n,m}
Shorthand \d \w \s           No               No                           Yes
POSIX class [[:digit:]]      Yes              Yes                          Yes
Backreference \1             Yes              No (POSIX), GNU adds it      Yes
Non-greedy *?                No               No                           Yes
Lookahead, lookbehind        No               No                           Yes
Named groups                 No               No                           Yes

The two rows that catch people: shorthand classes are PCRE-only, and ERE actually drops backreference support that BRE has (GNU re-adds it as an extension, but POSIX ERE has no \1).
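A classic use of backreferences is finding a doubled word. This sketch assumes bash and GNU grep: the BRE form is standard POSIX, while the ERE form relies on the GNU extension just mentioned. The input is invented.

```bash
#!/usr/bin/env bash
sample=$'the the cat\nthe cat'

# BRE: \( \) captures a word, \1 requires the same text again after a space.
printf '%s\n' "$sample" | grep '\([a-z][a-z]*\) \1'
# → the the cat

# ERE: the same backreference, accepted by GNU grep as an extension.
printf '%s\n' "$sample" | grep -E '([a-z]+) \1'
# → the the cat
```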

\d \w \s are not in BRE or ERE

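This is the most common false assumption. \d looks universal because it works in Python, JavaScript, and PCRE. But in BRE and ERE, \d is just an escaped d, which matches a literal d. So grep -E '\d' finds the letter d, not digits (recent GNU grep versions also print a "stray \ before d" warning). GNU grep muddies the picture slightly: it accepts \w, \W, \s, and \S in BRE and ERE as extensions, but it has no \d, and none of these are POSIX.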

The portable replacement is a POSIX character class or an explicit range:

Perl shorthand   POSIX class (BRE/ERE)   Explicit range
\d               [[:digit:]]             [0-9]
\w               [[:alnum:]_]            [A-Za-z0-9_]
\s               [[:space:]]             (no clean range)
\D               [^[:digit:]]            [^0-9]

So "three digits" in ERE is:

bash
grep -E '[[:digit:]]{3}' app.log

POSIX classes have an advantage over explicit ranges: they are locale-aware. In a non-ASCII locale, [[:alpha:]] matches accented letters that [A-Za-z] misses. For pure ASCII data the explicit ranges are fine. If you genuinely want \d and \w, that is your signal to use -P on GNU grep.
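The \d trap is easy to reproduce. A minimal sketch, assuming bash and GNU grep; the warning that recent GNU grep prints for \d goes to stderr and is discarded here. The sample lines are chosen so that only one contains the letter d and only one contains digits.

```bash
#!/usr/bin/env bash
sample=$'total 123\nadded item\nplain text'

# Under -E, \d is a literal "d", so this finds the word "added", not the digits.
printf '%s\n' "$sample" | grep -E '\d' 2>/dev/null
# → added item

# The POSIX class and the explicit range both find three digits.
printf '%s\n' "$sample" | grep -E '[[:digit:]]{3}'
# → total 123
printf '%s\n' "$sample" | grep -E '[0-9]{3}'
# → total 123
```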

macOS BSD grep vs GNU grep

macOS ships BSD grep, not GNU grep. They agree on core BRE and ERE syntax but differ on GNU extensions, and they diverge hard on PCRE.

Capability                   GNU grep                                   BSD grep (macOS default)
BRE (default)                Yes                                        Yes
ERE (-E)                     Yes                                        Yes
PCRE (-P)                    Yes (if compiled in)                       Not supported at all
\d \w \s in -E               \w \s work (GNU extension); \d is literal  Literal d w s
POSIX classes [[:digit:]]    Yes                                        Yes
Backreference \1 in ERE      Yes (GNU extension)                        No

On macOS, grep -P fails immediately with grep: invalid option -- P. There is no PCRE engine behind BSD grep to enable. Three fixes:

  1. Install GNU grep. brew install grep puts it on PATH as ggrep. Run ggrep -P '...', or alias grep='ggrep' in your shell rc.
  2. Use pcregrep. It ships with the pcre Homebrew package (brew install pcre), is purpose-built for PCRE, and also does multi-line matching with -M. The newer pcre2 package provides the equivalent pcre2grep.
  3. Use ripgrep. brew install ripgrep, then rg --pcre2 '...'. ripgrep defaults to its own ERE-like engine and switches to PCRE2 on the --pcre2 flag.
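If a script has to run on both Linux and a Mac with Homebrew GNU grep, one option is to probe for a grep that accepts -P. This is a sketch under assumptions: pcre_grep is a hypothetical helper name, it assumes bash, and it only knows about grep and the Homebrew name ggrep.

```bash
#!/usr/bin/env bash
# pcre_grep (hypothetical helper): run grep -P with whichever binary supports it.
pcre_grep() {
  local g
  for g in grep ggrep; do
    # Probe: the binary must exist and accept -P (BSD grep rejects the option).
    if command -v "$g" >/dev/null 2>&1 && echo probe | "$g" -qP 'pr\w+' 2>/dev/null; then
      "$g" -P "$@"
      return
    fi
  done
  echo "pcre_grep: no grep with -P found (on macOS: brew install grep)" >&2
  return 2
}

# Usage: extract the digits after "v" with a lookbehind.
if version=$(printf '%s\n' 'release v42 shipped' | pcre_grep -o '(?<=v)\d+'); then
  echo "version: $version"
fi
# With a PCRE-capable grep available, this prints: version: 42
```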

bash· Linux (GNU)
grep -oP '(?<=v)[0-9]+' :search_path/*.log

PowerShell's Select-String uses the .NET regex engine, which supports lookaround and \d natively, so most PCRE-style patterns work on Windows without any extra install. .NET regex is close to PCRE but not identical, so test anything unusual.

Common mistakes

1. Using + in BRE and expecting one-or-more. Plain grep '[0-9]+' looks for a digit followed by a literal plus sign, because in BRE the + is literal. You wanted grep '[0-9]\+' or, better, grep -E '[0-9]+'. This is the number-one BRE trap.

2. Expecting \d to work under -E. grep -E '\d{3}' does not match three digits. ERE has no \d; the engine reads it as a literal d. Use grep -E '[0-9]{3}' or grep -P '\d{3}'.

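3. Reaching for lookahead without -P. (?=...) and (?<=...) are PCRE constructs. Under plain grep the parentheses, ?, and = are all literal characters, so the pattern searches for that exact text. Under grep -E the ( opens an ordinary group and the ? after it has nothing to repeat, which different builds treat either as a literal or as an error. Either way you never get lookaround: if you need it, you need -P, full stop.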

4. Running grep -P on macOS. The BSD grep that macOS ships has no -P, so the command fails with invalid option. Install GNU grep, pcregrep, or ripgrep instead of fighting it.

5. Escaping in the wrong direction. In ERE, \( matches a literal paren and ( starts a group. People coming from BRE escape their groups out of habit, then wonder why the grouping vanished. Pick a mode and commit to its rules.

6. Forgetting that POSIX intervals need a closing brace. grep -E 'a{2,' with an unterminated {2, is accepted as literal text by some builds and rejected with an error by others. Always close the interval.

When NOT to use this

Regex is not always the right tool. Skip it when:

  • The pattern is a fixed literal string. If you are searching for 192.168.1.1 or Cmd+Shift+P, use grep -F (fixed strings). It is faster, and it means the . and + in your search term are treated as literal characters with zero escaping. No regex mode needed.
  • You need to actually parse structured data. Regex is a poor JSON, HTML, or CSV parser. For JSON use jq; for columnar text use awk; for real grammar use a proper parser. A regex that "mostly works" on structured input is a bug waiting for the one edge case that breaks it.
  • You need fields, arithmetic, or multi-line logic. That is awk territory. grep finds lines; awk processes them. If your pattern is growing capture groups just to pull out a column, switch tools.
  • You are matching across newlines. grep is line-oriented and no regex mode changes that (GNU grep's -z, which splits input on NUL bytes instead of newlines, is a partial workaround). Use pcregrep -M, ripgrep --multiline, or preprocess with tr.
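The fixed-string point from the first bullet is easy to demonstrate. A minimal sketch, assuming bash and any grep with -F, using an invented second line that only a regex would match:

```bash
#!/usr/bin/env bash
sample=$'host 192.168.1.1 up\nhost 192x168y1z1 down'

# As a regex, each . means "any character", so the second line matches too.
printf '%s\n' "$sample" | grep '192.168.1.1'
# → host 192.168.1.1 up
# → host 192x168y1z1 down

# -F treats the pattern as a fixed string: only the real address matches.
printf '%s\n' "$sample" | grep -F '192.168.1.1'
# → host 192.168.1.1 up
```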

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years across software, Linux systems, DevOps, and infrastructure — and a more recent focus on AI. Currently Chief Technology Officer at a tech startup in the healthcare space.
