Match an Email Address with Regex: The Pragmatic Pattern (2026)

The practical regex for matching an email address, which handles 99% of real-world inputs, is ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$. It accepts standard mailbox names (letters, digits, dots, underscores, percent, plus, hyphen), an @, a domain with at least one dot, and a top-level domain of two or more letters. There is also a stricter pattern that follows RFC 5321's rules for unquoted addresses much more closely, but in most production code the practical one is what I reach for. Below I walk through both, plus runnable code in JavaScript, Python, and PHP, the gotchas per regex flavor, and the bugs I've actually shipped.

The reason there is no single "perfect" email regex is that the RFC technically allows things like "quoted strings"@example.com and IP-literal domains such as alice@[192.0.2.1] that almost no real form ever wants. Tightening the regex to reject them is more useful than chasing every edge of the spec.

Quick reference

The practical pattern, ready to paste:

code

^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$

The same pattern with the TLD length capped at 63 characters (RFC 1035):

code

^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}$

The lazy "is this email-shaped" pattern, suitable for log scraping but not validation:

code

^\S+@\S+\.\S+$

Allow-list a single provider:

code

^[A-Za-z0-9._%+-]+@(gmail|googlemail)\.com$

The practical pattern

code

^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$

Breaking it down left to right:

^ and $ anchor the pattern to the full string. Without them, an input like not.an.email some@thing.com extra would match anyway.
[A-Za-z0-9._%+-]+ is the local part (everything before the @). One or more characters from the allowed set. + is included because of Gmail's you+tag@gmail.com convention.
@ is the literal at-sign.
[A-Za-z0-9.-]+ is the domain name. One or more characters.
\. is the literal dot before the TLD. Escaping it matters because an unescaped . matches any character.
[A-Za-z]{2,} is the top-level domain: at least two letters. Allows .co, .com, .museum, .london (any modern TLD).

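This pattern needs no engine-specific flags, and character classes and quantifiers behave the same everywhere. The one anchor caveat: in Python's re and in PHP's PCRE, $ also matches just before a single trailing newline, so "alice@example.com\n" passes unless you trim the input first or use an end-only anchor (\Z in Python, \z or the D modifier in PCRE). In JavaScript, $ without the m flag matches only at the very end.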
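The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming Python's re module with re.VERBOSE; the names EMAIL_VERBOSE and matches_practical are hypothetical and only used for illustration.

```python
import re

# The practical pattern, split over several lines with re.VERBOSE so that
# each piece carries its own comment. In this mode, whitespace and "#"
# outside a character class are ignored by the regex engine.
EMAIL_VERBOSE = re.compile(
    r"""
    ^                    # start of string
    [A-Za-z0-9._%+-]+    # local part: one or more allowed characters
    @                    # literal at-sign
    [A-Za-z0-9.-]+       # domain: letters, digits, dots, hyphens
    \.                   # literal dot before the TLD
    [A-Za-z]{2,}         # TLD: two or more letters
    $                    # end of string
    """,
    re.VERBOSE,
)


def matches_practical(value: str) -> bool:
    """Return True when value has the shape the practical pattern accepts."""
    return EMAIL_VERBOSE.match(value) is not None


print(matches_practical("you+tag@gmail.com"))  # True
print(matches_practical("alice@example"))      # False (no TLD)
```

The verbose form compiles to the same pattern as the one-line version; it only changes how the source reads.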

The strict pattern (RFC 5321)

If you genuinely need to get closer to what the RFC permits and forbids in an unquoted address, the pattern is much longer:

code

^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$

This allows all the special characters the RFC permits in the local part, requires the domain name to start and end with alphanumeric (no leading or trailing hyphen), and allows multi-level subdomains like email@mail.support.example.com. Use it when the validation has compliance implications. For login forms, use the practical pattern.
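To see where the two patterns disagree, here is a small comparison sketch in Python. The helper name compare is hypothetical, and the sample inputs are chosen for illustration only.

```python
import re

PRACTICAL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# The strict pattern from above, split across adjacent raw strings for readability.
STRICT = re.compile(
    r"^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"@(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+"
    r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$"
)


def compare(value: str) -> tuple:
    """Return (practical_result, strict_result) for one input."""
    return (
        PRACTICAL.match(value) is not None,
        STRICT.match(value) is not None,
    )


for sample in [
    "email@mail.support.example.com",  # both accept
    "alice..bob@example.com",          # practical accepts, strict rejects
    "alice@-example.com",              # practical accepts, strict rejects
    "o'brien@example.com",             # practical rejects, strict accepts
    "alice@example.c",                 # practical rejects, strict accepts
]:
    print(sample, compare(sample))
```

The last two rows are worth noticing: the strict pattern is not a pure subset of the practical one. It allows characters such as the apostrophe in the local part, and as written it places no minimum length or letters-only rule on the final label.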

Examples in JavaScript, Python, and PHP

JavaScript:

javascript

const emailPattern = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;
function isValidEmail(input) {
  return emailPattern.test(input);
}
isValidEmail("alice@example.com");        // true
isValidEmail("alice+marketing@gmail.com"); // true
isValidEmail("alice@example");            // false (no TLD)

Python:

python

import re
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value))

is_valid_email("alice@example.com")        # True
is_valid_email("alice@.example.com")       # True (the practical pattern does not reject a leading dot in the domain)

PHP:

php

function isValidEmail(string $value): bool {
    return (bool) preg_match('/^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/', $value);
}

isValidEmail("alice@example.com");          // true
isValidEmail("alice@example.co.uk");        // true (multi-part TLD works)

Note that PHP also has filter_var($value, FILTER_VALIDATE_EMAIL), which uses an internal validator close to RFC 5322. I prefer it for minimum-fuss email validation in PHP and reach for the regex when I need a stricter or more lenient subset.

Engine compatibility

The practical pattern uses only universal regex features (anchors, character classes, quantifiers). It runs unmodified everywhere. The per-engine notes are about the wider toolkit you reach for when the simple pattern is not enough.

Engine	Practical pattern	Per-engine note
JavaScript	Works	`new URL()` does not parse emails. For deeper validation use a library like `email-validator` or `validator.js`.
Python (`re`)	Works	The stdlib `email.utils.parseaddr` is a more permissive RFC-aware parser. Combine with the regex for typo detection.
Python (`regex` pkg)	Works	Supports Unicode letter classes if you want to accept internationalised mailbox names.
PHP (PCRE)	Works	`filter_var($v, FILTER_VALIDATE_EMAIL)` follows RFC 5322 and is the production default.
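Java	Works	`javax.mail.internet.InternetAddress.parse(s, true)` from JavaMail (`jakarta.mail.internet` in Jakarta Mail) is the closest widely used RFC-aware parser. It is an add-on library, not part of the JDK.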
.NET	Works	`System.Net.Mail.MailAddress` parses to RFC 5322 and throws on invalid input.
Go (RE2)	Works	`net/mail.ParseAddress` is the parser equivalent. RE2 does not support lookaheads or backreferences, but the strict pattern above uses neither (only non-capturing groups), so it should run as written.
Rust (`regex` crate)	Works	No lookaheads, no backreferences. The strict pattern above needs neither, so it should compile as written.
Ruby	Works	`URI::MailTo::EMAIL_REGEXP` (stdlib) is similar to the practical pattern shown here.
POSIX ERE (`grep -E`)	Works	No `\d` shorthand. Use `[0-9]` instead.

For cross-language validation where the same pattern runs on the frontend and the backend, keep to the practical form. The strict version uses non-capturing groups ((?:...)) which most modern engines accept, but a few legacy POSIX tools do not.
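The Python row in the table suggests combining email.utils.parseaddr with the regex. A minimal sketch of that combination follows; the function name extract_and_check is hypothetical, and the inputs are illustrative.

```python
import re
from email.utils import parseaddr
from typing import Optional

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def extract_and_check(raw: str) -> Optional[str]:
    """Pull the address out of a 'Name <addr>' string, then shape-check it.

    parseaddr is permissive: it separates the display name from the address
    but does not judge whether the address looks deliverable. The regex adds
    that shape check on top.
    """
    _display_name, addr = parseaddr(raw)
    if addr and EMAIL_RE.match(addr):
        return addr
    return None


print(extract_and_check("Alice Example <alice@example.com>"))  # alice@example.com
print(extract_and_check("alice@example.com"))                  # alice@example.com
print(extract_and_check("Alice <alice@example>"))              # None (no TLD)
```

The parser handles the header-style wrapping, and the regex stays responsible only for the bare address.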

Edge cases the practical pattern handles correctly

Input	Match	Why
`alice+marketing@gmail.com`	Yes	`+` allowed in local part (Gmail tagging)
`first.last@example.co.uk`	Yes	Dots in local part, multi-part TLD
`user_name@sub.example.com`	Yes	Subdomain works
`1234@example.io`	Yes	All-numeric local part is legal
`alice@example.museum`	Yes	Long TLDs work (any 2+ letters)

What still gets through (and what to do about it)

The practical pattern accepts a few inputs that look valid but are technically wrong:

Consecutive dots in the local part: alice..bob@example.com matches because the class [A-Za-z0-9._%+-]+ allows dots anywhere, including back to back. RFC 5321 forbids this in an unquoted local part.
Domain hyphen and dot position: alice@-example.com and alice@.example.com both match because the domain class [A-Za-z0-9.-]+ does not enforce that each label starts with a letter or digit. The strict pattern above does enforce it.
Leading or trailing whitespace: stripped on most form inputs before validation. If the input is " alice@example.com " and you skip trim(), the anchors reject it, which is correct but confusing for the user.

The fix for the first two is to use the strict pattern or to add a follow-up DNS / SMTP verification step. For high-volume signup forms, the right answer is almost always:

Run the practical regex for instant client-side feedback.
Send a confirmation email and only treat the address as verified once the user clicks the link in it.

The regex catches typos; the email round-trip catches everything else.
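The two-step flow can be sketched roughly as follows. This is an illustration under stated assumptions, not a production design: the in-memory stores pending and verified, and the function names start_signup and confirm, are hypothetical, and actually sending the email is left out.

```python
import re
import secrets
from typing import Dict, Optional, Set

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# Hypothetical in-memory stores; a real application would use a database.
pending: Dict[str, str] = {}  # confirmation token -> address awaiting confirmation
verified: Set[str] = set()    # addresses whose owner clicked the link


def start_signup(raw: str) -> Optional[str]:
    """Step 1: shape check. Returns a confirmation token, or None if rejected."""
    email = raw.strip()
    if not EMAIL_RE.match(email):
        return None
    token = secrets.token_urlsafe(32)
    pending[token] = email
    # Sending the message that contains the token link is out of scope here.
    return token


def confirm(token: str) -> bool:
    """Step 2: the link was clicked; only now treat the address as verified."""
    email = pending.pop(token, None)
    if email is None:
        return False
    verified.add(email)
    return True


token = start_signup(" alice@example.com ")
print(token is not None)                 # True: shape is fine
print("alice@example.com" in verified)   # False: not confirmed yet
print(confirm(token))                    # True
print("alice@example.com" in verified)   # True
```

The point of the split is that passing the regex never marks an address as verified; only the round-trip does.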

Validating the email domain's TLD with the Public Suffix List

The pattern in this article accepts alice@example.zzz because the TLD class [A-Za-z]{2,} matches any run of letters. A regex cannot know which TLDs are real. There are more than 1,500 of them, ICANN adds new ones regularly, and any list you hardcode into a pattern goes stale. That produces inaccurate validation in both directions: false rejections (turning away a genuine address on a newer TLD like .dev or .app because the pattern predates it) and false accepts (passing alice@typo.cmo or an invented TLD straight through to your mailer, where it bounces).

For the domain half of the address, everything after the @, I check the suffix against the Public Suffix List (PSL): a community-maintained, continuously updated registry of every public domain suffix, from plain TLDs (.com, .io) through multi-level suffixes (.co.uk, .com.au). It is the same list browsers use for cookie scoping. Run the regex first for shape, then check the domain's suffix against the PSL so an invented or mistyped TLD is caught before you attempt delivery.

I maintain a small PSL parser of my own for zero-dependency projects, but a community-maintained library is usually the better choice. It tracks the upstream list and handles the punycode and wildcard rules. The widely used ones by language:

Language	Library
Node.js / JavaScript	`psl`, `tldts`
Python	`tldextract`, `publicsuffix2`
PHP	`jeremykendall/php-domain-parser`
Ruby	`public_suffix`
Go	`golang.org/x/net/publicsuffix`

Layering the PSL onto the email regex in Node:

javascript

const psl = require("psl");

function emailDomainHasRealTld(email) {
  if (!isValidEmail(email)) return false;      // regex: address shape
  const domain = email.split("@")[1];
  const parsed = psl.parse(domain);            // PSL: real public suffix?
  return !parsed.error && parsed.listed;
}

My recommendation is to combine all three layers for a high probability of correct validation: the regex for instant address shape, the PSL for an authoritative TLD check, and the confirmation-email round-trip for proof the mailbox actually exists. The full domain-side detail, including the same library list, is in matching a domain name with regex.

Common mistakes

The bugs I see in code review, and the fix for each.

Forgetting the anchors. A pattern without ^ and $ accepts "some garbage alice@example.com more garbage" because the engine finds a substring match. For validation, always anchor both ends.

Unescaped dot in the TLD position. A bare . instead of \. matches any character, so alice@exampleXcom slips through. Always escape the literal dot.

Forgetting the + in Gmail addresses. A local-part class like [A-Za-z0-9._-]+ (no +) silently rejects alice+marketing@gmail.com. Gmail tagging is common; include + in the local-part class.

Capping the TLD at 4 characters. Old patterns used [A-Za-z]{2,4}$ which rejects .museum, .london, .amazon, and every modern brand TLD. Use {2,} or {2,63}.

Trusting client-side validation alone. A user with DevTools can submit anything. Re-validate on the server. The regex is a typo-catcher, not a security control.

Validating before trimming. A pattern anchored with ^...$ rejects " alice@example.com " because of the whitespace. Trim the input first, then validate.
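The anchoring and trimming mistakes are easy to reproduce. Below is a minimal Python sketch; the names BODY, shape_ok, and is_valid_email are chosen for illustration. It uses re.fullmatch instead of ^...$, which is one way (an assumption about what suits your code, not the only option) to avoid Python's $ matching before a trailing newline.

```python
import re

# The practical pattern without anchors; anchoring is supplied by the call used.
BODY = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"


def shape_ok(value: str) -> bool:
    """Whole-string check: fullmatch requires the entire input to match."""
    return re.fullmatch(BODY, value) is not None


def is_valid_email(raw: str) -> bool:
    """Trim first, then validate the whole string."""
    return shape_ok(raw.strip())


garbage = "some garbage alice@example.com more garbage"

print(re.search(BODY, garbage) is not None)   # True: unanchored substring match
print(shape_ok(garbage))                      # False: whole string must match

print(shape_ok(" alice@example.com "))        # False: untrimmed whitespace
print(is_valid_email(" alice@example.com "))  # True: trimmed before validating

# Python-specific: "$" also matches just before a trailing newline.
print(re.match("^" + BODY + "$", "alice@example.com\n") is not None)  # True
print(shape_ok("alice@example.com\n"))                                # False
```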

Test cases: matches and non-matches

Input	Practical pattern	Notes
`alice@example.com`	Match	Standard
`a@b.co`	Match	Minimum size
`alice+tag@gmail.com`	Match	Gmail tagging
`Alice.Smith@Example.com`	Match	Case is allowed
`alice@example`	No match	No TLD
`@example.com`	No match	Empty local part
`alice@.example.com`	Match	Leading dot in the domain is not rejected (the strict pattern rejects it)
`alice example.com`	No match	Space, no `@`
`alice@example.c`	No match	TLD too short (1 letter)
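A table like this translates directly into a table-driven check. The sketch below covers a subset of the rows in Python; the CASES list and run_cases helper are hypothetical names, and a real project would more likely put this in its test framework.

```python
import re

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# (input, expected result) pairs taken from the table above.
CASES = [
    ("alice@example.com", True),         # standard
    ("a@b.co", True),                    # minimum size
    ("alice+tag@gmail.com", True),       # Gmail tagging
    ("Alice.Smith@Example.com", True),   # case is allowed
    ("alice@example", False),            # no TLD
    ("@example.com", False),             # empty local part
    ("alice example.com", False),        # space, no @
    ("alice@example.c", False),          # TLD too short
]


def run_cases() -> list:
    """Return the inputs whose actual result differs from the expected one."""
    failures = []
    for value, expected in CASES:
        actual = EMAIL_RE.match(value) is not None
        if actual != expected:
            failures.append(value)
    return failures


print(run_cases())  # [] when every row behaves as listed
```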

FAQ

What is the best regex to validate an email address?

The practical pattern ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ handles 99% of real-world cases and is what most production code uses.

If you need RFC 5321 compliance, use the stricter pattern shown above. If you need absolute certainty, send a confirmation email. The regex catches typos, the email round-trip catches everything else.

Can a regex fully validate an email address per RFC 5321?

Not really. The RFC permits constructs like quoted strings, IP-literal domains (alice@[192.0.2.1]), and obsolete formats that almost no email system actually accepts in practice.

Anyone serious about full RFC compliance writes a parser, not a regex. For application code, the practical pattern plus a confirmation email is what production teams ship.

Why does my regex accept emails with double dots?

Because the character class [._%+-] allows any of those characters one after another. The RFC forbids consecutive dots, but the simple character-class form does not encode that rule.

To reject alice..bob@example.com, use a pattern that matches dot-separated atoms: ^[A-Za-z0-9_%+-]+(?:\.[A-Za-z0-9_%+-]+)*@.... The strict pattern in this article includes that form.

Should I use a regex or PHP's filter_var?

In PHP, filter_var($value, FILTER_VALIDATE_EMAIL) is the safest default. It uses an internal validator that follows most of RFC 5322 and is maintained by PHP's core team.

Use a regex when you need to enforce something stricter (no plus-tagged addresses, no role accounts, a specific allowed-domain list) or something more lenient (accept educational-institution edge cases). For "is this email-shaped", filter_var wins.

Does the practical pattern work in JavaScript, Python, and PHP?

Yes. The pattern uses only universal regex features: anchors, character classes, and quantifiers. It works in JavaScript, Python, PHP (preg_match), Go (regexp), Java, .NET, and Ruby with no changes.

Some engines have richer features (Python's named groups, PCRE's possessive quantifiers, etc.) but the practical email pattern does not need any of them.

How do I match an email from a specific domain only?

Replace the domain part with a literal: ^[A-Za-z0-9._%+-]+@example\.com$. For multiple allowed domains use alternation: ^[A-Za-z0-9._%+-]+@(example\.com|other\.com)$.

For a list of dozens of allowed domains, a regex gets unwieldy. Run the practical regex first to confirm the input is email-shaped, then check the domain part against an array.

Does the pattern handle internationalised email addresses?

No. The practical pattern is ASCII-only. Internationalised mailbox names (e.g., 用户@example.com) use Unicode and need a Unicode-aware character class in engines that support Unicode properties (PCRE, .NET, Python with the third-party regex package, Ruby, Java).

For the domain part, internationalised domains get encoded as Punycode before DNS lookup. See the domain matching guide for the encoding details.

Ishan Karunaratne