TechEarl

How to Match an Email Address with Regex

Match an email address with regex. The practical pattern, the strict RFC 5321 pattern, examples in JavaScript, Python, and PHP, edge cases, engine compatibility, common mistakes, and a validation test table.

Ishan Karunaratne · 10 min read

The practical regex for matching an email address, which handles the vast majority of real-world inputs, is ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$. It accepts standard mailbox names (letters, digits, dots, underscores, percent, plus, hyphen), an @, a domain with at least one dot, and a top-level domain of two or more letters. There is also a stricter pattern that follows RFC 5321 more closely, but in most production code the practical one is what I reach for. Below I walk through both, plus runnable code in JavaScript, Python, and PHP, the gotchas per regex flavor, and the bugs I've actually shipped.

The reason there is no single "perfect" email regex is that the RFC technically allows things like "quoted strings"@example.com and user@[192.0.2.1] (an IP address literal) that almost no real form ever wants. A tighter regex that rejects them is more useful than one that chases every edge of the spec.

Quick reference

The practical pattern, ready to paste:

code
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$

The same pattern with the TLD length capped at 63 characters (RFC 1035):

code
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}$

The lazy "is this email-shaped" pattern, suitable for log scraping but not validation:

code
^\S+@\S+\.\S+$

Allow-list a single provider:

code
^[A-Za-z0-9._%+-]+@(gmail|googlemail)\.com$
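To see how the four quick-reference patterns differ in practice, here is a minimal Python sketch that reports which of them accept a given input. The names PATTERNS and which_match are made up for this illustration; the patterns themselves are copied from above.

```python
import re

# The four quick-reference patterns, copied verbatim from the article.
PATTERNS = {
    "practical": r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$",
    "tld_capped": r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}$",
    "lazy": r"^\S+@\S+\.\S+$",
    "gmail_only": r"^[A-Za-z0-9._%+-]+@(gmail|googlemail)\.com$",
}


def which_match(value: str) -> list:
    """Return the names of the patterns that accept `value` as a whole."""
    # fullmatch() requires the entire string to be consumed.
    return [name for name, p in PATTERNS.items() if re.fullmatch(p, value)]


print(which_match("alice@example.com"))  # ['practical', 'tld_capped', 'lazy']
print(which_match("alice@gmail.com"))    # all four names
print(which_match("!!@??.##"))           # ['lazy']
```

The last input shows why the lazy pattern is only an "email-shaped" check: anything without whitespace around an @ and a dot gets through.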

The practical pattern

code
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$

Breaking it down left to right:

  • ^ and $ anchor the pattern to the full string. Without them, an input like not.an.email some@thing.com extra would match anyway.
  • [A-Za-z0-9._%+-]+ is the local part (everything before the @). One or more characters from the allowed set. + is included because of Gmail's you+tag@gmail.com convention.
  • @ is the literal at-sign.
  • [A-Za-z0-9.-]+ is the domain name. One or more characters.
  • \. is the literal dot before the TLD. Escaping it matters because an unescaped . matches any character.
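  • [A-Za-z]{2,} is the top-level domain: at least two letters. Allows .co, .com, .museum, .london, and any other all-letter TLD. Internationalized TLDs in their punycode form (for example .xn--p1ai) contain digits and hyphens, so this class rejects them.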

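This pattern needs no engine-specific flags, and character classes behave the same everywhere. The anchors are the one thing to check: in Python and PHP (PCRE), $ also matches just before a trailing newline, and in Ruby ^ and $ match at every line boundary. Where that matters, use a whole-string match instead (Python's fullmatch(), PCRE's D modifier, or \A and \z in Ruby).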
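The breakdown above can also be written into the pattern itself. The sketch below uses Python's re.VERBOSE flag so each piece carries its own comment; the names PRACTICAL_EMAIL and is_email are illustrative, and fullmatch() stands in for the ^ and $ anchors.

```python
import re

# Same pieces as the practical pattern, one per line. In verbose mode,
# whitespace and "# ..." comments outside character classes are ignored.
PRACTICAL_EMAIL = re.compile(
    r"""
    [A-Za-z0-9._%+-]+   # local part: one or more allowed characters
    @                   # literal at-sign
    [A-Za-z0-9.-]+      # domain name: letters, digits, dots, hyphens
    \.                  # literal dot before the TLD
    [A-Za-z]{2,}        # top-level domain: at least two letters
    """,
    re.VERBOSE,
)


def is_email(value: str) -> bool:
    # fullmatch() anchors both ends, playing the role of ^ and $.
    return PRACTICAL_EMAIL.fullmatch(value) is not None


noisy = "not.an.email some@thing.com extra"
print(is_email("you+tag@gmail.com"))              # True
print(is_email(noisy))                            # False
print(PRACTICAL_EMAIL.search(noisy) is not None)  # True: unanchored search finds a substring
```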

The strict pattern (RFC 5321)

If you genuinely need to reject every input the RFC forbids, the pattern is much longer:

code
^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$

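This allows all the special characters the RFC permits in an unquoted local part, rejects leading, trailing, and consecutive dots there, requires every domain label to start and end with an alphanumeric character (no leading or trailing hyphen), and allows multi-level subdomains like email@mail.support.example.com. It does not cover quoted local parts, address literals such as user@[192.0.2.1], or the RFC's length limits, so treat it as a strict subset of the grammar rather than the whole thing. Use it when the validation has compliance implications. For login forms, use the practical pattern.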
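A side-by-side run makes the difference between the two patterns concrete. This is a sketch in Python: ATOM, LABEL, and compare are names invented here to keep the strict pattern readable, and the assembled STRICT regex is meant to be the same pattern as above without its ^ and $ (fullmatch() does the anchoring).

```python
import re

PRACTICAL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Building blocks of the strict pattern.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"            # one dot-free run of the local part
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"  # one domain label
STRICT = re.compile(rf"{ATOM}(?:\.{ATOM})*@(?:{LABEL}\.)+{LABEL}")


def compare(value: str) -> tuple:
    """Return (practical_accepts, strict_accepts) for a whole-string match."""
    return (
        PRACTICAL.fullmatch(value) is not None,
        STRICT.fullmatch(value) is not None,
    )


print(compare("email@mail.support.example.com"))  # (True, True)
print(compare("alice..bob@example.com"))          # (True, False)
print(compare("alice@-example.com"))              # (True, False)
print(compare("o'brien@example.com"))             # (False, True)
```

The last row is the trade-off in the other direction: the strict pattern accepts legal characters such as the apostrophe that the practical pattern leaves out.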

Examples in JavaScript, Python, and PHP

JavaScript:

javascript
const emailPattern = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;
function isValidEmail(input) {
  return emailPattern.test(input);
}
isValidEmail("alice@example.com");        // true
isValidEmail("alice+marketing@gmail.com"); // true
isValidEmail("alice@example");            // false (no TLD)

Python:

python
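import re
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(value: str) -> bool:
    # fullmatch() must consume the whole string; match() would let "$"
    # succeed just before a trailing newline.
    return bool(EMAIL_RE.fullmatch(value))

is_valid_email("alice@example.com")        # True
is_valid_email("alice@example.com\n")      # False (trailing newline)
is_valid_email("alice@.example.com")       # True (the domain class allows a leading dot)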

PHP:

php
function isValidEmail(string $value): bool {
    // The D modifier makes $ match only at the very end, not before a trailing newline.
    return (bool) preg_match('/^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/D', $value);
}

isValidEmail("alice@example.com");          // true
isValidEmail("alice@example.co.uk");        // true (multi-part TLD works)

Note that PHP also has filter_var($value, FILTER_VALIDATE_EMAIL), which uses an internal validator close to RFC 5322. I prefer it for minimum-fuss email validation in PHP and reach for the regex when I need a stricter or more lenient subset.

Engine compatibility

The practical pattern uses only universal regex features (anchors, character classes, quantifiers). It runs unmodified everywhere. The per-engine notes are about the wider toolkit you reach for when the simple pattern is not enough.

Engine | Practical pattern | Per-engine note
--- | --- | ---
JavaScript | Works | new URL() does not parse emails. For deeper validation use a library like email-validator or validator.js.
Python (re) | Works | The stdlib email.utils.parseaddr is a more permissive RFC-aware parser. Combine with the regex for typo detection.
Python (regex pkg) | Works | Supports Unicode letter classes if you want to accept internationalised mailbox names.
PHP (PCRE) | Works | filter_var($v, FILTER_VALIDATE_EMAIL) follows RFC 5322 and is the production default.
Java | Works | javax.mail.internet.InternetAddress.parse(s, true) is the closest RFC-compliant parser. It comes from JavaMail (now Jakarta Mail), a separate library rather than part of Java SE.
.NET | Works | System.Net.Mail.MailAddress parses to RFC 5322 and throws on invalid input.
Go (RE2) | Works | net/mail.ParseAddress is the parser equivalent. RE2 does not support lookaheads or backreferences; the strict pattern above uses neither, so it should run as written.
Rust (regex crate) | Works | No lookaheads, no backreferences. The strict pattern above needs neither.
Ruby | Works | URI::MailTo::EMAIL_REGEXP (stdlib) is similar to the practical pattern shown here. Use \A and \z in place of ^ and $, which match at line boundaries in Ruby.
POSIX ERE (grep -E) | Works | No \d shorthand. Use [0-9] instead.

For cross-language validation where the same pattern runs on the frontend and the backend, keep to the practical form. The strict version uses non-capturing groups ((?:...)) which most modern engines accept, but a few legacy POSIX tools do not.
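As one example of pairing a stdlib parser with the regex, here is a Python sketch that uses email.utils.parseaddr to pull the address out of a "Name <address>" string and then applies the practical pattern to it. The function name extract_valid_address is hypothetical.

```python
import re
from email.utils import parseaddr
from typing import Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_valid_address(raw: str) -> Optional[str]:
    """Return the bare address if it passes the practical pattern, else None."""
    # parseaddr() splits "Alice <alice@example.com>" into (display name, address).
    _display_name, address = parseaddr(raw)
    if EMAIL_RE.fullmatch(address):
        return address
    return None


print(extract_valid_address("Alice <alice@example.com>"))  # alice@example.com
print(extract_valid_address("alice@example.com"))          # alice@example.com
print(extract_valid_address("Alice <alice@example>"))      # None
```

The parser handles the surrounding header syntax, while the regex stays responsible for the shape of the address itself.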

Edge cases the practical pattern handles correctly

Input | Match | Why
--- | --- | ---
alice+marketing@gmail.com | Yes | + allowed in local part (Gmail tagging)
first.last@example.co.uk | Yes | Dots in local part, multi-part TLD
user_name@sub.example.com | Yes | Subdomain works
1234@example.io | Yes | All-numeric local part is legal
alice@example.museum | Yes | Long TLDs work (any 2+ letters)

What still gets through (and what to do about it)

The practical pattern accepts a few inputs that look valid but are technically wrong:

  • Consecutive dots in the local part: alice..bob@example.com matches because the local-part class [A-Za-z0-9._%+-]+ places no limit on consecutive dots. RFC 5321 forbids this in an unquoted local part.
  • Domain hyphen and dot position: alice@-example.com and alice@.example.com both match because the domain class [A-Za-z0-9.-]+ does not constrain where hyphens and dots fall. The strict pattern above does enforce it.
  • Leading or trailing whitespace is the opposite problem: most form inputs are stripped before validation, but if the input is " alice@example.com " and you skip trim(), the anchors reject an address that is otherwise fine.

The fix for the first two is to use the strict pattern or to add a follow-up DNS / SMTP verification step. For high-volume signup forms, the right answer is almost always:

  1. Run the practical regex for instant client-side feedback.
  2. Send a confirmation email and only treat the address as verified once the user clicks the link in it.

The regex catches typos; the email round-trip catches everything else.
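If switching to the strict pattern is more than you want, the gaps listed above can also be closed with a few plain string checks after the practical regex. The following Python sketch shows one way to do that; precheck_email is a hypothetical helper, and it covers only the shape of the address, not the confirmation email step.

```python
import re
from typing import Optional

PRACTICAL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def precheck_email(raw: str) -> Optional[str]:
    """Return the trimmed address if it passes every check, else None."""
    value = raw.strip()  # trim first, so stray whitespace is not a rejection
    if PRACTICAL.fullmatch(value) is None:
        return None
    local, _, domain = value.rpartition("@")
    # No leading, trailing, or consecutive dots in the local part.
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return None
    # Every domain label must be non-empty and must not start or end with "-".
    for label in domain.split("."):
        if not label or label.startswith("-") or label.endswith("-"):
            return None
    return value


print(precheck_email(" alice@example.com "))     # alice@example.com
print(precheck_email("alice..bob@example.com"))  # None
print(precheck_email("alice@-example.com"))      # None
print(precheck_email("alice@.example.com"))      # None
```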

Common mistakes

The bugs I see in code review, and the fix for each.

Forgetting the anchors. A pattern without ^ and $ accepts "some garbage alice@example.com more garbage" because the engine finds a substring match. For validation, always anchor both ends.

Unescaped dot in the TLD position. A bare . instead of \. matches any character, so alice@exampleXcom slips through. Always escape the literal dot.

Forgetting the + in Gmail addresses. A local-part class like [A-Za-z0-9._-]+ (no +) silently rejects alice+marketing@gmail.com. Gmail tagging is common; include + in the local-part class.

Capping the TLD at 4 characters. Old patterns used [A-Za-z]{2,4}$ which rejects .museum, .london, .amazon, and every modern brand TLD. Use {2,} or {2,63}.

Trusting client-side validation alone. A user with DevTools can submit anything. Re-validate on the server. The regex is a typo-catcher, not a security control.

Validating before trimming. A pattern anchored with ^...$ rejects " alice@example.com " because of the whitespace. Trim the input first, then validate.
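Several of these mistakes are easy to reproduce. The Python sketch below puts a deliberately broken variant of the pattern next to the correct one for each regex-level mistake; the constant names and the accepts helper are invented for the demonstration.

```python
import re

CORRECT = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
NO_ANCHORS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BARE_DOT = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,}$")
NO_PLUS = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
TLD_MAX_FOUR = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")


def accepts(pattern, value: str) -> bool:
    return pattern.search(value) is not None


garbage = "some garbage alice@example.com more garbage"
print(accepts(NO_ANCHORS, garbage), accepts(CORRECT, garbage))  # True False

print(accepts(BARE_DOT, "alice@exampleXcom"))  # True: "." matched the X
print(accepts(CORRECT, "alice@exampleXcom"))   # False

print(accepts(NO_PLUS, "alice+marketing@gmail.com"))  # False: tag rejected
print(accepts(CORRECT, "alice+marketing@gmail.com"))  # True

print(accepts(TLD_MAX_FOUR, "alice@example.museum"))  # False: TLD too long for {2,4}
print(accepts(CORRECT, "alice@example.museum"))       # True

padded = " alice@example.com "
print(accepts(CORRECT, padded), accepts(CORRECT, padded.strip()))  # False True
```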

Test cases: matches and non-matches

Input | Practical pattern | Notes
--- | --- | ---
alice@example.com | Match | Standard
a@b.co | Match | Minimum size
alice+tag@gmail.com | Match | Gmail tagging
Alice.Smith@Example.com | Match | Case is allowed
alice@example | No match | No TLD
@example.com | No match | Empty local part
alice@.example.com | Match | Leading dot in domain slips through, because the domain class allows dots anywhere
alice example.com | No match | Space, no @
alice@example.c | No match | TLD too short (1 letter)
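To re-check rows like these against the engine you actually deploy on, a small harness is enough. This Python sketch prints the practical pattern's verdict for each input; TABLE_INPUTS and run_table are names made up for the example, and fullmatch() replaces the ^ and $ anchors.

```python
import re

PRACTICAL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

TABLE_INPUTS = [
    "alice@example.com",
    "a@b.co",
    "alice+tag@gmail.com",
    "Alice.Smith@Example.com",
    "alice@example",
    "@example.com",
    "alice@.example.com",
    "alice example.com",
    "alice@example.c",
]


def run_table(inputs):
    """Map each input to True (match) or False (no match)."""
    return {value: PRACTICAL.fullmatch(value) is not None for value in inputs}


for value, matched in run_table(TABLE_INPUTS).items():
    print(f"{value!r:28} {'Match' if matched else 'No match'}")
```

Swapping PRACTICAL for another compiled pattern lets you compare variants against the same inputs.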

See also

External reference: paste the pattern into regex101.com to test it interactively against your own input strings. The site also explains every token in the pattern.
