The practical regex for matching an email address that handles nearly all real-world inputs: ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$. It accepts standard mailbox names (letters, digits, dots, underscores, percent, plus, hyphen), an @, a domain with at least one dot, and a top-level domain of two or more letters. There is also a stricter pattern that follows RFC 5321 more closely, but in most production code the practical one is what I reach for. Below I walk through both, plus the runnable code in JavaScript, Python, and PHP, the gotchas per regex flavor, and the bugs I've actually shipped.
The reason there is no single "perfect" email regex is that the RFC technically allows things like "quoted strings"@example.com and user@[192.0.2.1] that almost no real form ever wants. Tightening the regex to reject them is more useful than chasing every edge of the spec.
Quick reference
The practical pattern, ready to paste:
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$
The same pattern with the TLD length capped at 63 characters (RFC 1035):
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}$
The lazy "is this email-shaped" pattern, suitable for log scraping but not validation:
^\S+@\S+\.\S+$
Allow-list a single provider:
^[A-Za-z0-9._%+-]+@(gmail|googlemail)\.com$
The practical pattern
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$
Breaking it down left to right:
- ^ and $ anchor the pattern to the full string. Without them, an input like not.an.email some@thing.com extra would match anyway.
- [A-Za-z0-9._%+-]+ is the local part (everything before the @). One or more characters from the allowed set. + is included because of Gmail's you+tag@gmail.com convention.
- @ is the literal at-sign.
- [A-Za-z0-9.-]+ is the domain name. One or more characters.
- \. is the literal dot before the TLD. Escaping it matters because an unescaped . matches any character.
- [A-Za-z]{2,} is the top-level domain: at least two letters. Allows .co, .com, .museum, .london (any TLD made up only of letters).
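This pattern needs no engine-specific flags. Anchors and character classes are universal. One caveat: in Python and in PCRE (PHP), $ also matches just before a trailing newline, so trim the input first or use a full-match call or an end-only anchor.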
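The breakdown above can be written directly into the pattern. The following is a minimal Python sketch using re.VERBOSE so each piece carries its own comment; the names PRACTICAL_VERBOSE and matches_practical are made up for this illustration.

```python
import re

# The practical pattern, spelled out piece by piece.
# In VERBOSE mode, whitespace and "#" comments outside character classes are ignored.
PRACTICAL_VERBOSE = re.compile(
    r"""
    ^                   # start of string
    [A-Za-z0-9._%+-]+   # local part: one or more allowed characters
    @                   # literal at-sign
    [A-Za-z0-9.-]+      # domain name: letters, digits, dots, hyphens
    \.                  # literal dot before the TLD
    [A-Za-z]{2,}        # TLD: two or more letters
    $                   # end of string
    """,
    re.VERBOSE,
)


def matches_practical(value: str) -> bool:
    # fullmatch is used so that a trailing newline cannot slip past "$".
    return PRACTICAL_VERBOSE.fullmatch(value) is not None
```

The verbose form compiles to the same pattern as the one-liner; it is only easier to review.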
The strict pattern (RFC 5321)
If you genuinely need to enforce the RFC's rules for unquoted local parts and hostname labels, the pattern is much longer:
^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$
This allows all the special characters the RFC permits in the local part, requires the domain name to start and end with alphanumeric (no leading or trailing hyphen), and allows multi-level subdomains like email@mail.support.example.com. Use it when the validation has compliance implications. For login forms, use the practical pattern.
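To see where the two patterns disagree, here is a small Python sketch that runs both against the same input. The helper name compare is hypothetical, and the anchors are dropped only because fullmatch already requires the whole string to match.

```python
import re

# Practical pattern (anchors omitted because fullmatch is used below).
PRACTICAL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Strict pattern from above, split across lines for readability only.
STRICT = re.compile(
    r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"@(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+"
    r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
)


def compare(value: str) -> tuple:
    """Return (practical_accepts, strict_accepts) for one input."""
    return (
        PRACTICAL.fullmatch(value) is not None,
        STRICT.fullmatch(value) is not None,
    )


compare("alice..bob@example.com")         # (True, False): consecutive dots
compare("alice@-example.com")             # (True, False): leading hyphen in the domain
compare("email@mail.support.example.com") # (True, True)
compare("o'brien@example.com")            # (False, True): apostrophe only in the strict class
```

The last case shows that "strict" is not simply a subset of "practical": the strict pattern accepts more local-part characters while rejecting more malformed structure.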
Examples in JavaScript, Python, and PHP
JavaScript:
const emailPattern = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;
function isValidEmail(input) {
  return emailPattern.test(input);
}
isValidEmail("alice@example.com"); // true
isValidEmail("alice+marketing@gmail.com"); // true
isValidEmail("alice@example"); // false (no TLD)
Python:
import re
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
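def is_valid_email(value: str) -> bool:
    # fullmatch rather than match: with match, "$" would also accept a trailing newline
    return EMAIL_RE.fullmatch(value) is not None
is_valid_email("alice@example.com") # True
is_valid_email("alice@.example.com") # True (the domain class allows a leading dot; the strict pattern rejects it)
PHP: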
function isValidEmail(string $value): bool {
    // The D modifier makes $ match only at the very end, not before a trailing newline
    return (bool) preg_match('/^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/D', $value);
}
isValidEmail("alice@example.com"); // true
isValidEmail("alice@example.co.uk"); // true (multi-part TLD works)
Note that PHP also has filter_var($value, FILTER_VALIDATE_EMAIL), which uses an internal validator close to RFC 5322. I prefer it for minimum-fuss email validation in PHP and reach for the regex when I need a stricter or more lenient subset.
Engine compatibility
The practical pattern uses only universal regex features (anchors, character classes, quantifiers). It runs unmodified everywhere. The per-engine notes are about the wider toolkit you reach for when the simple pattern is not enough.
| Engine | Practical pattern | Per-engine note |
|---|---|---|
| JavaScript | Works | new URL() does not parse emails. For deeper validation use a library like email-validator or validator.js. |
| Python (re) | Works | The stdlib email.utils.parseaddr is a more permissive RFC-aware parser. Combine with the regex for typo detection. |
| Python (regex pkg) | Works | Supports Unicode letter classes if you want to accept internationalised mailbox names. |
| PHP (PCRE) | Works | filter_var($v, FILTER_VALIDATE_EMAIL) uses an internal validator close to RFC 5322 and is the production default. |
| Java | Works | javax.mail.internet.InternetAddress.parse(s, true) from JavaMail (Jakarta Mail) is the closest RFC-aware parser. It is a separate library, not part of Java SE. |
| .NET | Works | System.Net.Mail.MailAddress parses to RFC 5322 and throws on invalid input. |
| Go (RE2) | Works | net/mail.ParseAddress is the parser equivalent. RE2 does not support lookaheads or backreferences, but the strict pattern above uses neither, so it does not need rewriting on that account. |
| Rust (regex crate) | Works | No lookaheads, no backreferences. The strict pattern above uses neither, so it does not need rewriting on that account. |
| Ruby | Works | URI::MailTo::EMAIL_REGEXP (stdlib) is a ready-made pattern. It is closer to the strict pattern than to the practical one shown here. |
| POSIX ERE (grep -E) | Works | No \d shorthand. Use [0-9] instead. |
For cross-language validation where the same pattern runs on the frontend and the backend, keep to the practical form. The strict version uses non-capturing groups ((?:...)) which most modern engines accept, but a few legacy POSIX tools do not.
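As one example of combining a parser with the regex, the Python row in the table can be sketched like this: email.utils.parseaddr pulls the address out of a display-name form, and the practical pattern then checks its shape. The function name extract_and_check is hypothetical, and this is only one way to wire the two together.

```python
import re
from email.utils import parseaddr
from typing import Optional

# Practical pattern (anchors omitted because fullmatch is used below).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_and_check(raw: str) -> Optional[str]:
    """Return the bare address if it passes the practical pattern, else None."""
    # parseaddr splits 'Alice <alice@example.com>' into ('Alice', 'alice@example.com').
    _display_name, address = parseaddr(raw)
    if EMAIL_RE.fullmatch(address) is None:
        return None
    return address
```

For instance, extract_and_check("Alice <alice@example.com>") yields the bare address, while an input whose address part has no TLD yields None.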
Edge cases the practical pattern handles correctly
| Input | Match | Why |
|---|---|---|
| alice+marketing@gmail.com | Yes | + allowed in local part (Gmail tagging) |
| first.last@example.co.uk | Yes | Dots in local part, multi-part TLD |
| user_name@sub.example.com | Yes | Subdomain works |
| 1234@example.io | Yes | All-numeric local part is legal |
| alice@example.museum | Yes | Long TLDs work (any 2+ letters) |
What still gets through (and what to do about it)
The practical pattern accepts a few inputs that look valid but are technically wrong:
- Consecutive dots in the local part: alice..bob@example.com matches because [A-Za-z0-9._%+-]+ allows consecutive dots. RFC 5321 forbids this.
- Domain hyphen and dot position: alice@-example.com and alice@.example.com both match because the pattern does not constrain how a domain label starts or ends. The strict pattern above does enforce it.
- Leading or trailing whitespace: stripped on most form inputs before validation, but if the input is " alice@example.com " and you skip trim(), the anchors will reject it correctly.
The fix for the first two is to use the strict pattern or to add a follow-up DNS / SMTP verification step. For high-volume signup forms, the right answer is almost always:
- Run the practical regex for instant client-side feedback.
- Send a confirmation email and only treat the address as verified once the user clicks the link in it.
The regex catches typos; the email round-trip catches everything else.
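If switching to the strict pattern feels heavy, a third option (not the one described above, just a cheap alternative) is to keep the practical regex and add a few plain string checks for the dot and hyphen cases. A minimal Python sketch, with the hypothetical name passes_follow_up_checks:

```python
import re

# Practical pattern (anchors omitted because fullmatch is used below).
PRACTICAL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def passes_follow_up_checks(value: str) -> bool:
    """Practical regex first, then structural checks the regex does not make."""
    if PRACTICAL.fullmatch(value) is None:
        return False
    # The regex guarantees exactly one "@", so this split is safe.
    local, domain = value.rsplit("@", 1)
    # Local part: no consecutive dots, no leading or trailing dot.
    if ".." in local or local.startswith(".") or local.endswith("."):
        return False
    # Domain: every label must be non-empty and must not start or end with a hyphen.
    for label in domain.split("."):
        if not label or label.startswith("-") or label.endswith("-"):
            return False
    return True
```

This still says nothing about whether the mailbox exists; the confirmation email remains the only real proof.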
Common mistakes
The bugs I see in code review, and the fix for each.
Forgetting the anchors. A pattern without ^ and $ accepts "some garbage alice@example.com more garbage" because the engine finds a substring match. For validation, always anchor both ends.
Unescaped dot in the TLD position. A bare . instead of \. matches any character, so alice@exampleXcom slips through. Always escape the literal dot.
Forgetting the + in Gmail addresses. A local-part class like [A-Za-z0-9._-]+ (no +) silently rejects alice+marketing@gmail.com. Gmail tagging is common; include + in the local-part class.
Capping the TLD at 4 characters. Old patterns used [A-Za-z]{2,4}$ which rejects .museum, .london, .amazon, and every modern brand TLD. Use {2,} or {2,63}.
Trusting client-side validation alone. A user with DevTools can submit anything. Re-validate on the server. The regex is a typo-catcher, not a security control.
Validating before trimming. A pattern anchored with ^...$ rejects " alice@example.com " because of the whitespace. Trim the input first, then validate.
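Several of these mistakes are easy to reproduce side by side. The sketch below builds one deliberately broken variant per mistake (the constant names are made up for this illustration) and shows a trim-then-validate helper, here called validate.

```python
import re

# Correct practical pattern, plus one deliberately broken variant per mistake.
ANCHORED = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
UNANCHORED = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")         # no ^ and $
BARE_DOT = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,}$")          # unescaped dot
NO_PLUS = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")            # "+" missing
SHORT_TLD = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")       # TLD capped at 4


def validate(raw: str) -> bool:
    """Trim first, then validate against the anchored pattern."""
    return ANCHORED.search(raw.strip()) is not None


garbage = "some garbage alice@example.com more garbage"
UNANCHORED.search(garbage) is not None                       # True: substring match
ANCHORED.search(garbage) is not None                         # False
BARE_DOT.search("alice@exampleXcom") is not None             # True: "." matched the X
NO_PLUS.search("alice+marketing@gmail.com") is not None      # False: valid address rejected
SHORT_TLD.search("alice@example.museum") is not None         # False: valid address rejected
validate("  alice@example.com  ")                            # True once trimmed
```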
Test cases: matches and non-matches
| Input | Practical pattern | Notes |
|---|---|---|
| alice@example.com | Match | Standard |
| a@b.co | Match | Minimum size |
| alice+tag@gmail.com | Match | Gmail tagging |
| Alice.Smith@Example.com | Match | Case is allowed |
| alice@example | No match | No TLD |
| @example.com | No match | Empty local part |
| alice@.example.com | Match | Leading dot in domain is accepted by [A-Za-z0-9.-]+; the strict pattern rejects it |
| alice example.com | No match | Space, no @ |
| alice@example.c | No match | TLD too short (1 letter) |
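A table like this is easy to keep honest by turning it into a table-driven check. The sketch below is one way to do that in Python; CASES and failing_cases are hypothetical names. Note that alice@.example.com is listed as a match, because the domain class [A-Za-z0-9.-]+ accepts a leading dot.

```python
import re

# Practical pattern (anchors omitted because fullmatch is used below).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# (input, expected result under the practical pattern)
CASES = [
    ("alice@example.com", True),
    ("a@b.co", True),
    ("alice+tag@gmail.com", True),
    ("Alice.Smith@Example.com", True),
    ("alice@example", False),
    ("@example.com", False),
    ("alice@.example.com", True),   # leading dot in the domain is accepted
    ("alice example.com", False),
    ("alice@example.c", False),
]


def failing_cases() -> list:
    """Return every case whose actual result differs from the expected one."""
    return [
        (text, expected)
        for text, expected in CASES
        if (EMAIL_RE.fullmatch(text) is not None) != expected
    ]
```

An empty list from failing_cases() means the pattern and the table agree.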
See also
- How to Match a URL with Regex: the cousin pattern that also lives on signup forms
- How to Match a Domain Name with Regex: for validating just the part after the @
- How to Validate a Strong Password with Regex: the other field on every signup form
- Regex Anchors: why ^ and $ are non-negotiable for validation patterns
- Regex Lookaheads and Lookbehinds: for (?=...) and (?!...) constructs, which neither pattern here needs
- Regex Capturing Groups and Backreferences: extract the local part and domain separately
- Regex Cheat Sheet: the wider syntax and engine compatibility reference
External reference: paste the pattern into regex101.com to test it interactively against your own input strings. The site also explains every token in the pattern.





