Match a Domain Name with Regex: RFC 1035 and IDN-Aware (2026)

The practical regex for matching a domain name: ^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$. It accepts example.com, sub.example.co.uk, and mail.example-domain.io while rejecting example (no TLD), -example.com (leading hyphen), and example.c (TLD too short). For Internationalised Domain Names like 例え.テスト, validate the punycode form (xn--r8jz45g.xn--zckzah): punycode labels follow the same label rules, but a punycode TLD such as xn--zckzah contains hyphens, so the letters-only TLD class has to be widened before the whole name matches. Below I walk through the basic pattern, the strict RFC 1035 form with full length checks, runnable code in JavaScript, Python, and PHP, engine notes, the common bugs, and the case where the right call is to skip regex and run a DNS check.

This comes up so often because domain validation lives in two places: form inputs (where users type their company domain) and log scanning (where you extract every domain mentioned in a stream of text). The same pattern works for both: anchored for a full-string match in forms, unanchored for scans.

Quick reference

The practical pattern, ready to paste:

code

^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

The same pattern with the 255-character total length enforced:

code

^(?=.{1,255}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

Single-label hostname (e.g., localhost, printer-01):

code

^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$

The practical pattern

code

^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

Breaking it down:

^ and $ anchor to the full string for form validation. Drop them for log scanning.
[a-zA-Z0-9] means a label cannot start with a hyphen.
(?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? allows middle characters including hyphens, but the last character cannot be a hyphen. Together with the first character rule, this enforces "no leading or trailing hyphen". The 61 limits the middle so the total label length stays at most 63.
\. is a literal dot.
The whole label pattern is repeated with + so you can have multiple labels (mail.example.co.uk).
[a-zA-Z]{2,63} is the TLD: letters only, between 2 and 63 characters. Modern TLDs include .museum, .london, .amazon.

This pattern rejects all-numeric TLDs (which don't exist in real DNS) and localhost (single-label). To accept single-label hostnames for internal use, see the variant in the Quick reference above.
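
To see the pieces above at work, here is a minimal Python sketch of the same pattern written with re.VERBOSE, so each part carries its own comment. The function name is_domain_shaped is my own, not from any library, and fullmatch stands in for the ^ and $ anchors.

```python
import re

# The practical pattern, spread out with re.VERBOSE.
# Whitespace and "#" comments outside character classes are ignored.
DOMAIN_RE = re.compile(
    r"""
    (?:                          # one "label." group, repeated
        [a-zA-Z0-9]              # first character: letter or digit
        (?:[a-zA-Z0-9-]{0,61}    # up to 61 middle characters, hyphens allowed
           [a-zA-Z0-9])?         # last character: letter or digit
        \.                       # literal dot
    )+
    [a-zA-Z]{2,63}               # TLD: letters only, 2 to 63 characters
    """,
    re.VERBOSE,
)


def is_domain_shaped(value: str) -> bool:
    """Return True when the whole string has the shape of a domain name."""
    return DOMAIN_RE.fullmatch(value) is not None


print(is_domain_shaped("example.com"))       # True
print(is_domain_shaped("a" * 63 + ".com"))   # True: 1 + 61 + 1 = 63 characters
print(is_domain_shaped("a" * 64 + ".com"))   # False: label is one character too long
print(is_domain_shaped("-example.com"))      # False: leading hyphen
```

The 63 and 64 character labels are the useful ones to keep in a test suite, since they sit on either side of the per-label limit.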

The strict pattern (RFC 1035 length limits)

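RFC 1035 imposes two length rules: each label is at most 63 octets, and the entire name is at most 255 octets. The 255 figure counts the wire format, where every label carries a length byte and the name ends with a zero-length root label, so the dotted text form tops out at 253 characters without a trailing dot. Regex enforces the per-label limit naturally; the total length is easier to check with a lookahead. The lookahead below uses 255 as its upper bound; tighten it to 253 if you want the exact text-form limit.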

code

^(?=.{1,255}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

The added piece is (?=.{1,255}$), a lookahead that asserts the entire string is between 1 and 255 characters. Combined with the rest, this enforces both per-label and total-length limits.
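
A quick way to see what the lookahead adds is to feed both patterns a name whose labels are all legal but whose total length is one character over. This is a hedged Python sketch: the sample names are synthetic and the variable names are my own.

```python
import re

LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
PRACTICAL_RE = re.compile(r"^(?:" + LABEL + r"\.)+[a-zA-Z]{2,63}$")
STRICT_RE = re.compile(r"^(?=.{1,255}$)(?:" + LABEL + r"\.)+[a-zA-Z]{2,63}$")

# Three 63-character labels, one shorter label, then "com".
long_ok = ".".join(["a" * 63] * 3 + ["a" * 59, "com"])    # 255 characters
too_long = ".".join(["a" * 63] * 3 + ["a" * 60, "com"])   # 256 characters

print(len(long_ok), bool(STRICT_RE.fullmatch(long_ok)))         # 255 True
print(len(too_long), bool(PRACTICAL_RE.fullmatch(too_long)))    # 256 True
print(len(too_long), bool(STRICT_RE.fullmatch(too_long)))       # 256 False
```

Every label in too_long is within the 63-character limit, so only the total-length assertion separates the two results.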

For more on lookahead patterns like this, see how to use regex lookaheads and lookbehinds.

Subdomains and depth control

If you want to limit how deep the subdomain tree can go (for example, allow only something.example.com, not a.b.c.example.com):

code

Exactly one subdomain:           ^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z]{2,63}$
Root plus up to 2 subdomains:    ^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.){1,3}[a-zA-Z]{2,63}$
Just root domain (no subs):      ^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z]{2,63}$

The {1,3} quantifier on the label group controls how many "name." segments precede the TLD. One segment is the root domain itself (example.com), so {1,3} allows at most two subdomains on top of it. Each label of a multi-level suffix also counts as a segment, so example.co.uk already uses two. Use this when your application has a specific subdomain policy.
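
If the depth policy varies between callers, the quantifier can be generated rather than hand-written. A minimal Python sketch, assuming a hypothetical helper named depth_limited_pattern that takes the minimum and maximum number of "name." segments:

```python
import re

LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"


def depth_limited_pattern(min_labels: int, max_labels: int) -> re.Pattern:
    """Compile a pattern allowing min_labels..max_labels "name." groups before the TLD."""
    if not 1 <= min_labels <= max_labels:
        raise ValueError("expected 1 <= min_labels <= max_labels")
    quantifier = "{%d,%d}" % (min_labels, max_labels)
    return re.compile(r"^(?:" + LABEL + r"\.)" + quantifier + r"[a-zA-Z]{2,63}$")


root_only = depth_limited_pattern(1, 1)       # example.com only
one_subdomain = depth_limited_pattern(2, 2)   # www.example.com only
up_to_three = depth_limited_pattern(1, 3)     # same as the {1,3} form

print(bool(up_to_three.fullmatch("a.b.example.com")))     # True: three segments
print(bool(up_to_three.fullmatch("a.b.c.example.com")))   # False: four segments
print(bool(root_only.fullmatch("www.example.com")))       # False: two segments
```

The count is of labels before the TLD, not of subdomains in the registrar's sense, so a name under a multi-level suffix such as co.uk uses up one segment on the suffix itself.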

Internationalised domain names (IDN / punycode)

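A domain like 例え.テスト is internationalised. DNS itself does not speak Unicode, so internationalised domains get encoded as punycode (xn--r8jz45g.xn--zckzah) for the wire format. Punycode is ASCII-safe: every encoded label starts with xn-- and uses only a-z, 0-9, and hyphen, so it fits the label part of the practical pattern unchanged. The TLD is the catch. [a-zA-Z]{2,63} allows letters only, and a punycode TLD such as xn--zckzah contains hyphens (others, like xn--p1ai, contain digits too). The practical pattern therefore accepts xn--r8jz45g.com but rejects xn--r8jz45g.xn--zckzah. To accept IDN TLDs, add an xn-- alternative to the TLD:

code

^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])$

Try it against xn--r8jz45g.xn--zckzah. It matches, and the xn-- branch still caps the TLD at 63 characters (4 for the prefix, at most 59 after it).

If you need to accept the Unicode form directly in user input (and convert to punycode later), Python has encodings.idna in the standard library and idna.encode() in the third-party idna package, Node has url.domainToASCII(), and PHP has idn_to_ascii(). Run the input through the IDN encoder first, then validate the punycode result with the regex.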

Examples in JavaScript, Python, and PHP

JavaScript:

javascript

// Practical pattern plus an xn-- branch on the TLD, so punycode TLDs pass.
const domainPattern = /^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])$/;

function isValidDomain(input) {
  // Total-length check in code, so the pattern itself needs no lookahead
  if (typeof input !== "string" || input.length > 255) return false;
  return domainPattern.test(input);
}

isValidDomain("example.com");              // true
isValidDomain("mail.example.co.uk");       // true
isValidDomain("xn--r8jz45g.xn--zckzah");   // true (punycode IDN, matched by the xn-- TLD branch)
isValidDomain("-example.com");             // false (leading hyphen)
isValidDomain("example");                  // false (no TLD)

// For Unicode input, encode first
const punycode = require("url").domainToASCII("例え.テスト");
isValidDomain(punycode);                   // true

Python:

python

import re

# Practical pattern plus an xn-- branch on the TLD, so punycode TLDs pass.
DOMAIN_RE = re.compile(
    r"^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])$"
)

def is_valid_domain(value: str) -> bool:
    if len(value) > 255:
        return False
    # fullmatch, not match: with match(), $ would also accept a trailing "\n"
    return DOMAIN_RE.fullmatch(value) is not None

is_valid_domain("example.com")        # True
is_valid_domain("mail.sub.co.uk")     # True
is_valid_domain("example..com")       # False (empty label)

# For Unicode input, use the third-party idna package (pip install idna)
import idna
punycode = idna.encode("例え.テスト").decode("ascii")
is_valid_domain(punycode)             # True

PHP:

php

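<?php
function isValidDomain(string $value): bool {
    if (strlen($value) > 255) return false;
    // The xn-- branch lets punycode TLDs through.
    // The D modifier stops $ from matching before a trailing newline.
    $pattern = '/^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])$/D';
    return (bool) preg_match($pattern, $value);
}

isValidDomain("example.com");            // true
isValidDomain("xn--r8jz45g.xn--zckzah"); // true

// For Unicode input (requires the intl extension)
$punycode = idn_to_ascii("例え.テスト", IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
isValidDomain($punycode ?: "");          // true; idn_to_ascii returns false on failure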

Engine compatibility

The strict-length variant uses a lookahead (?=.{1,255}$). Most engines support it; a few do not. The practical pattern is lookahead-free and runs everywhere.

Engine	Practical pattern	Strict (lookahead)	IDN encoder
JavaScript	Works	Works	`url.domainToASCII` (Node), or the `punycode` package
Python (`re`)	Works	Works	`encodings.idna` or third-party `idna`
PHP (PCRE)	Works	Works	`idn_to_ascii` (intl extension)
Java	Works	Works	`java.net.IDN.toASCII`
.NET	Works	Works	`System.Globalization.IdnMapping`
Go (RE2)	Works	Not supported	`golang.org/x/net/idna`
Rust (`regex` crate)	Works	Not supported	`idna` crate
Ruby	Works (use `\A` and `\z`; `^` and `$` are line anchors)	Works	`Addressable::URI` (gem)
POSIX ERE (`grep -E`)	Works once `(?:` is rewritten as `(` (no non-capturing groups, no `\d`/`\w`)	Not supported	None

For Go and Rust, where lookahead is unavailable, enforce the total 255-character length in code as a separate check after the regex passes (see the JavaScript example above).

When to skip regex and use a DNS check

A regex confirms the string is shaped like a domain. It does not confirm the domain exists, resolves, or points anywhere. For applications where that matters (webhook URLs, OAuth callbacks, transactional email domains), pair the regex with a DNS lookup.

JavaScript (Node):

javascript

const dns = require("node:dns").promises;

async function domainExists(domain) {
  if (!isValidDomain(domain)) return false;
  try {
    await dns.resolve(domain);
    return true;
  } catch {
    return false;
  }
}

Python:

python

import socket

def domain_exists(domain: str) -> bool:
    if not is_valid_domain(domain):
        return False
    try:
        socket.gethostbyname(domain)
        return True
    except socket.gaierror:
        return False

For deeper checks (does this domain have valid MX records, DNSSEC, working SPF and DKIM, a current SSL certificate), the DNS Inspector at dnschkr.com runs 25+ automated tests against a domain in one request and returns a health score. It's also what I cover in detail in the DNS health check walkthrough on this site.

Validating the TLD with the Public Suffix List

A regex confirms a string is shaped like a domain. It does not confirm the top-level domain is real. example.zzz and example.madeupword both pass every pattern in this article, because [a-zA-Z]{2,63} accepts any run of letters in the TLD position. The set of valid TLDs is not something a regex can encode honestly: there are more than 1,500 of them, ICANN adds new gTLDs regularly, and brand TLDs come and go. A pattern baked with last year's TLD list produces inaccurate validation in two directions at once. It causes false rejections (turning away a genuine address on a newer TLD like .dev or .africa because the hardcoded list predates it) and false accepts (passing an invented TLD like .zzz straight through to code that then tries to use it).

The fix is the Public Suffix List (PSL): a community-maintained registry of every public domain suffix, from plain TLDs (.com, .io) through multi-level suffixes (.co.uk, .com.au, .github.io). It is the same list browsers use to decide cookie scope, and it is updated continuously as registries change. When I write domain-validation code, I run the regex first for shape, then check the candidate's suffix against the PSL so a typo'd or invented TLD is caught before it reaches anything that matters.

I keep a small PSL parser of my own for projects where I want zero dependencies, but most of the time a community-maintained library is the better call. It tracks the upstream list, handles the punycode and wildcard rules, and is well tested. The widely used ones by language:

Language	Library
Node.js / JavaScript	`psl`, `tldts`
Python	`tldextract`, `publicsuffix2`
PHP	`jeremykendall/php-domain-parser`
Ruby	`public_suffix`
Go	`golang.org/x/net/publicsuffix`

A quick shape-plus-suffix check in Node, layering the PSL on top of the regex from earlier:

javascript

const psl = require("psl");

function isValidDomainWithRealTld(input) {
  if (!isValidDomain(input)) return false;     // regex: shape
  const parsed = psl.parse(input);             // PSL: real public suffix?
  return !parsed.error && parsed.listed;
}
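
For readers curious what a suffix lookup does under the hood, here is a rough Python sketch of the longest-suffix match at the heart of a PSL check. It is an illustration only: the rule set is a hand-picked sample standing in for the real list, the function name find_public_suffix is my own, and the wildcard and exception rules that the real list contains are deliberately left out. The libraries above handle all of that.

```python
from typing import Optional, Set

# Hypothetical sample; in real use, load the rules from the published list.
SAMPLE_RULES = {"com", "io", "uk", "co.uk", "github.io"}


def find_public_suffix(domain: str, rules: Set[str]) -> Optional[str]:
    """Return the longest suffix of domain found in rules, or None if unlisted."""
    labels = domain.lower().rstrip(".").split(".")
    # Longest candidate first: "mail.example.co.uk", "example.co.uk", "co.uk", "uk".
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in rules:
            return candidate
    return None


print(find_public_suffix("mail.example.co.uk", SAMPLE_RULES))   # co.uk
print(find_public_suffix("user.github.io", SAMPLE_RULES))       # github.io
print(find_public_suffix("example.zzz", SAMPLE_RULES))          # None
```

Checking the longest candidate first is what lets co.uk win over uk, which a plain "last label" check would get wrong.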

My recommendation is to use a combination rather than picking one. The regex gives instant, dependency-free shape validation; the PSL gives an authoritative, current TLD check; together they catch far more than either alone, with no network latency. Reserve the DNS check for when you also need to know the domain actually resolves. The same layering applies to the domain half of an email address, covered in matching an email address with regex.

Common mistakes

The bugs I see most often, with the fix for each.

Allowing labels to start or end with a hyphen. A naive [a-zA-Z0-9-]+ accepts -bad.com and bad-.com. The practical pattern in this article uses a first-character class without -, a middle class with -, and a last-character class without -, which enforces the rule.

Accepting all-numeric labels in the TLD position. 123.456 parses as "shape OK" but no real TLD is numeric. The pattern uses [a-zA-Z]{2,63} for the TLD, which excludes digits.

Capping the TLD at 4 or 6 characters. Old patterns used [a-zA-Z]{2,4} which rejects .museum, .amazon, .travel, and .london. Use {2,63} to match the RFC limit.

Forgetting punycode is the wire format. Unicode domains (例え.テスト) never appear in the DNS query. They're encoded to xn--... before the lookup. Validate the encoded form, or run the IDN encoder first and then validate.

Assuming IPv4 literals slip through as domains. 192.168.1.1 does not match the practical pattern, because the final label has to be letters only and 1 is a digit. That protection disappears the moment you loosen the TLD class to allow digits. If your input might be an IP literal, route it through the IP regex first (see IPv4 and IPv6 matching).

Treating a regex pass as proof the domain resolves. It doesn't. Pair the regex with a DNS lookup whenever "is this real" matters.
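
One way to route IP literals ahead of the domain check is sketched below in Python, using the standard ipaddress module in place of a hand-written IP regex. The function name classify_host and its three return labels are my own choices for illustration.

```python
import ipaddress
import re

DOMAIN_RE = re.compile(
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}"
)


def classify_host(value: str) -> str:
    """Return "ip", "domain", or "invalid" for a host string."""
    try:
        ipaddress.ip_address(value)   # accepts IPv4 and IPv6 literals
        return "ip"
    except ValueError:
        pass                          # not an IP literal, fall through
    if len(value) <= 255 and DOMAIN_RE.fullmatch(value):
        return "domain"
    return "invalid"


print(classify_host("192.168.1.1"))   # ip
print(classify_host("::1"))           # ip
print(classify_host("example.com"))   # domain
print(classify_host("-bad.com"))      # invalid
```

Running the IP check first keeps the two concerns separate, so the domain pattern can be loosened or tightened later without changing how IP literals are handled.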

Test cases

Input	Practical pattern	RFC 1035 strict
`example.com`	Match	Match
`mail.example.co.uk`	Match	Match
`sub.sub.sub.example.com`	Match	Match
`xn--r8jz45g.com`	Match	Match
`xn--r8jz45g.xn--zckzah`	No match (hyphens in TLD)	No match (hyphens in TLD)
`example-domain.io`	Match	Match
`123abc.com`	Match	Match
`-example.com`	No match	No match
`example-.com`	No match	No match
`example..com`	No match	No match
`example`	No match	No match
`example.c`	No match	No match
`a.b`	No match	No match
`localhost`	No match	No match
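
A table like this is easy to keep honest by running it. The Python sketch below checks a subset of the rows (the plain ASCII ones) against both patterns; the CASES list and the run_cases helper are my own scaffolding, not part of any library.

```python
import re

LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
PRACTICAL_RE = re.compile(r"^(?:" + LABEL + r"\.)+[a-zA-Z]{2,63}$")
STRICT_RE = re.compile(r"^(?=.{1,255}$)(?:" + LABEL + r"\.)+[a-zA-Z]{2,63}$")

# (input, expected match) pairs; both patterns should agree on every row.
CASES = [
    ("example.com", True),
    ("mail.example.co.uk", True),
    ("sub.sub.sub.example.com", True),
    ("example-domain.io", True),
    ("123abc.com", True),
    ("-example.com", False),
    ("example-.com", False),
    ("example..com", False),
    ("example", False),
    ("example.c", False),
    ("a.b", False),
    ("localhost", False),
]


def run_cases():
    """Return a list of (pattern name, input) pairs that disagree with CASES."""
    failures = []
    for text, expected in CASES:
        for name, pattern in (("practical", PRACTICAL_RE), ("strict", STRICT_RE)):
            if (pattern.fullmatch(text) is not None) != expected:
                failures.append((name, text))
    return failures


print(run_cases())   # [] when every row behaves as listed
```

An empty list means every row behaves as listed; any mismatch comes back as a (pattern name, input) pair.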

FAQ

What is the simplest regex for a domain name?

The shortest correct form: ^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$. It enforces no leading or trailing hyphens, per-label length up to 63, and a TLD of letters only between 2 and 63 characters.

For RFC 1035 compliance, add the total-length lookahead (?=.{1,255}$) at the start so the whole string can't exceed 255 characters.

Can regex match an internationalised domain like 例え.テスト?

Yes, if you first convert it to punycode. The DNS wire format encodes 例え.テスト as xn--r8jz45g.xn--zckzah, which is pure ASCII (letters, digits, hyphens). Its labels fit the standard label rules, but the TLD xn--zckzah contains hyphens, so the letters-only TLD class needs an xn-- alternative such as (?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9]) before the whole name matches.

In JavaScript use url.domainToASCII(), in Python use idna.encode(), in PHP use idn_to_ascii(). Validate the encoded form, not the Unicode source.

How do I match a domain regardless of subdomain depth?

Use + on the label group: ^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$. The + means one or more "name." segments, so the pattern matches both example.com and a.b.c.example.com.

To limit depth, replace + with {1,3}, which allows one to three "name." segments before the TLD (the root label plus up to two subdomains). To require exactly one subdomain, write the label group out twice explicitly.

Does the domain regex accept localhost?

No. The pattern requires at least one dot and a TLD of 2+ letters, which rules out single-label hostnames like localhost, my-server, or printer-01.

For internal hostnames, use a separate, looser pattern: ^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$. Or accept both forms with alternation.

Does a passing regex mean the domain actually exists?

No. The regex only confirms the string is shaped like a domain. It does not check whether the domain is registered, has DNS records, or points anywhere reachable.

For "is this domain real", pair the regex with a DNS lookup (dns.resolve() in Node, socket.gethostbyname() in Python). For "is this domain configured correctly", use a tool like the DNS Inspector at dnschkr.com which runs 25+ automated tests and returns a health score.

How do I reject IP addresses with the domain regex?

The practical pattern already rejects dotted-quad IPs like 192.168.1.1, because the final label must be letters only and 1 is a digit. If you loosen the TLD class to allow digits, add a negative lookahead at the start: ^(?!(?:\d{1,3}\.){3}\d{1,3}$)(?:...), which fails if the whole input is four numeric octets.

For Go and Rust, where lookahead is unavailable, run the IP regex first and short-circuit before the domain regex.

Should the trailing dot in fully-qualified domains match?

Strictly, fully-qualified domain names end with a trailing dot (example.com.). DNS resolvers normalise this on input, so most validation patterns omit it. To accept it, append \.? at the end of the pattern: ...[a-zA-Z]{2,63}\.?$.

Ishan Karunaratne