TechEarl

How to Match a Domain Name with Regex

Match a domain name with regex. Basic labels, RFC 1035 length rules, subdomains, IDN punycode, trailing-dot form, JavaScript / Python / PHP examples, engine notes, and common mistakes.

Ishan Karunaratne · 10 min read

The practical regex for matching a domain name: ^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$. It accepts example.com, sub.example.co.uk, and mail.example-domain.io while rejecting example (no TLD), -example.com (leading hyphen), and example.c (TLD too short). For Internationalised Domain Names like 例え.テスト, validate the punycode form (xn--r8jz45g.xn--zckzah). Punycode labels fit the same character class, but a punycode TLD such as xn--zckzah contains hyphens, so the letters-only TLD part has to be widened with an xn-- alternative before that form matches. Below I walk through the basic pattern, the strict RFC 1035 form with full length checks, runnable code in JavaScript, Python, and PHP, engine notes, the common bugs, and the case where the right call is to skip regex and run a DNS check.

The reason this comes up so often is that domain validation lives in two places: form inputs (where users type their company domain) and log scanning (where you extract every domain mentioned in a stream of text). The same pattern works for both, with anchors for full-string match in forms and unanchored in scans.

Quick reference

The practical pattern, ready to paste:

code
^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

The same pattern with the 255-character total length enforced:

code
^(?=.{1,255}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

Single-label hostname (e.g., localhost, printer-01):

code
^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$

The practical pattern

code
^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

Breaking it down:

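  • ^ and $ anchor to the full string for form validation. Drop them for log scanning. In Python and PHP, $ also matches just before a trailing newline, so prefer re.fullmatch() in Python and the D modifier (or \z) in PCRE.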
  • [a-zA-Z0-9] means a label cannot start with a hyphen.
  • (?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? allows middle characters including hyphens, but the last character cannot be a hyphen. Together with the first character rule, this enforces "no leading or trailing hyphen". The 61 limits the middle so the total label length stays at most 63.
  • \. is a literal dot.
  • The whole label pattern is repeated with + so you can have multiple labels (mail.example.co.uk).
  • [a-zA-Z]{2,63} is the TLD: letters only, between 2 and 63 characters. Modern TLDs include .museum, .london, .amazon.

This pattern rejects all-numeric TLDs (which don't exist in real DNS), punycode TLDs such as xn--zckzah (they contain hyphens), and localhost (single-label). To accept single-label hostnames for internal use, see the variant in the Quick reference above.
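For the log-scanning case, dropping the anchors alone is not quite enough: an unanchored scan will pull example.com out of -example.com. Here is a minimal Python sketch, assuming you only want whole tokens. The guard lookarounds and the extract_domains name are my additions for illustration, not part of the pattern above.

```python
import re

LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"

SCAN_RE = re.compile(
    r"(?<![a-zA-Z0-9.-])"                 # do not start in the middle of a longer token
    rf"(?:{LABEL}\.)+[a-zA-Z]{{2,63}}"    # the practical pattern, without ^ and $
    r"(?![a-zA-Z0-9-])"                   # the candidate must end at a token boundary
)

def extract_domains(text: str) -> list:
    """Return every domain-shaped token found in text, in order of appearance."""
    return [m.group(0) for m in SCAN_RE.finditer(text)]

line = "GET https://mail.example.co.uk/inbox from -bad.com and example.com."
print(extract_domains(line))  # ['mail.example.co.uk', 'example.com']
```

This is still a shape check only: a file name such as report.pdf looks like a domain to the scanner, so filter the results if your logs contain paths.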

The strict pattern (RFC 1035 length limits)

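RFC 1035 imposes two length rules: each label is at most 63 octets, and the entire name is at most 255 octets. The 255 figure applies to the wire format, which spends one length octet per label plus a terminating zero octet, so the dotted text form tops out at 253 characters. The 255 used below is therefore a slightly loose upper bound. Regex enforces the per-label limit naturally; the total length is easier to check with a lookahead.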

code
^(?=.{1,255}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

The added piece is (?=.{1,255}$), a lookahead that asserts the entire string is between 1 and 255 characters. Combined with the rest, this enforces both per-label and total-length limits.
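To see what the lookahead adds, here is a small Python sketch that builds a name whose labels are all legal but whose total length is not. The variable names are illustrative only.

```python
import re

PRACTICAL = re.compile(
    r"^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
)
STRICT = re.compile(
    r"^(?=.{1,255}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
)

# Five 60-character labels plus ".com": each label is fine, the total is 308 characters.
too_long = ".".join(["a" * 60] * 5) + ".com"

# One 64-character label: over the per-label limit, so both patterns reject it.
fat_label = "a" * 64 + ".com"

print(len(too_long))                       # 308
print(bool(PRACTICAL.match(too_long)))     # True  (no total-length check)
print(bool(STRICT.match(too_long)))        # False (lookahead rejects it)
print(bool(PRACTICAL.match(fat_label)))    # False
print(bool(STRICT.match(fat_label)))       # False
```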

For more on lookahead patterns like this, see how to use regex lookaheads and lookbehinds.

Subdomains and depth control

If you want to limit how deep the subdomain tree can go (for example, allow only something.example.com, not a.b.c.example.com):

code
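Exactly one subdomain:        ^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z]{2,63}$
Root plus up to 2 subdomains: ^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.){1,3}[a-zA-Z]{2,63}$
Just root domain (no subs):   ^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z]{2,63}$

The {1,3} quantifier on the label group controls how many "name." segments precede the TLD. That count includes the registered name itself, so {1,3} allows example.com, a.example.com, and a.b.example.com but rejects a.b.c.example.com. The pattern counts labels only and knows nothing about multi-part suffixes such as co.uk, where example.co.uk already uses two segments. Use this when your application has a specific subdomain policy.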
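If the depth policy varies between callers, the quantifier can be filled in at build time. Here is a minimal Python sketch; the depth_limited_pattern helper and its parameter names are hypothetical, and it counts "name." segments before the TLD exactly as the quantifier does.

```python
import re

LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"

def depth_limited_pattern(min_labels: int, max_labels: int) -> re.Pattern:
    """Allow between min_labels and max_labels "name." segments before the TLD."""
    if not 1 <= min_labels <= max_labels:
        raise ValueError("need 1 <= min_labels <= max_labels")
    # Doubled braces are literal braces in an f-string, so this renders as {min,max}.
    return re.compile(
        rf"(?:{LABEL}\.){{{min_labels},{max_labels}}}[a-zA-Z]{{2,63}}"
    )

root_only = depth_limited_pattern(1, 1)
up_to_three = depth_limited_pattern(1, 3)

print(bool(root_only.fullmatch("example.com")))            # True
print(bool(root_only.fullmatch("a.example.com")))          # False
print(bool(up_to_three.fullmatch("a.b.example.com")))      # True
print(bool(up_to_three.fullmatch("a.b.c.example.com")))    # False
```

The sketch uses fullmatch() instead of ^ and $, so a trailing newline cannot slip through.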

Internationalised domain names (IDN / punycode)

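A domain like 例え.テスト is internationalised. DNS itself does not speak Unicode, so internationalised domains are encoded as punycode (xn--r8jz45g.xn--zckzah) for the wire format. Punycode is ASCII-safe: every encoded label uses only a-z, 0-9, and hyphen, so it fits the label part of the practical pattern. The catch is the TLD. [a-zA-Z]{2,63} is letters only and xn--zckzah contains hyphens, so the practical pattern accepts xn--r8jz45g.com but rejects xn--r8jz45g.xn--zckzah. To accept IDN TLDs, add an xn-- alternative to the TLD part:

code
^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])$

Try it against xn--r8jz45g.xn--zckzah. It matches, while 123.456 and example.c are still rejected. The alternative expects the lowercase xn-- prefix, which is what IDN encoders emit, and caps the TLD at 63 characters in total.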

If you need to accept the Unicode form directly in user input (and convert to punycode later), run the input through an IDN encoder first, then validate the punycode result with the regex. Python has the third-party idna package (idna.encode()) and an older built-in "idna" codec, Node has url.domainToASCII(), and PHP has idn_to_ascii() from the intl extension.
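The encode-then-validate flow can be sketched with nothing but the Python standard library. Two assumptions to flag: the built-in "idna" codec implements the older IDNA 2003 rules (the third-party idna package follows IDNA 2008), and the TLD part of the regex is widened with an xn-- alternative so an encoded TLD can pass. The helper names are mine.

```python
import re
from typing import Optional

IDN_AWARE_RE = re.compile(
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])"
)

def to_ascii_domain(value: str) -> Optional[str]:
    """Encode a possibly-Unicode domain to its ASCII (punycode) form, or None on failure."""
    try:
        return value.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for empty labels, over-long labels, and disallowed characters.
        return None

def is_valid_idn(value: str) -> bool:
    """Encode first, then apply the length check and the regex to the ASCII form."""
    ascii_form = to_ascii_domain(value)
    return (
        ascii_form is not None
        and len(ascii_form) <= 255
        and IDN_AWARE_RE.fullmatch(ascii_form) is not None
    )

print(is_valid_idn("例え.テスト"))    # True
print(is_valid_idn("example.com"))   # True
print(is_valid_idn("example..com"))  # False
```

Validating after encoding matters because the length limits apply to the encoded form, which is usually longer than the Unicode text the user typed.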

Examples in JavaScript, Python, and PHP

JavaScript:

javascript
// The TLD is letters only, or a punycode (xn--) label so IDN TLDs pass too.
const domainPattern = /^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])$/;

function isValidDomain(input) {
  if (input.length > 255) return false;
  return domainPattern.test(input);
}

isValidDomain("example.com");              // true
isValidDomain("mail.example.co.uk");       // true
isValidDomain("xn--r8jz45g.xn--zckzah");   // true (punycode IDN)
isValidDomain("-example.com");             // false (leading hyphen)
isValidDomain("example");                  // false (no TLD)

// For Unicode input, encode first (Node.js)
const { domainToASCII } = require("node:url");
const punycode = domainToASCII("例え.テスト");
isValidDomain(punycode);                   // true

Python:

python
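import re

import idna  # third-party package: pip install idna

# The TLD is letters only, or a punycode (xn--) label so IDN TLDs pass too.
DOMAIN_RE = re.compile(
    r"^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])$"
)

def is_valid_domain(value: str) -> bool:
    if len(value) > 255:
        return False
    # fullmatch() covers the whole string; match() with $ would accept "example.com\n".
    return DOMAIN_RE.fullmatch(value) is not None

is_valid_domain("example.com")        # True
is_valid_domain("mail.sub.co.uk")     # True
is_valid_domain("example..com")       # False (empty label)

# For Unicode input, encode to punycode first
punycode = idna.encode("例え.テスト").decode("ascii")
is_valid_domain(punycode)             # True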

PHP:

php
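function isValidDomain(string $value): bool {
    if (strlen($value) > 255) return false;
    // The TLD is letters only, or a punycode (xn--) label so IDN TLDs pass too.
    // The D modifier stops $ from matching before a trailing newline.
    $pattern = '/^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{0,58}[a-zA-Z0-9])$/D';
    return (bool) preg_match($pattern, $value);
}

isValidDomain("example.com");          // true
isValidDomain("xn--r8jz45g.xn--zckzah"); // true

// For Unicode input (needs the intl extension); idn_to_ascii returns false on failure
$punycode = idn_to_ascii("例え.テスト", IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
$punycode !== false && isValidDomain($punycode);   // true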

Engine compatibility

The strict-length variant uses a lookahead (?=.{1,255}$). Backtracking engines support it; RE2-style engines (Go, Rust's regex crate) and POSIX ERE do not. The practical pattern is lookahead-free and runs in every engine below, though POSIX ERE has no non-capturing groups, so (?:...) must be rewritten as (...) there.

Engine | Practical pattern | Strict (lookahead) | IDN encoder
--- | --- | --- | ---
JavaScript | Works | Works | url.domainToASCII (Node), or the punycode package
Python (re) | Works | Works | encodings.idna (stdlib) or third-party idna
PHP (PCRE) | Works | Works | idn_to_ascii (intl extension)
Java | Works | Works | java.net.IDN.toASCII
.NET | Works | Works | System.Globalization.IdnMapping
Go (RE2) | Works | Not supported | golang.org/x/net/idna
Rust (regex crate) | Works | Not supported | idna crate
Ruby | Works | Works | Addressable::URI (gem)
POSIX ERE (grep -E) | Works once (?:...) is rewritten as (...) | Not supported | None

For Go and Rust, where lookahead is unavailable, enforce the total 255-character length in code as a separate check alongside the regex, the way the JavaScript example above does with input.length.

When to skip regex and use a DNS check

A regex confirms the string is shaped like a domain. It does not confirm the domain exists, resolves, or points anywhere. For applications where that matters (webhook URLs, OAuth callbacks, transactional email domains), pair the regex with a DNS lookup.

JavaScript (Node):

javascript
const dns = require("node:dns").promises;

async function domainExists(domain) {
  if (!isValidDomain(domain)) return false;
  try {
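    // resolve() looks up A records by default; pass "AAAA" or "MX" as a second argument for other types.
    await dns.resolve(domain);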
    return true;
  } catch {
    return false;
  }
}

Python:

python
import socket

def domain_exists(domain: str) -> bool:
    if not is_valid_domain(domain):
        return False
    try:
        # getaddrinfo covers both IPv4 and IPv6 results; gethostbyname is IPv4-only.
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False

For deeper checks (does this domain have valid MX records, DNSSEC, working SPF and DKIM, a current SSL certificate), the DNS Inspector at dnschkr.com runs 25+ automated tests against a domain in one request and returns a health score. It's also what I cover in detail in the DNS health check walkthrough on this site.

Common mistakes

The bugs I see most often, with the fix for each.

Allowing labels to start or end with a hyphen. A naive [a-zA-Z0-9-]+ accepts -bad.com and bad-.com. The practical pattern in this article uses a first-character class without -, a middle class with -, and a last-character class without -, which enforces the rule.

Accepting all-numeric labels in the TLD position. 123.456 parses as "shape OK" but no real TLD is numeric. The pattern uses [a-zA-Z]{2,63} for the TLD, which excludes digits.

Capping the TLD at 4 or 6 characters. Old patterns used [a-zA-Z]{2,4} which rejects .museum, .amazon, .travel, and .london. Use {2,63} to match the RFC limit.

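Forgetting punycode is the wire format. Unicode domains (例え.テスト) never appear in a DNS query. They're encoded to xn--... before the lookup. Validate the encoded form, or run the IDN encoder first and then validate. Remember that an encoded TLD such as xn--zckzah contains hyphens, so a letters-only TLD rule rejects it unless you add an xn-- alternative.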

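Letting IPv4 literals through as domains. A loose pattern that allows digits in the final label accepts 192.168.1.1 as a domain. The practical pattern rejects it, but only because its last label is numeric and the TLD part is letters only; that is a side effect, not an IP check. If your input might be an IP literal, route it through the IP regex first (see IPv4 and IPv6 matching).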

Treating a regex pass as proof the domain resolves. It doesn't. Pair the regex with a DNS lookup whenever "is this real" matters.
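The IP-literal routing mentioned above can be sketched in Python. To be plain about the swap: this uses the standard library's ipaddress module instead of an IP regex, and the classify_host name and its three return values are hypothetical.

```python
import ipaddress
import re

DOMAIN_RE = re.compile(
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}"
)

def classify_host(value: str) -> str:
    """Return "ip", "domain", or "invalid" for a host string."""
    try:
        ipaddress.ip_address(value)   # accepts IPv4 and IPv6 literals
        return "ip"
    except ValueError:
        pass
    if len(value) <= 255 and DOMAIN_RE.fullmatch(value):
        return "domain"
    return "invalid"

print(classify_host("192.168.1.1"))   # ip
print(classify_host("::1"))           # ip
print(classify_host("example.com"))   # domain
print(classify_host("999.1.1.1"))     # invalid
```

Checking for an IP first keeps the two decisions separate, so a later change to the TLD rule cannot accidentally start accepting address literals as domains.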

Test cases

Input | Practical pattern | RFC 1035 strict
--- | --- | ---
example.com | Match | Match
mail.example.co.uk | Match | Match
sub.sub.sub.example.com | Match | Match
xn--r8jz45g.com | Match | Match
xn--r8jz45g.xn--zckzah | No match (punycode TLD contains hyphens) | No match
example-domain.io | Match | Match
123abc.com | Match | Match
-example.com | No match | No match
example-.com | No match | No match
example..com | No match | No match
example | No match | No match
example.c | No match | No match
a.b | No match | No match
localhost | No match | No match
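A table like this is easy to turn into a regression check. Here is a minimal Python sketch covering the plain ASCII rows; the CASES dictionary and the failures helper are illustrative names, and an empty list means every row behaved as expected.

```python
import re

PRACTICAL = re.compile(
    r"^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
)
STRICT = re.compile(
    r"^(?=.{1,255}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
)

# Input -> whether both patterns are expected to match it.
CASES = {
    "example.com": True,
    "mail.example.co.uk": True,
    "sub.sub.sub.example.com": True,
    "example-domain.io": True,
    "123abc.com": True,
    "-example.com": False,
    "example-.com": False,
    "example..com": False,
    "example": False,
    "example.c": False,
    "a.b": False,
    "localhost": False,
}

def failures(pattern: re.Pattern) -> list:
    """Return the inputs whose match result differs from the expected value."""
    return [d for d, want in CASES.items() if bool(pattern.match(d)) != want]

print(failures(PRACTICAL))  # []
print(failures(STRICT))     # []
```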

See also

External references: RFC 1035 section 2.3.1 defines the preferred name syntax, and section 2.3.4 sets the size limits for labels and names. The dnschkr.com DNS Inspector runs DNS health tests against any domain to verify it actually resolves and is configured correctly. Test the regex interactively at regex101.com.
