Match a URL with Regex: Practical and Strict Patterns (2026)

The practical regex for matching a URL: ^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$. It accepts http://, https://, or no protocol at all (bare input like example.com/path), followed by a domain with at least one dot and an optional path, query string, and fragment. There is also a stricter form that requires the protocol and allows an optional port, and there is the modern alternative that skips regex entirely and uses the language's built-in URL parser. Below I walk through all three, with runnable code in JavaScript, Python, and PHP, engine-specific notes, and the bugs I've seen most often.

The reason "match a URL" has so many regex variants is that the URI syntax specification (RFC 3986) permits a lot of esoteric forms. The practical pattern matches the URLs that real users type and real APIs return; the strict pattern follows the spec more closely. Decide based on what the input is for.

Quick reference

The practical pattern, ready to paste:

code

^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$

Strict pattern (protocol required, optional port):

code

^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$

HTTPS only (the one I use in production for webhook URLs):

code

^https:\/\/([\w-]+(\.[\w-]+)+)(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$

The practical pattern

code

^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$

Left to right:

^ and $ anchor to the full string.
(https?:\/\/)? is an optional protocol. https? matches both http and https. The whole group is optional.
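([\w-]+(\.[\w-]+)+) is the domain: labels made of "word" characters (letters, digits, underscore) and hyphens, joined by dots. At least one dot is required. Note that \w is looser than real hostname rules: it admits underscores, and in engines where \w is Unicode-aware by default (Python 3, .NET, Rust) it also admits non-ASCII letters.
([\/\w \.-]*)*\/? is the optional path: any run of slashes, word characters, spaces, dots, and hyphens. The class contains a literal space, so example.com/a b matches. The nested quantifier (a starred class inside a starred group) matches nothing that a single [\/\w \.-]* would not, and in backtracking engines it can backtrack catastrophically on long input that ultimately fails to match.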
(\?[^\s#]*)? is the optional query string (everything from ? to a # or whitespace).
(#[^\s]*)? is the optional fragment (everything from # to whitespace).

This pattern accepts https://example.com, example.com/path, sub.example.com/path?query=1#section, and most things in between.
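The path group is the one part of this pattern worth a second look, because a starred class inside a starred group is the classic shape behind catastrophic backtracking. The Python sketch below flattens it into a single class. The name PRACTICAL_FLAT is mine, not part of the original pattern, and the only claim is that it should accept the same strings; group 4 now captures the whole path, which may differ from what the original captures.

```python
import re

# The article's practical pattern, unchanged.
PRACTICAL = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

# Hypothetical flattened variant: one class instead of a starred group around
# a starred class. The trailing \/? is dropped because "/" is already in the class.
PRACTICAL_FLAT = re.compile(
    r"^(https?://)?([\w-]+(\.[\w-]+)+)([/\w .-]*)(\?[^\s#]*)?(#\S*)?$"
)

samples = [
    "https://example.com",
    "example.com/path",
    "sub.example.com/path?query=1#section",
    "ftp://example.com",
    "not a url",
]

for s in samples:
    # Both patterns should agree on every sample.
    print(s, bool(PRACTICAL.match(s)), bool(PRACTICAL_FLAT.match(s)))
```

If you adopt a variant like this, rerun your own test inputs against both patterns before swapping one for the other.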

The strict pattern (with protocol and port)

If you want to require the protocol and explicitly handle ports:

code

^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$

The differences:

(https?) is required (no ? after the group).
(:[0-9]{1,5})? is an optional port of one to five digits. It accepts anything from 0 to 99999, so it does not enforce the real upper limit of 65535.
The path uses [^\s?#]* so it stops at the first space, ?, or #.

Use this when the URL is coming from a trusted source and you want to reject obviously-broken inputs like htttps://example.com (note the triple t).
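Because the port group only counts digits, a range check has to happen outside the regex. A minimal Python sketch, assuming you want to accept ports 1 to 65535 (the function name and the decision to reject port 0 are my assumptions):

```python
import re

# The article's strict pattern, unchanged. Group 4 is the optional port.
STRICT = re.compile(
    r"^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$"
)

def is_strict_url_with_valid_port(value: str) -> bool:
    """Match the strict pattern, then range-check the port separately."""
    m = STRICT.match(value)
    if m is None:
        return False
    port = m.group(4)  # e.g. ":8080", or None when no port was given
    if port is None:
        return True
    return 0 < int(port[1:]) <= 65535

print(is_strict_url_with_valid_port("https://example.com:8080/path"))  # True
print(is_strict_url_with_valid_port("https://example.com:99999/"))     # False
```

The second input matches the regex on its own; only the numeric check rejects it.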

Examples in JavaScript, Python, and PHP

JavaScript:

javascript

const urlPattern = /^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$/;
function isValidUrl(input) {
  return urlPattern.test(input);
}
isValidUrl("https://example.com");           // true
isValidUrl("example.com/path?q=1");          // true
isValidUrl("ftp://example.com");             // false (not http/https)

Python:

python

import re
URL_RE = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

def is_valid_url(value: str) -> bool:
    return bool(URL_RE.match(value))

is_valid_url("https://example.com/path")    # True
is_valid_url("not a url")                   # False

PHP:

php

function isValidUrl(string $value): bool {
    $pattern = '/^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$/';
    return (bool) preg_match($pattern, $value);
}

isValidUrl("https://techearl.com/regex-match-url");  // true
isValidUrl("javascript:alert(1)");                   // false

For PHP specifically, the built-in alternative is filter_var($value, FILTER_VALIDATE_URL). It is stricter than the regex above and refuses URLs without a scheme.

When to skip regex and use a URL parser instead

For anything more than "is this URL-shaped", a regex is the wrong tool. Every modern language has a URL parser that handles edge cases the regex cannot: internationalised domain names (例え.テスト), userinfo (user:pass@host), IPv6 literals (http://[::1]/), percent-encoding, all of it.

JavaScript:

javascript

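function isValidUrl(input) {
  try {
    const url = new URL(input);
    // new URL() also accepts javascript:, data:, and mailto: URLs, so check the scheme.
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false;
  }
}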

Python:

python

from urllib.parse import urlparse

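def is_valid_url(value: str) -> bool:
    try:
        result = urlparse(value)
    except ValueError:
        # urlparse raises ValueError on malformed input such as an unclosed IPv6 bracket
        return False
    # Require an http(s) scheme and a non-empty host part
    return result.scheme in ("http", "https") and bool(result.netloc)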

PHP:

php

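function isValidUrl(string $value): bool {
    // FILTER_VALIDATE_URL accepts any scheme, so restrict it to http(s) as well
    return filter_var($value, FILTER_VALIDATE_URL) !== false
        && in_array(strtolower((string) parse_url($value, PHP_URL_SCHEME)), ['http', 'https'], true);
}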

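The trade-off: parsers are more correct but usually slower than a simple regex. For high-volume shape checks (form fields on a busy site, log scanning), a regex is often enough, provided the pattern cannot backtrack catastrophically. Keep in mind that parsers accept any scheme (javascript:, mailto:, ftp:), so for "is this safe to redirect to?", use the parser and inspect specific fields (scheme, host, port).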

Engine compatibility

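The practical and strict patterns use only widely supported features (anchors, character classes, quantifiers, groups), so they should run unmodified in every engine listed here. The one behaviour that varies is \w, which also matches non-ASCII letters in engines that are Unicode-aware by default. The per-engine notes are about the parser fallback you reach for when you need correctness over speed.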

Engine	Parser equivalent	Per-engine note
JavaScript	`new URL(input)`	Throws on invalid input; wrap in try/catch. Supports IDN and IPv6 literals out of the box.
Python	`urllib.parse.urlparse`	Returns a struct even for non-URL input; check `scheme` and `netloc` are non-empty.
PHP (PCRE)	`filter_var($v, FILTER_VALIDATE_URL)`	Follows RFC 2396 (the older spec). Rejects IDN without `idn_to_ascii` preprocessing.
Java	`new java.net.URI(s).toURL()`	`URI` parses, `toURL()` enforces a known scheme.
.NET	`Uri.TryCreate(s, UriKind.Absolute, out _)`	The recommended cross-version approach.
Go (RE2)	`net/url.Parse`	Returns no error for partial URLs; check `u.Scheme` and `u.Host` explicitly. RE2 lacks lookahead so any pattern with `(?=...)` needs rewriting.
Rust (`regex` crate)	`url::Url::parse` (url crate)	No lookahead, no backreferences. Stick to the practical pattern.
Ruby	`URI.parse(s)`	Raises on invalid; rescue `URI::InvalidURIError`.

For cross-language form validation where the same pattern runs on the frontend and the backend, keep to the practical form. Anything richer should defer to the language's URL parser.

Common mistakes

The bugs I see most often.

Allowing any scheme without thinking. A pattern like ^[a-z]+:\/\/ matches file://, ftp://, and even javascript://, and a looser ^[a-z]+: lets javascript:, data:, and vbscript: through too. Always restrict the scheme to the ones you actually want (https? for web URLs, or just https for security-sensitive contexts).

Forgetting the second anchor. ^https?:\/\/[\w-]+ accepts https://example.com/"><script> because nothing pins the end: the match stops after https://example and the rest is never checked. Anchor both sides for validation.
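One Python-specific wrinkle on anchoring: $ also matches just before a trailing newline, so a value ending in \n can pass a ^...$ check made with re.match. re.fullmatch closes that gap. The sketch below uses a deliberately cut-down pattern (HTTPS_HOST is a name I made up for illustration), not the full practical pattern.

```python
import re

HTTPS_HOST = r"https?://[\w-]+(\.[\w-]+)+"  # simplified, illustration only

value = "https://example.com\n"  # note the trailing newline

# "$" matches at the very end OR just before a final newline.
print(bool(re.match("^" + HTTPS_HOST + "$", value)))   # True
# fullmatch requires the pattern to consume the entire string.
print(bool(re.fullmatch(HTTPS_HOST, value)))           # False
# Without an end anchor, trailing garbage is never examined.
print(bool(re.match("^" + HTTPS_HOST, 'https://example.com/"><script>')))  # True
```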

Treating regex-validated URLs as safe to redirect to. A URL can be "shaped right" and still point at an attacker-controlled host. For open-redirect prevention, parse the URL and inspect the host against an allow-list.
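A minimal sketch of that host check in Python, assuming an HTTPS-only policy. ALLOWED_HOSTS and is_safe_redirect are hypothetical names, and the host list is a placeholder. urllib.parse does not parse exactly the way browsers do, so treat this as a starting point rather than a complete defence.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; replace with the hosts your application trusts.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def is_safe_redirect(target: str) -> bool:
    try:
        parts = urlsplit(target)
    except ValueError:
        # Raised for malformed input such as an unclosed IPv6 bracket.
        return False
    if parts.scheme != "https":
        return False
    # .hostname is lower-cased and excludes userinfo and port.
    return parts.hostname in ALLOWED_HOSTS

print(is_safe_redirect("https://example.com/welcome"))     # True
print(is_safe_redirect("https://example.com@evil.test/"))  # False
```

The second input is "shaped right" and even starts with a trusted name, but the parsed host is evil.test.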

Not allowing the protocol-relative form when you should. Patterns that force https?:\/\/ reject //cdn.example.com/file.js, which is legal in HTML and common in CDN configs. Decide whether to accept this case and adjust.

Storing the raw input instead of the parsed form. Two URLs that resolve to the same resource (HTTPS://Example.com/Path and https://example.com/Path) compare unequal as strings. Always normalise via the URL parser before storing or comparing.
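One way to normalise before comparing, sketched in Python with urllib.parse. normalise_url is a hypothetical helper with deliberate limits: it lower-cases only the scheme and host, drops any userinfo, and leaves default ports, path case, and percent-encoding untouched.

```python
from urllib.parse import urlsplit, urlunsplit

def normalise_url(value: str) -> str:
    parts = urlsplit(value)      # the scheme comes back lower-cased
    host = parts.hostname or ""  # lower-cased, userinfo and port removed
    if ":" in host:              # IPv6 literal: put the brackets back
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

a = normalise_url("HTTPS://Example.com/Path")
b = normalise_url("https://example.com/Path")
print(a, a == b)  # https://example.com/Path True
```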

Trusting the path part to be free of HTML. A URL like https://example.com/<script> is valid as a URL but unsafe to render unescaped. The regex validates the shape; HTML-escape on output regardless.
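Escaping on output is a one-liner in most languages. A small Python sketch using the standard library's html.escape; the surrounding anchor markup is only an illustration:

```python
from html import escape

url = "https://example.com/<script>"

# Escape for both the attribute value and the visible link text.
link = f'<a href="{escape(url, quote=True)}">{escape(url)}</a>'
print(link)
# <a href="https://example.com/&lt;script&gt;">https://example.com/&lt;script&gt;</a>
```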

Test cases: matches and non-matches

Input	Practical pattern	Strict pattern	Notes
`https://example.com`	Match	Match	Standard
`http://example.com`	Match	Match	Standard
`example.com/path`	Match	No match	No scheme
`example.com`	Match	No match	Just domain
`https://example.com/path?q=1&p=2#anchor`	Match	Match	Full URL
`https://sub.example.co.uk:8080/path`	No match	Match	Port (strict pattern only)
`htp://example.com`	No match	No match	Wrong scheme
`https://`	No match	No match	Domain required
`https://example`	No match	No match	No TLD
`javascript:alert(1)`	No match	No match	Not a URL scheme we accept
FAQ

Should I validate URLs with regex or the built-in URL parser?

Use a URL parser (new URL() in JavaScript, urlparse in Python, filter_var in PHP) when correctness matters, for example when deciding whether to redirect a user to the URL or when storing it in a database.

Use regex when speed matters more than handling every edge of the URL spec, or when you need to enforce something the parser doesn't (only HTTPS, only specific domains, no userinfo).

Why does my URL regex match invalid schemes like ftp:// or javascript:?

If your pattern uses [a-z]+:\/\/ without restricting the scheme, it will match any scheme that is followed by ://. The practical pattern in this article uses https?:\/\/ which only allows http and https.

Other dangerous schemes to explicitly reject in user input: javascript:, data:, vbscript:, file:. Always inspect the scheme; never blindly redirect to a user-provided URL.

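Does the URL pattern handle internationalised domains?

Not reliably. The pattern uses [\w-] for domain characters. In JavaScript and Go, \w is ASCII letters/digits/underscore only, so a domain like 例え.テスト fails. In engines where \w is Unicode-aware by default (Python 3, .NET, Rust) the same domain passes, which may not be what you want either. For DNS purposes such a domain is encoded as Punycode (xn--r8jz45g.xn--zckzah), and that ASCII form matches everywhere.

If you need to accept internationalised domains, use a URL parser instead. JavaScript's new URL() converts the host to Punycode for you; not every parser does (Python's urlparse leaves the host as typed), so check what yours returns.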
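How \w treats non-ASCII letters depends on the engine, which is easy to check for yourself. A short Python sketch: the first two lines show the \w difference, and the last part converts the domain to its ASCII form with the standard library's idna codec (which implements the older IDNA 2003 rules, so treat it as an approximation of what browsers do).

```python
import re

label = "例え"

# In Python 3, \w on a str pattern is Unicode-aware by default...
print(bool(re.fullmatch(r"[\w-]+", label)))            # True
# ...and ASCII-only when re.ASCII is set.
print(bool(re.fullmatch(r"[\w-]+", label, re.ASCII)))  # False

# Convert the whole domain to its ASCII (Punycode) form.
ascii_host = "例え.テスト".encode("idna").decode("ascii")
print(ascii_host)
```

If you want the pattern to stay ASCII-only in Python, compile it with re.ASCII and feed it the converted host.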

How do I match only HTTPS URLs?

Replace https? with https in the pattern: ^https:\/\/([\w-]+(\.[\w-]+)+).... The s is no longer optional, so http:// URLs fail to match.

This is the pattern to use when you want to enforce TLS on user-submitted links (webhooks, OAuth callbacks, payment-success URLs).

Why doesn't filter_var accept some URLs that look valid?

PHP's FILTER_VALIDATE_URL follows RFC 2396 (the older URL spec) and rejects URLs that contain raw non-ASCII characters, including internationalised domains that have not been converted to Punycode. It also always requires a scheme.

Be aware that FILTER_FLAG_PATH_REQUIRED tightens validation rather than relaxing it: it forces the URL to include a path component, so http://example.com fails while http://example.com/ passes. For a more permissive check, drop the flags entirely and fall back to a regex like the one in this article or a dedicated URL parser.

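How do I extract the domain from a URL with regex?

Capture the host portion in a group: ^https?:\/\/([^\/\s:?#]+). After matching, the host is in group 1. The class stops at the first :, so a port is left out of the capture. For the same reason it returns the username rather than the host when the URL carries userinfo (https://user:pass@example.com), and it cuts an IPv6 literal such as [::1] short.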

For anything more involved (extracting userinfo, ports, paths separately), use the URL parser. See the domain-matching guide for the standalone domain pattern.
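To see where the capture group and the parser part ways, here is a small Python comparison. host_via_regex and host_via_parser are illustrative names. The regex stops at the first colon, which is fine for a port but wrong once userinfo is present.

```python
import re
from urllib.parse import urlsplit

HOST_RE = re.compile(r"^https?:\/\/([^\/\s:?#]+)")

def host_via_regex(url: str):
    m = HOST_RE.match(url)
    return m.group(1) if m else None

def host_via_parser(url: str):
    # .hostname strips userinfo and port and lower-cases the result.
    return urlsplit(url).hostname

plain = "https://example.com:8080/path"
with_userinfo = "https://user:pass@example.com/path"

print(host_via_regex(plain), host_via_parser(plain))                  # example.com example.com
print(host_via_regex(with_userinfo), host_via_parser(with_userinfo))  # user example.com
```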

Does the pattern handle URLs with query strings and fragments?

Yes. The practical pattern includes (\?[^\s#]*)? for the optional query string and (#[^\s]*)? for the optional fragment. Both stop at whitespace; the query also stops at # so the fragment can take over.

What it does not do is validate the query-string structure (key=value pairs, percent-encoding). For that, parse the URL and use the parser's query iterator.
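In Python, for instance, the parser route looks roughly like this. The URL is a made-up example, and parse_qsl decodes percent-encoding as it goes.

```python
from urllib.parse import urlsplit, parse_qsl

url = "https://example.com/path?q=1&p=2&name=a%20b#anchor"
parts = urlsplit(url)

# parse_qsl returns (key, value) pairs in order, with percent-encoding decoded.
pairs = parse_qsl(parts.query, keep_blank_values=True)
print(pairs)           # [('q', '1'), ('p', '2'), ('name', 'a b')]
print(parts.fragment)  # anchor
```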


Ishan Karunaratne

Related posts

How to Match Numbers with Regex

How to Match a Domain Name with Regex

How to Match an Email Address with Regex

