The practical regex for matching a URL: ^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$. It accepts http://, https://, or no protocol at all (scheme-less input like example.com/path; true protocol-relative URLs such as //example.com/path are not matched), a domain with at least one dot, and an optional path, query string, and fragment. There is also a stricter form that requires the protocol and accepts an optional port, and there is the modern alternative that skips regex entirely and uses the language's built-in URL parser. Below I walk through all three, with runnable code in JavaScript, Python, and PHP, engine-specific notes, and the bugs I've seen most often.
The reason "match a URL" has so many regex variants is that the URL standard (RFC 3986) permits a lot of esoteric forms. The practical pattern matches the URLs that real users type and real APIs return; the strict pattern follows the spec more closely. Decide based on what the input is for.
Quick reference
The practical pattern, ready to paste:
^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$
Strict pattern (protocol required, optional port):
^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$
HTTPS only (the one I use in production for webhook URLs):
^https:\/\/([\w-]+(\.[\w-]+)+)(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$
The practical pattern
^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$
Left to right:
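- ^ and $ anchor to the full string.
- (https?:\/\/)? is an optional protocol. https? matches both http and https. The whole group is optional.
- ([\w-]+(\.[\w-]+)+) is the domain: one or more "word" characters (letters/digits/underscore) plus hyphens, repeated with dots between segments. Requires at least one dot.
- ([\/\w \.-]*)*\/? is the optional path. The character class contains a literal space, so paths with spaces are accepted. The group also nests one * inside another: on backtracking engines, a long path followed by a character the class rejects can trigger catastrophic backtracking. Dropping the outer * matches exactly the same strings.
- (\?[^\s#]*)? is the optional query string (everything from ? to a # or whitespace).
- (#[^\s]*)? is the optional fragment (everything from # to whitespace).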
This pattern accepts https://example.com, example.com/path, sub.example.com/path?query=1#section, and most things in between.
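To make the group layout concrete, here is a minimal Python sketch that runs the practical pattern and reads back the numbered groups. The helper name split_practical and the dictionary keys are illustrative assumptions, not part of the pattern itself.

```python
import re

# The practical pattern from above, unchanged.
PRACTICAL = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

def split_practical(value: str):
    """Return the captured pieces, or None when the pattern does not match.

    split_practical is a hypothetical helper used only for illustration.
    """
    m = PRACTICAL.match(value)
    if m is None:
        return None
    return {
        "protocol": m.group(1),  # "http://" or "https://", None when omitted
        "domain": m.group(2),    # the full dotted host
        "query": m.group(5),     # includes the leading "?"
        "fragment": m.group(6),  # includes the leading "#"
    }

print(split_practical("sub.example.com/path?query=1#section"))
```

The path group is left out of the result on purpose: because it sits under a repeated quantifier, what it reports after a match can differ between engines.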
The strict pattern (with protocol and port)
If you want to require the protocol and explicitly handle ports:
^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$
The differences:
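- (https?) is required (no ? after the group).
- (:[0-9]{1,5})? is an optional port of one to five digits. That covers 0 to 99999, so the regex checks the shape of the port but does not enforce the real range of 1 to 65535.
- The path uses [^\s?#]* so it stops at the first whitespace character, ?, or #.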
Use this when the URL is coming from a trusted source and you want to reject obviously-broken inputs like htttps://example.com (note the triple t).
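Since the strict pattern only checks that the port is one to five digits, a range check has to happen in code. The following is a minimal Python sketch of that idea; the function name is_strict_url is an assumption for illustration, and group 4 is the port group counted from the pattern above.

```python
import re

# The strict pattern from above, unchanged.
STRICT = re.compile(
    r"^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$"
)

def is_strict_url(value: str) -> bool:
    """Shape check via the strict pattern, plus a numeric port range check.

    is_strict_url is a hypothetical helper name.
    """
    m = STRICT.fullmatch(value)
    if m is None:
        return False
    port = m.group(4)  # e.g. ":8080", or None when no port was given
    if port is not None and not (1 <= int(port[1:]) <= 65535):
        return False
    return True

print(is_strict_url("https://example.com:8080/path"))
print(is_strict_url("https://example.com:99999/"))
```

Keeping the range check outside the regex avoids a long alternation for 1 to 65535 and keeps the pattern readable.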
Examples in JavaScript, Python, and PHP
JavaScript:
const urlPattern = /^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$/;

// Returns true when the whole string matches the practical pattern.
function isValidUrl(input) {
  return urlPattern.test(input);
}

isValidUrl("https://example.com"); // true
isValidUrl("example.com/path?q=1"); // true
isValidUrl("ftp://example.com"); // false (not http/https)

Python:
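import re

URL_RE = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

def is_valid_url(value: str) -> bool:
    # fullmatch rather than match: in Python, $ also matches just before
    # a trailing newline, so match() would accept "https://example.com\n".
    return bool(URL_RE.fullmatch(value))

is_valid_url("https://example.com/path")  # True
is_valid_url("not a url")  # False

PHP: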
function isValidUrl(string $value): bool {
    $pattern = '/^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$/';
    return (bool) preg_match($pattern, $value);
}

isValidUrl("https://techearl.com/regex-match-url"); // true
isValidUrl("javascript:alert(1)"); // false

For PHP specifically, the built-in alternative is filter_var($value, FILTER_VALIDATE_URL). It is stricter than the regex above and refuses URLs without a scheme.
When to skip regex and use a URL parser instead
For anything more than "is this URL-shaped", a regex is the wrong tool. Every modern language has a URL parser that handles edge cases the regex cannot: internationalised domain names (例え.テスト), userinfo (user:pass@host), IPv6 literals (http://[::1]/), percent-encoding, all of it.
JavaScript:
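function isValidUrl(input) {
  try {
    // new URL() throws a TypeError on unparseable input.
    // It accepts any scheme (including javascript:), so check
    // the protocol property separately if that matters.
    new URL(input);
    return true;
  } catch {
    return false;
  }
}

Python: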
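from urllib.parse import urlparse

def is_valid_url(value: str) -> bool:
    try:
        result = urlparse(value)
        # urlparse returns a result even for non-URL input,
        # so require both a scheme and a network location.
        return all([result.scheme, result.netloc])
    except ValueError:
        # Raised for some malformed inputs, e.g. unbalanced IPv6 brackets.
        return False

PHP: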
function isValidUrl(string $value): bool {
    return filter_var($value, FILTER_VALIDATE_URL) !== false;
}

The trade-off: parsers are more correct but usually slower than a well-behaved regex. For high-volume shape checks (form fields on a busy site, log scanning), regex usually wins. For "is this safe to redirect to?", use the parser and inspect specific fields (scheme, host, port).
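As an illustration of inspecting specific fields, here is a minimal Python sketch of a redirect check built on urllib.parse.urlsplit. The allow-list contents, the function name is_safe_redirect, and the policy of accepting only https on the default port are assumptions for the example, not a complete open-redirect defence.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list, for illustration only.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def is_safe_redirect(target: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Accept only https URLs whose host is on the allow-list."""
    try:
        parts = urlsplit(target)
        port = parts.port  # raises ValueError when the port is not a valid number
    except ValueError:
        return False
    if parts.scheme != "https":          # rejects http, javascript:, scheme-less input
        return False
    if parts.hostname not in allowed_hosts:  # hostname is lowercased, userinfo excluded
        return False
    return port in (None, 443)

print(is_safe_redirect("https://example.com/next"))
print(is_safe_redirect("https://example.com@evil.test/"))
```

The second call shows why the host has to come from the parser: the text before the @ is userinfo, so the real host is evil.test.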
Engine compatibility
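The practical and strict patterns use only widely supported features (anchors, character classes, groups, quantifiers), so they compile unmodified in every engine listed here. Two behaviours still differ between engines: \w is ASCII-only in JavaScript and Go but Unicode-aware by default in Python 3, .NET, and Rust, and in Python and PCRE $ also matches just before a trailing newline. The per-engine notes are about the parser fallback you reach for when you need correctness over speed.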
| Engine | Parser equivalent | Per-engine note |
|---|---|---|
| JavaScript | new URL(input) | Throws on invalid input; wrap in try/catch. Supports IDN and IPv6 literals out of the box. |
| Python | urllib.parse.urlparse | Returns a struct even for non-URL input; check scheme and netloc are non-empty. |
| PHP (PCRE) | filter_var($v, FILTER_VALIDATE_URL) | Follows RFC 2396 (the older spec). Rejects IDN without idn_to_ascii preprocessing. |
| Java | new java.net.URI(s).toURL() | URI parses, toURL() enforces a known scheme. |
| .NET | Uri.TryCreate(s, UriKind.Absolute, out _) | The recommended cross-version approach. |
| Go (RE2) | net/url.Parse | Returns no error for partial URLs; check u.Scheme and u.Host explicitly. RE2 lacks lookahead so any pattern with (?=...) needs rewriting. |
| Rust (regex crate) | url::Url::parse (url crate) | No lookahead, no backreferences. Stick to the practical pattern. |
| Ruby | URI.parse(s) | Raises on invalid; rescue URI::InvalidURIError. |
For cross-language form validation where the same pattern runs on the frontend and the backend, keep to the practical form. Anything richer should defer to the language's URL parser.
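One cross-engine difference worth checking before sharing a pattern is what \w means. The Python sketch below compiles the practical pattern twice, once with default flags and once with re.ASCII, to show the effect on a non-ASCII domain. The variable names are illustrative.

```python
import re

PATTERN = r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"

# Default for str patterns in Python 3: \w is Unicode-aware.
unicode_re = re.compile(PATTERN)
# re.ASCII restricts \w to [a-zA-Z0-9_], closer to JavaScript's behaviour.
ascii_re = re.compile(PATTERN, re.ASCII)

idn = "例え.テスト"
print(bool(unicode_re.fullmatch(idn)))
print(bool(ascii_re.fullmatch(idn)))
```

If the frontend is JavaScript and the backend is Python, compiling with re.ASCII is one way to keep the two sides agreeing on which hosts pass.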
Common mistakes
The bugs I see most often.
Allowing any scheme without thinking. A pattern like ^[a-z]+:\/\/ matches javascript:, data:, file:, and vbscript: too. Always restrict the scheme to the ones you actually want (https? for web URLs, or just https for security-sensitive contexts).
Forgetting the second anchor. ^https?:\/\/[\w-]+ accepts "https://example and then anything at all" because nothing pins the end, so the match stops after example and the rest is ignored. Anchor both sides for validation.
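A short Python sketch of the anchoring difference, using an input with trailing text after a space. It also shows a Python-specific wrinkle: $ matches just before a trailing newline, which fullmatch rules out. The constant names are illustrative.

```python
import re

UNANCHORED = re.compile(r"^https?:\/\/[\w-]+")
ANCHORED = re.compile(r"^https?:\/\/[\w-]+$")

junk = "https://example and then anything at all"
print(bool(UNANCHORED.match(junk)))  # end is not pinned, trailing text is ignored
print(bool(ANCHORED.match(junk)))    # $ forces the match to reach the end

# In Python, $ also matches just before a final newline.
with_newline = "https://example\n"
print(bool(ANCHORED.match(with_newline)))
print(bool(ANCHORED.fullmatch(with_newline)))
```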
Treating regex-validated URLs as safe to redirect to. A URL can be "shaped right" and still point at an attacker-controlled host. For open-redirect prevention, parse the URL and inspect the host against an allow-list.
Not allowing the protocol-relative form when you should. Patterns that force https?:\/\/ reject //cdn.example.com/file.js, which is legal in HTML and common in CDN configs. Decide whether to accept this case and adjust.
Storing the raw input instead of the parsed form. Two URLs that resolve to the same resource (HTTPS://Example.com/Path and https://example.com/Path) compare unequal as strings. Always normalise via the URL parser before storing or comparing.
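A minimal Python sketch of that normalisation step, assuming it is enough to lowercase the scheme and host and leave the path untouched. The function name normalise_url is hypothetical, and this version deliberately drops any userinfo; adapt it if you need to keep credentials.

```python
from urllib.parse import urlsplit, urlunsplit

def normalise_url(url: str) -> str:
    """Lowercase the scheme and host; keep path, query, and fragment as-is.

    Assumption for this sketch: userinfo (user:pass@) is discarded.
    """
    parts = urlsplit(url)         # the scheme comes back lowercased
    host = parts.hostname or ""   # the hostname comes back lowercased
    if ":" in host:               # IPv6 literal: restore the brackets
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

a = normalise_url("HTTPS://Example.com/Path")
b = normalise_url("https://example.com/Path")
print(a == b)
```

The path keeps its case because paths are case-sensitive on many servers; only the scheme and host are safe to fold.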
Trusting the path part to be free of HTML. A URL like https://example.com/<script> is valid as a URL but unsafe to render unescaped. The regex validates the shape; HTML-escape on output regardless.
Test cases: matches and non-matches
| Input | Practical pattern | Notes |
|---|---|---|
| https://example.com | Match | Standard |
| http://example.com | Match | Standard |
| example.com/path | Match | No scheme |
| example.com | Match | Just domain |
| https://example.com/path?q=1&p=2#anchor | Match | Full URL |
| https://sub.example.co.uk:8080/path | No match (strict pattern matches) | Port |
| htp://example.com | No match | Wrong scheme |
| https:// | No match | Domain required |
| https://example | No match | No TLD |
| javascript:alert(1) | No match | Not a URL scheme we accept |
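The table can be turned into a small table-driven check. The Python sketch below runs each input through the practical pattern and reports any row whose result differs from the expected one; CASES and check_cases are illustrative names.

```python
import re

PRACTICAL = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

# (input, expected result for the practical pattern)
CASES = [
    ("https://example.com", True),
    ("http://example.com", True),
    ("example.com/path", True),
    ("example.com", True),
    ("https://example.com/path?q=1&p=2#anchor", True),
    ("https://sub.example.co.uk:8080/path", False),  # port: strict pattern only
    ("htp://example.com", False),
    ("https://", False),
    ("https://example", False),
    ("javascript:alert(1)", False),
]

def check_cases(cases=CASES):
    """Return the inputs whose actual result differs from the expected one."""
    return [text for text, expected in cases
            if bool(PRACTICAL.fullmatch(text)) != expected]

# An empty list means every row agrees with the table.
print(check_cases())
```

Keeping the cases next to the pattern makes it cheap to re-run them whenever the pattern is edited.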
FAQ
Use a URL parser (new URL() in JavaScript, urlparse in Python, filter_var in PHP) when correctness matters. For example, when deciding whether to redirect a user to the URL, or storing it in a database.
Use regex when speed matters more than handling every edge of the URL spec, or when you need to enforce something the parser doesn't (only HTTPS, only specific domains, no userinfo).
If your pattern uses [a-z]+:\/\/ without restricting the scheme, it will match any scheme. The practical pattern in this article uses https?:\/\/ which only allows http and https.
Other dangerous schemes to explicitly reject in user input: javascript:, data:, vbscript:, file:. Always inspect the scheme; never blindly redirect to a user-provided URL.
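The pattern does not reliably match internationalised domain names. Where \w is ASCII-only (JavaScript, Go), [\w-] covers ASCII letters, digits, underscore, and hyphen, so a domain like 例え.テスト fails. Where \w is Unicode-aware by default (Python 3, .NET), the same pattern accepts it. Either way, internationalised domains are encoded as Punycode (xn--r8jz45g.xn--zckzah) for DNS purposes.
If you need to accept internationalised domains, use a URL parser instead. JavaScript's new URL() normalises internationalised hosts to Punycode for you; Python's urlparse leaves the host as typed, so convert it separately.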
To accept only HTTPS, replace https? with https in the pattern: ^https:\/\/([\w-]+(\.[\w-]+)+).... The s is no longer optional, so http:// URLs fail to match.
This is the pattern to use when you want to enforce TLS on user-submitted links (webhooks, OAuth callbacks, payment-success URLs).
PHP's FILTER_VALIDATE_URL follows RFC 2396 (the older URL spec) and rejects URLs with internationalised domains or some Unicode characters even after percent-encoding. It also requires a scheme by default.
Be aware that FILTER_FLAG_PATH_REQUIRED tightens validation rather than relaxing it: it forces the URL to include a path component, so http://example.com fails while http://example.com/ passes. For a more permissive check, drop the flags entirely and fall back to a regex like the one in this article or a dedicated URL parser.
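To extract the host, capture it in a group: ^https?:\/\/([^\/\s:?#]+). After matching, the host is in group 1. The class stops at the first :, so a port is excluded from the capture. For the same reason it misreads URLs that carry userinfo (user:pass@host) or an IPv6 literal.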
For anything more involved (extracting userinfo, ports, paths separately), use the URL parser. See the domain-matching guide for the standalone domain pattern.
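A minimal Python sketch comparing the capture-group approach with the parser, to show where the regex holds up and where it does not. The helper names host_via_regex and host_via_parser are illustrative.

```python
import re
from urllib.parse import urlsplit

HOST_RE = re.compile(r"^https?:\/\/([^\/\s:?#]+)")

def host_via_regex(url: str):
    """Group 1 of the host pattern, or None when it does not match."""
    m = HOST_RE.match(url)
    return m.group(1) if m else None

def host_via_parser(url: str):
    """The host as reported by urllib.parse (lowercased, no port, no userinfo)."""
    return urlsplit(url).hostname

print(host_via_regex("https://example.com:8080/path"))
print(host_via_regex("https://user:pass@example.com/"))
print(host_via_parser("https://user:pass@example.com/"))
```

With a plain host and port both approaches agree; with userinfo the regex returns the username, which is why the parser is the safer choice for untrusted input.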
The practical pattern does handle query strings and fragments: it includes (\?[^\s#]*)? for the optional query string and (#[^\s]*)? for the optional fragment. Both stop at whitespace; the query also stops at # so the fragment can take over.
What it does not do is validate the query-string structure (key=value pairs, percent-encoding). For that, parse the URL and use the parser's query iterator.
See also
- How to Match an Email Address with Regex: the cousin pattern on every signup form
- How to Match a Date with Regex (multiple formats): another structured-string validator, with the same "regex checks the shape, not the validity" caveat
- How to Match a Domain Name with Regex: the standalone domain pattern when you want just the host
- How to Match an IPv4 and IPv6 Address with Regex: for URLs with IP-literal hosts like http://[2001:db8::1]/
- Regex Anchors: why ^ and $ matter so much for URL validation
- Regex Lookaheads and Lookbehinds: composing conditional URL constraints
- Regex Capturing Groups and Backreferences: pull out the host, path, and query separately
- How to Use Regex in .htaccess: matching and rewriting these URLs in Apache mod_rewrite
- How to Use Regex in Nginx: matching and rewriting these URLs in Nginx location and rewrite
- Regex Cheat Sheet: the wider syntax and engine compatibility reference
External reference: try the pattern interactively at regex101.com and see every token explained. For URL parsing details, the WHATWG URL Standard is the modern spec implemented by browsers.