The practical regex for matching a URL is `^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$`. It accepts http://, https://, or no scheme at all (bare inputs like example.com/path), a domain with at least one dot, and an optional path, query string, and fragment. There is also a stricter form that requires the scheme and allows an explicit port, and there is the modern alternative that skips regex entirely and uses the language's built-in URL parser. Below I walk through all three, with runnable code in JavaScript, Python, and PHP, engine-specific notes, and the bugs I've seen most often.
The reason "match a URL" has so many regex variants is that the URL standard (RFC 3986) permits a lot of esoteric forms. The practical pattern matches the URLs that real users type and real APIs return; the strict pattern follows the spec more closely. Decide based on what the input is for.
Quick reference
The practical pattern, ready to paste:
`^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$`
Strict pattern (protocol required, optional port):
`^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$`
HTTPS only (the one I use in production for webhook URLs):
`^https:\/\/([\w-]+(\.[\w-]+)+)(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$`
The practical pattern
`^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$`
Left to right:
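- `^` and `$` anchor to the full string.
- `(https?:\/\/)?` is an optional scheme. `https?` matches both `http` and `https`; the whole group is optional.
- `([\w-]+(\.[\w-]+)+)` is the domain: one or more "word" characters (letters, digits, underscore) or hyphens, repeated with dots between segments. At least one dot is required.
- `([\/\w \.-]*)*\/?` is the optional path. Note the literal space inside the class: `example.com/my page` matches, and so does `https://example.com` followed by trailing words. The nested quantifier (`*` inside `*`) matches exactly the same strings as `[\/\w \.-]*` alone, but in backtracking engines (JavaScript, Python, PCRE) it can backtrack catastrophically on long inputs that fail near the end.
- `(\?[^\s#]*)?` is the optional query string (everything from `?` up to a `#` or whitespace).
- `(#[^\s]*)?` is the optional fragment (everything after `#`, which may not contain whitespace).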
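To check that the nested path quantifier adds only backtracking risk and no matching power, here is a small Python sketch that compiles the practical pattern as published next to a variant with `([\/\w \.-]*)*` flattened to `[\/\w \.-]*`, then runs both over sample inputs from this article. `FLATTENED` is my own name for the variant; dropping the group also shifts the query and fragment to groups 4 and 5.

```python
import re

# The practical pattern exactly as published.
PRACTICAL = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

# Hypothetical variant: the nested ([...]*)* path group flattened to [...]*.
# Both forms describe the same set of strings.
FLATTENED = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)[\/\w \.-]*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

SAMPLES = [
    "https://example.com",
    "example.com/path",
    "sub.example.com/path?query=1#section",
    "example.com/my page",
    "https://sub.example.co.uk:8080/path",
    "htp://example.com",
    "https://example",
    "javascript:alert(1)",
]

for sample in SAMPLES:
    practical = PRACTICAL.fullmatch(sample) is not None
    flattened = FLATTENED.fullmatch(sample) is not None
    print(f"{sample!r:45} practical={practical} flattened={flattened}")
```

Go's RE2 and Rust's regex crate match in linear time, so the nested form is harmless there; on backtracking engines the flattened form is the safer one to paste.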
This pattern accepts https://example.com, example.com/path, sub.example.com/path?query=1#section, and most things in between.
The strict pattern (with protocol and port)
If you want to require the protocol and explicitly handle ports:
`^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$`
The differences:
- `(https?)` is required (no `?` after the group).
- `(:[0-9]{1,5})?` is an optional port of one to five digits. That admits 0 through 99999, so it does not enforce the real maximum of 65535.
- The path uses `[^\s?#]*`, so it stops at the first whitespace character, `?`, or `#`.
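Use this when the input must be an absolute URL (for example, one you will fetch or store) and you want to reject scheme-less input like example.com, or when you need to accept an explicit port. A mistyped scheme such as htttps://example.com (note the triple t) fails both patterns, since neither allows a `:` inside the domain.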
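Capping the port at 65535 inside the regex needs an unwieldy alternation, so a common approach is to capture the port and check the number afterwards. A minimal Python sketch, assuming ports 1 to 65535 count as valid; `is_valid_strict` is a hypothetical helper name, and group 4 is the port group of the strict pattern.

```python
import re

STRICT = re.compile(
    r"^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$"
)


def is_valid_strict(value: str) -> bool:
    """Strict-pattern check plus a numeric port range check (hypothetical helper)."""
    match = STRICT.fullmatch(value)
    if match is None:
        return False
    port = match.group(4)  # e.g. ":8080", or None when the URL has no port
    if port is None:
        return True
    # The regex alone allows 0-99999; assume only 1-65535 is acceptable.
    return 1 <= int(port[1:]) <= 65535


print(is_valid_strict("https://sub.example.co.uk:8080/path"))  # True
print(is_valid_strict("https://example.com:99999/"))           # False
print(is_valid_strict("example.com/path"))                     # False
```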
Examples in JavaScript, Python, and PHP
JavaScript:
```javascript
const urlPattern = /^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$/;

function isValidUrl(input) {
  return urlPattern.test(input);
}

isValidUrl("https://example.com");  // true
isValidUrl("example.com/path?q=1"); // true
isValidUrl("ftp://example.com");    // false (not http/https)
```

Python:
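```python
import re

URL_RE = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

def is_valid_url(value: str) -> bool:
    # fullmatch, not match: in Python, $ also matches just before a trailing "\n",
    # so match() would accept "https://example.com\n".
    return bool(URL_RE.fullmatch(value))

is_valid_url("https://example.com/path")  # True
is_valid_url("not a url")                 # False
```

PHP: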
```php
function isValidUrl(string $value): bool {
    // D modifier: without it, $ also matches just before a trailing "\n".
    $pattern = '/^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$/D';
    return (bool) preg_match($pattern, $value);
}

isValidUrl("https://techearl.com/regex-match-url"); // true
isValidUrl("javascript:alert(1)");                  // false
```

For PHP specifically, the built-in alternative is `filter_var($value, FILTER_VALIDATE_URL)`. It refuses URLs without a scheme, but unlike the regex it also accepts schemes other than http and https (ftp://, for example), so check the scheme separately if that matters.
When to skip regex and use a URL parser instead
For anything more than "is this URL-shaped", a regex is the wrong tool. Every modern language has a URL parser that handles edge cases the regex cannot: internationalised domain names (例え.テスト), userinfo (user:pass@host), IPv6 literals (http://[::1]/), percent-encoding, all of it.
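To see the gap concretely, this Python sketch runs two such URLs through the practical pattern and through the standard library's `urllib.parse.urlsplit`. The regex rejects both, while the parser pulls out the user name, host, and port.

```python
import re
from urllib.parse import urlsplit

PRACTICAL = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

for url in ["http://user:pass@example.com/", "http://[::1]:8080/status"]:
    parts = urlsplit(url)
    print(url)
    print("  regex match:", PRACTICAL.fullmatch(url) is not None)  # False for both
    print("  username:", parts.username)  # "user", then None
    print("  hostname:", parts.hostname)  # "example.com", then "::1"
    print("  port:", parts.port)          # None, then 8080
```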
JavaScript:
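```javascript
function isValidUrl(input) {
  try {
    const url = new URL(input);
    // new URL() accepts any scheme (javascript:, data:, file:), so check it.
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    // new URL() throws a TypeError on unparseable input.
    return false;
  }
}
```

Python: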
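```python
from urllib.parse import urlparse

def is_valid_url(value: str) -> bool:
    try:
        result = urlparse(value)
    except ValueError:
        # urlparse rarely raises, but does on input such as "http://[::1".
        return False
    # urlparse accepts almost anything, so require an http(s) scheme and a host.
    return result.scheme in ("http", "https") and bool(result.netloc)
```

PHP: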
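```php
function isValidUrl(string $value): bool {
    // FILTER_VALIDATE_URL accepts any scheme (ftp://, for example), so check it too.
    $scheme = strtolower((string) parse_url($value, PHP_URL_SCHEME));
    return filter_var($value, FILTER_VALIDATE_URL) !== false
        && in_array($scheme, ['http', 'https'], true);
}
```

The trade-off: parsers are more correct, while a simple regex can be cheaper per call. For high-volume shape checks (form fields on a busy site, log scanning), a regex is often good enough. For "is this safe to redirect to?", use the parser and inspect specific fields (scheme, host, port).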
Engine compatibility
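The practical and strict patterns use only widely supported features (anchors, character classes, groups, and quantifiers), so they compile unmodified in every engine below. One behavioural difference to watch: `\w` is ASCII-only in JavaScript, Go, and Java, but Unicode-aware by default in Python 3, .NET, and Rust's regex crate, so the same pattern can accept an internationalised domain in one language and reject it in another. The per-engine notes are about the parser fallback you reach for when you need correctness over speed.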
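A quick Python check of the `\w` difference, using the internationalised example domain 例え.テスト. Python 3's default `\w` accepts it; compiling with `re.ASCII` restricts `\w` to `[A-Za-z0-9_]`, the same set JavaScript uses.

```python
import re

PATTERN = r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"

unicode_re = re.compile(PATTERN)          # Python 3 default: \w is Unicode-aware
ascii_re = re.compile(PATTERN, re.ASCII)  # \w limited to [A-Za-z0-9_], like JavaScript

idn = "https://例え.テスト/path"
print(unicode_re.fullmatch(idn) is not None)  # True
print(ascii_re.fullmatch(idn) is not None)    # False
```

If the frontend runs the pattern in JavaScript and the backend in Python, `re.ASCII` keeps the two in agreement.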
| Engine | Parser equivalent | Per-engine note |
|---|---|---|
| JavaScript | new URL(input) | Throws on invalid input; wrap in try/catch. Supports IDN and IPv6 literals out of the box. |
| Python | urllib.parse.urlparse | Returns a struct even for non-URL input; check scheme and netloc are non-empty. |
| PHP (PCRE) | filter_var($v, FILTER_VALIDATE_URL) | Follows RFC 2396 (the older spec). Rejects IDN without idn_to_ascii preprocessing. |
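| Java | new java.net.URI(s).toURL() | The URI constructor throws URISyntaxException on bad syntax; toURL() throws IllegalArgumentException if the URI is not absolute and MalformedURLException if no handler exists for the scheme. |
| .NET | Uri.TryCreate(s, UriKind.Absolute, out var uri) | Returns false instead of throwing, so no try/catch is needed; check uri.Scheme before trusting the result. |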
| Go (RE2) | net/url.Parse | Returns no error for partial URLs; check u.Scheme and u.Host explicitly. RE2 lacks lookahead so any pattern with (?=...) needs rewriting. |
| Rust (regex crate) | url::Url::parse (url crate) | No lookahead, no backreferences; neither pattern above needs them. Matching is linear-time, so nested quantifiers cannot backtrack catastrophically. |
| Ruby | URI.parse(s) | Raises on invalid; rescue URI::InvalidURIError. |
For cross-language form validation where the same pattern runs on the frontend and the backend, keep to the practical form. Anything richer should defer to the language's URL parser.
Common mistakes
The bugs I see most often.
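Allowing any scheme without thinking. A pattern like `^[a-z]+:\/\/` accepts any scheme followed by `//`, including `file:///etc/passwd` and `javascript://%0Aalert(1)` (the `//` opens a JavaScript comment and the encoded newline ends it), and the looser `^[a-z]+:` lets `javascript:`, `data:`, and `vbscript:` through as well. Always restrict the scheme to the ones you actually want (`https?` for web URLs, or just `https` for security-sensitive contexts).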
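A short Python illustration of the difference. The sample inputs are my own, picked because they slip past the loose prefix check.

```python
import re

LOOSE = re.compile(r"^[a-z]+:\/\/")       # any lowercase scheme followed by //
RESTRICTED = re.compile(r"^https?:\/\/")  # web URLs only

for candidate in [
    "https://example.com",
    "file:///etc/passwd",
    "javascript://%0Aalert(1)",
]:
    loose = LOOSE.match(candidate) is not None
    restricted = RESTRICTED.match(candidate) is not None
    print(f"{candidate!r}: loose={loose} restricted={restricted}")
```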
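Forgetting the second anchor. `^https?:\/\/[\w-]+` accepts `https://example.com<script>` because nothing pins the end: it only checks that the string starts like a URL. Anchor both sides for validation. In Python use `fullmatch` (or `\Z`), and in PHP add the `D` modifier, because in both `$` also matches just before a trailing newline.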
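The effect is easy to reproduce in Python. `START_ONLY` and `BOTH_ENDS` are illustrative names; the second pattern is a cut-down strict form with a path but no query or fragment.

```python
import re

START_ONLY = re.compile(r"^https?:\/\/[\w-]+")
BOTH_ENDS = re.compile(r"^https?:\/\/[\w-]+(\.[\w-]+)+(\/[^\s?#]*)?$")

junk = "https://example.com<script>"
print(START_ONLY.search(junk) is not None)  # True: only the prefix was checked
print(BOTH_ENDS.search(junk) is not None)   # False: "<script>" cannot reach $

# Python-specific: $ also matches just before a final "\n".
trailing = "https://example.com\n"
print(BOTH_ENDS.search(trailing) is not None)     # True
print(BOTH_ENDS.fullmatch(trailing) is not None)  # False
```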
Treating regex-validated URLs as safe to redirect to. A URL can be "shaped right" and still point at an attacker-controlled host. For open-redirect prevention, parse the URL and inspect the host against an allow-list.
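A minimal Python sketch of that check, assuming a hypothetical `ALLOWED_HOSTS` set and a policy of HTTPS on the default port only. It is a starting point rather than a complete defence. It rejects backslashes outright because browsers treat `\` like `/` in http(s) URLs while `urlsplit` does not, and it compares `hostname`, which excludes any `user@` prefix an attacker might use to disguise the real host.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: replace with the hosts your application redirects to.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_safe_redirect(target: str) -> bool:
    # Browsers treat "\" as "/" in http(s) URLs; urlsplit does not, so a target
    # like "https://evil.test\@example.com" could otherwise fool the host check.
    if "\\" in target or any(ch.isspace() for ch in target):
        return False
    try:
        parts = urlsplit(target)
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    # .hostname is lowercased and excludes any "user:pass@" prefix.
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    return port in (None, 443)
```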
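Not allowing the protocol-relative form when you should. Patterns that force `https?:\/\/` reject `//cdn.example.com/file.js`, which is legal in HTML and common in CDN configs. The practical pattern rejects it too: its scheme group is optional, but nothing in it accepts a bare `//`. Decide whether to accept this case and adjust.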
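One possible adjustment, sketched in Python: widen the optional prefix to `((https?:)?\/\/)?` so a bare `//` is accepted alongside `http://`, `https://`, and no prefix at all. The variant also drops the redundant outer `*` on the path group; `WITH_PROTOCOL_RELATIVE` is a hypothetical name.

```python
import re

# Hypothetical variant of the practical pattern whose optional prefix
# also accepts a bare "//" with no scheme.
WITH_PROTOCOL_RELATIVE = re.compile(
    r"^((https?:)?\/\/)?([\w-]+(\.[\w-]+)+)[\/\w \.-]*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)

for candidate in [
    "//cdn.example.com/file.js",  # protocol-relative
    "https://example.com",
    "example.com",
    "ftp://example.com",          # still rejected
]:
    print(candidate, WITH_PROTOCOL_RELATIVE.fullmatch(candidate) is not None)
```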
Storing the raw input instead of the parsed form. Two URLs that resolve to the same resource (HTTPS://Example.com/Path and https://example.com/Path) compare unequal as strings. Always normalise via the URL parser before storing or comparing.
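A minimal normalisation sketch in Python using `urllib.parse`, assuming an absolute http(s) URL. The rules are my assumption of a sensible baseline: lowercase the scheme and host, drop the default port, and use `/` for an empty path. It leaves path case and percent-encoding alone, and `normalise_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports to drop during normalisation (assumption: http and https only).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalise_url(value: str) -> str:
    """Hypothetical helper: canonical form of an absolute http(s) URL."""
    parts = urlsplit(value)  # urlsplit already lowercases the scheme
    host = parts.hostname or ""  # .hostname is lowercased, without brackets or userinfo
    if ":" in host:  # IPv6 literal: put the brackets back
        host = f"[{host}]"
    netloc = host
    port = parts.port  # raises ValueError for an invalid port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        netloc += f":{port}"
    if parts.username is not None:  # keep userinfo as-is; it is case-sensitive
        userinfo = parts.username
        if parts.password is not None:
            userinfo += f":{parts.password}"
        netloc = f"{userinfo}@{netloc}"
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(normalise_url("HTTPS://Example.com/Path"))  # https://example.com/Path
print(normalise_url("https://example.com:443"))   # https://example.com/
```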
Trusting the path part to be free of HTML. A URL like https://example.com/<script> is valid as a URL but unsafe to render unescaped. The regex validates the shape; HTML-escape on output regardless.
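In Python the standard library's `html.escape` covers the output side. The `<a>` template here is illustrative. Escaping stops markup injection, but it does not neutralise a `javascript:` URL placed in an `href`, which is why restricting the scheme still matters.

```python
import html

url = "https://example.com/<script>"
safe = html.escape(url)  # escapes &, <, > and, by default, both quote characters
print(f'<a href="{safe}">{safe}</a>')
```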
Test cases: matches and non-matches
| Input | Practical pattern | Strict pattern | Notes |
|---|---|---|---|
| https://example.com | Match | Match | Standard |
| http://example.com | Match | Match | Standard |
| example.com/path | Match | No match | No scheme |
| example.com | Match | No match | Just domain |
| https://example.com/path?q=1&p=2#anchor | Match | Match | Full URL |
| https://sub.example.co.uk:8080/path | No match | Match | Port (the practical pattern has no port group) |
| htp://example.com | No match | No match | Wrong scheme |
| https:// | No match | No match | Domain required |
| https://example | No match | No match | No TLD |
| javascript:alert(1) | No match | No match | Not a URL scheme we accept |
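A small Python harness that turns these test cases into checks, running each input against both the practical and the strict pattern. The expected values in `CASES` are my reading of the two patterns; any mismatch is printed.

```python
import re

PRACTICAL = re.compile(
    r"^(https?:\/\/)?([\w-]+(\.[\w-]+)+)([\/\w \.-]*)*\/?(\?[^\s#]*)?(#[^\s]*)?$"
)
STRICT = re.compile(
    r"^(https?):\/\/([\w-]+(\.[\w-]+)+)(:[0-9]{1,5})?(\/[^\s?#]*)?(\?[^\s#]*)?(#\S*)?$"
)

# (input, practical should match, strict should match)
CASES = [
    ("https://example.com", True, True),
    ("http://example.com", True, True),
    ("example.com/path", True, False),
    ("example.com", True, False),
    ("https://example.com/path?q=1&p=2#anchor", True, True),
    ("https://sub.example.co.uk:8080/path", False, True),
    ("htp://example.com", False, False),
    ("https://", False, False),
    ("https://example", False, False),
    ("javascript:alert(1)", False, False),
]

for url, want_practical, want_strict in CASES:
    got_practical = PRACTICAL.fullmatch(url) is not None
    got_strict = STRICT.fullmatch(url) is not None
    ok = (got_practical, got_strict) == (want_practical, want_strict)
    print(f"{'ok' if ok else 'MISMATCH':8} {url!r}: practical={got_practical} strict={got_strict}")
```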
See also
- How to Match an Email Address with Regex: the cousin pattern on every signup form
- How to Match a Domain Name with Regex: the standalone domain pattern when you want just the host
- How to Match an IPv4 and IPv6 Address with Regex: for URLs with IP-literal hosts like http://[2001:db8::1]/
- Regex Anchors: why `^` and `$` matter so much for URL validation
- Regex Lookaheads and Lookbehinds: composing conditional URL constraints
- Regex Capturing Groups and Backreferences: pull out the host, path, and query separately
- Regex Cheat Sheet: the wider syntax and engine compatibility reference
External reference: try the pattern interactively at regex101.com and see every token explained. For URL parsing details, the WHATWG URL Standard is the modern spec implemented by browsers.





