WordPress: Moderate Comments Using Regular Expressions (2026)

WordPress ships two keyword textareas in Settings → Discussion ("Comment Moderation" and "Disallowed Comment Keys") plus a PHP filter, pre_comment_approved, that lets you run arbitrary PCRE against every incoming comment before it hits the database. The textareas match literal, case-insensitive substrings; the filter is where real regular expressions run. Used together they cover most non-AI spam without an external service.

How do I moderate WordPress comments with regular expressions?

WordPress has two built-in keyword fields under Settings → Discussion:

Comment Moderation: any comment matching a line here is held in the moderation queue.
Disallowed Comment Keys (called "Comment Blacklist" before WordPress 5.5): matches are sent to the Trash, or marked as spam if EMPTY_TRASH_DAYS is set to 0.

Each line is matched as a case-insensitive substring. Core passes the line through preg_quote() before handing it to preg_match(), so regex metacharacters (\b, character classes, lookaheads, alternation) are escaped and matched as literal text; this applies both to check_comment(), which reads Comment Moderation, and to wp_check_comment_disallowed_list(). For real PCRE, or anything beyond list-of-strings filtering, hook a function onto the pre_comment_approved filter and run preg_match() against $commentdata['comment_content'], returning 0 to hold, 'spam' to spam, 'trash' to discard, or 1 to auto-approve.

Jump to:

The two built-in regex fields
Real-world moderation patterns
The pre_comment_approved filter
Approve, hold, spam, or trash
Testing patterns before they go live
Plugin alternatives
What to do next
FAQ

The two built-in regex fields

Both live in /wp-admin/options-discussion.php:

Field	Behavior when matched	Stored in option
Comment Moderation	Comment held for manual review	`moderation_keys`
Disallowed Comment Keys	Comment goes to Trash (spam if `EMPTY_TRASH_DAYS` is 0)	`disallowed_keys`

Internally WordPress reads each textarea line-by-line, trims it, and runs the equivalent of:

php

if ( preg_match( '#' . preg_quote( $word, '#' ) . '#i', $comment_content ) ) {
    // hold or spam
}

The preg_quote() call escapes every regex metacharacter in the line ('?', '.', '+', '\', '(', '|' and the rest), so adding viagra just matches the substring viagra, and adding \bviagra\b matches only the literal text \bviagra\b. A line that LOOKS like a regex is still wrapped in #...#i, but by then nothing in it is special to PCRE: anchors, classes, \b, and lookaheads are all inert. There's no "regex mode toggle", documented or otherwise; patterns belong in the pre_comment_approved filter covered below.

The fields check the comment author name, email, URL, body, IP, and user agent: six fields, each tested separately rather than concatenated. Useful when a spammer rotates the body but reuses the same throwaway email domain.
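To see what the escape-then-match step does to a line that looks like a regex, here is a minimal sketch that mirrors the snippet above in plain PHP. The helper name te_keyword_matches is hypothetical (it is not a WordPress function), and the sketch assumes the simple #...#i wrapping shown earlier.

```php
<?php
// Hypothetical helper: imitates the escape-then-match step for ONE textarea
// line against ONE comment field. Not part of WordPress.
function te_keyword_matches( string $line, string $field ): bool {
    $line = trim( $line );
    if ( '' === $line ) {
        return false; // blank lines are skipped
    }

    // preg_quote() escapes every regex metacharacter, including the backslash,
    // so the line can only ever match as literal text.
    $pattern = '#' . preg_quote( $line, '#' ) . '#i';

    return 1 === preg_match( $pattern, $field );
}
```

With this helper, the line viagra matches "Cheap VIAGRA here", while the line \bviagra\b does not match "Cheap viagra here"; it only matches a field that literally contains the characters \bviagra\b.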

Real-world moderation patterns

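A short list of patterns I actually use, paired with what they catch. Because the Settings fields escape metacharacters, each of these runs inside a pre_comment_approved callback (see the filter section below), not in the textareas.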

Block common spam phrases with word boundaries

code

\b(viagra|cialis|payday loan|crypto pump)\b

\b is a word boundary: it stops specialist from matching cialis. The trailing \b cuts both ways, though, so payday loans (plural) is not matched unless you write payday loans?. Word boundaries are the single biggest accuracy upgrade you can make to a keyword list.
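A minimal sketch of the phrase pattern wrapped in a function you could call from a pre_comment_approved callback. The function name te_has_spam_phrase is hypothetical; the pattern is the one shown above with # delimiters and the i flag added.

```php
<?php
// Hypothetical helper: true when the content contains one of the listed
// phrases as a whole word or phrase (case-insensitive).
function te_has_spam_phrase( string $content ): bool {
    $pattern = '#\b(viagra|cialis|payday loan|crypto pump)\b#i';

    return 1 === preg_match( $pattern, $content );
}
```

"Buy CIALIS now" matches; "Ask a specialist" does not, because there is no word boundary on either side of the embedded cialis.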

Limit URL count per comment

code

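(?s)(?:https?://\S+.*?){3,}

Matches any comment with three or more HTTP/HTTPS URLs; the (?s) prefix lets .*? cross line breaks, so links on separate lines still count. Three is the threshold I default to; legitimate comments rarely include more than two links. Return 0 (hold for review) rather than 'spam' or 'trash' so a real reader doesn't get silently discarded for over-citing. Core also has a plain link counter on the same settings page ("Hold a comment in the queue if it contains N or more links"), which covers the simple case without any regex.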
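An alternative to a single quantified pattern is to count the links and compare against a threshold, which sidesteps the question of what sits between the URLs. This is a sketch, not the approach the pattern above uses; te_count_urls and te_too_many_urls are hypothetical names and the default of two allowed links is an assumption taken from the threshold described above.

```php
<?php
// Hypothetical helper: number of http:// or https:// URLs in the content.
function te_count_urls( string $content ): int {
    // preg_match_all() returns the match count, or false on error.
    $count = preg_match_all( '#https?://\S+#i', $content );

    return false === $count ? 0 : $count;
}

// Hypothetical helper: true when the content has more links than $max.
function te_too_many_urls( string $content, int $max = 2 ): bool {
    return te_count_urls( $content ) > $max;
}
```

Counting also makes the threshold a parameter, so a site that expects heavier citation can raise it without touching the pattern.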

Match suspicious email domains

code

@(mail\.ru|yandex\.ru|protonmail\.com|tutanota\.com)$

Anchored to $ so the domain has to sit at the very end of the address; run it against $commentdata['comment_author_email'], not the comment body. This is blunt: a legitimate Proton user gets caught. I use it to hold comments for review (return 0), never to spam or trash them, and review the queue daily. For a proper email pattern reference see regex match email address.
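A minimal sketch of the domain check as a standalone function, assuming it is fed the value of $commentdata['comment_author_email']. The name te_is_watched_email_domain is hypothetical.

```php
<?php
// Hypothetical helper: true when the address ends in one of the watched domains.
function te_is_watched_email_domain( string $email ): bool {
    $pattern = '#@(mail\.ru|yandex\.ru|protonmail\.com|tutanota\.com)$#i';

    // trim() first: $ also matches just before a trailing newline.
    return 1 === preg_match( $pattern, trim( $email ) );
}
```

The leading @ matters as much as the trailing $: without it, an address at notmail.ru would match as well.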

Detect mass non-Latin script spam

code

[\x{0400}-\x{04FF}\x{0590}-\x{05FF}\x{0600}-\x{06FF}]{20,}

Twenty or more consecutive Cyrillic, Hebrew, or Arabic characters. The \x{...} escapes need the u modifier (#...#u) in PHP, and because whitespace is not in the class the run has to be unbroken by spaces. This is a controversial filter: a site whose audience legitimately writes in those scripts will lose real readers to it. I only use the pattern on English-only sites and only after seeing the same script-spam vector hit me three or more times in a week. Discussion threads on this exact trade-off show up regularly in the WordPress.org support forums and on the WordPress Stack Exchange; read the counter-arguments before deploying.
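A sketch of how the script-run pattern might be wrapped for use in PHP, where the u modifier is required for \x{...} code points above 0xFF. The function name te_has_long_script_run and the $min parameter are hypothetical; the character ranges are the ones from the pattern above.

```php
<?php
// Hypothetical helper: true when the content holds an unbroken run of at
// least $min Cyrillic, Hebrew, or Arabic characters.
function te_has_long_script_run( string $content, int $min = 20 ): bool {
    $class   = '[\x{0400}-\x{04FF}\x{0590}-\x{05FF}\x{0600}-\x{06FF}]';
    $pattern = '#' . $class . '{' . $min . ',}#u'; // u = treat subject as UTF-8

    // preg_match() returns false for invalid UTF-8 input; that counts as "no match" here.
    return 1 === preg_match( $pattern, $content );
}
```

Making the run length a parameter lets you tune the threshold against real samples before committing to twenty.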

Match a known link-injection pattern

code

\[url=.*?\].*?\[/url\]

BBCode-style link injection. The [url=...] syntax isn't rendered by WordPress, but spambots paste it anyway because the same payload gets reused across vBulletin / phpBB targets. Match-and-spam.

The pre_comment_approved filter

The textareas in the admin cover literal keyword matching. For regular expressions, and for anything stateful (rate limits, dynamic blocklists, IP geolocation), use the pre_comment_approved filter in code:

php

function te_check_comment( $approved, $commentdata ) {
    $content = $commentdata['comment_content'];
    $pattern = '#\b(buy now|click here|free trial)\b#i';

    if ( preg_match( $pattern, $content ) ) {
        $approved = 0; // hold for moderation
    }

    return $approved;
}
add_filter( 'pre_comment_approved', 'te_check_comment', 10, 2 );

The filter receives the approval status WordPress already computed (0, 1, 'spam', 'trash', or WP_Error) plus the full $commentdata array. Whatever you return becomes the final status.

Drop this in wp-content/mu-plugins/comment-moderation.php (must-use plugin, loads automatically, can't be deactivated by a wp-admin user) or in your theme's functions.php if the site is theme-locked.
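Since the incoming $approved can already be 'spam', 'trash', or a WP_Error, a slightly more defensive variant of the callback might leave those verdicts alone and only downgrade comments that would otherwise pass. This is a sketch under that assumption; te_check_comment_safe is a hypothetical name, and the function_exists() guard is only there so the file can also be run outside WordPress.

```php
<?php
// Hypothetical variant of the callback above: never softens an existing
// spam/trash/error verdict, only holds comments that match the pattern.
function te_check_comment_safe( $approved, $commentdata ) {
    // WP_Error objects and existing spam/trash decisions pass through untouched.
    if ( is_object( $approved ) || 'spam' === $approved || 'trash' === $approved ) {
        return $approved;
    }

    $content = isset( $commentdata['comment_content'] )
        ? (string) $commentdata['comment_content']
        : '';

    if ( 1 === preg_match( '#\b(buy now|click here|free trial)\b#i', $content ) ) {
        return 0; // hold for moderation
    }

    return $approved;
}

// Register only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'pre_comment_approved', 'te_check_comment_safe', 10, 2 );
}
```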

Approve, hold, spam, or trash

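The four return values map to four queue destinations. Same regex, different $approved. The snippets below reuse one function name, so install only one at a time or rename them; declaring te_check_comment twice is a fatal error.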

Hold for moderation

php

function te_check_comment( $approved, $commentdata ) {
    if ( preg_match( '#\b(viagra|cialis)\b#i', $commentdata['comment_content'] ) ) {
        $approved = 0;
    }
    return $approved;
}
add_filter( 'pre_comment_approved', 'te_check_comment', 10, 2 );

Mark as spam

php

function te_check_comment( $approved, $commentdata ) {
    if ( preg_match( '#\b(viagra|cialis)\b#i', $commentdata['comment_content'] ) ) {
        $approved = 'spam';
    }
    return $approved;
}
add_filter( 'pre_comment_approved', 'te_check_comment', 10, 2 );

Send straight to trash

php

function te_check_comment( $approved, $commentdata ) {
    if ( preg_match( '#\b(viagra|cialis)\b#i', $commentdata['comment_content'] ) ) {
        $approved = 'trash';
    }
    return $approved;
}
add_filter( 'pre_comment_approved', 'te_check_comment', 10, 2 );

Force approve (whitelist a known-good pattern)

php

function te_check_comment( $approved, $commentdata ) {
    if ( preg_match( '#@mycompany\.com$#i', $commentdata['comment_author_email'] ) ) {
        $approved = 1;
    }
    return $approved;
}
add_filter( 'pre_comment_approved', 'te_check_comment', 10, 2 );

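Useful for a corporate site where employees commenting from the company domain shouldn't get caught by Akismet false positives. Two caveats: the email address is typed in by the commenter and is trivially spoofed, so prefer checking for a logged-in user where you can; and callbacks on pre_comment_approved run in priority order, so a later callback can still override your 1.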

Every comment still hits wp_comments with the resolved status, so you can audit later via WP_Comment_Query even for trashed entries.
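If you want more than one of these outcomes at once, one way to avoid four competing callbacks is a single callback driven by a rule list, where the first matching rule decides the status. This is a sketch, not the article's method: te_rules and te_apply_rules are hypothetical names, and the rule order (spam first, then allow, then hold) is an assumption you would adjust.

```php
<?php
// Hypothetical rule table: which field to test, the pattern, and the status
// to return on a match. Rules are checked top to bottom.
function te_rules(): array {
    return array(
        array( 'field' => 'comment_content',      'pattern' => '#\[url=.*?\].*?\[/url\]#is', 'status' => 'spam' ),
        array( 'field' => 'comment_author_email', 'pattern' => '#@mycompany\.com$#i',        'status' => 1 ),
        array( 'field' => 'comment_content',      'pattern' => '#\b(viagra|cialis)\b#i',     'status' => 0 ),
    );
}

// Hypothetical callback: the first matching rule wins; otherwise keep core's verdict.
function te_apply_rules( $approved, $commentdata ) {
    foreach ( te_rules() as $rule ) {
        $value = isset( $commentdata[ $rule['field'] ] )
            ? (string) $commentdata[ $rule['field'] ]
            : '';

        if ( 1 === preg_match( $rule['pattern'], $value ) ) {
            return $rule['status'];
        }
    }

    return $approved;
}

// Register only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'pre_comment_approved', 'te_apply_rules', 10, 2 );
}
```

Putting the spam rule ahead of the allow rule means a spoofed company address cannot carry a BBCode payload straight to approval.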

Testing patterns before they go live

Never deploy a regex to a production pre_comment_approved filter without testing it against real comment content first. The cost of a bad pattern is silent suppression of legitimate comments: readers don't email you to say "my comment never appeared"; they just stop visiting.

The workflow I use:

Export the last 500 approved comments from a staging or local copy: wp comment list --status=approve --format=csv --fields=comment_ID,comment_content > approved.csv.
Paste a representative comment into regex101.com (set the flavor to PCRE / PCRE2).
Iterate the pattern until it matches the spam variants and zero legitimate samples.
Stage the pattern in a pre_comment_approved filter that logs rather than blocks: error_log( 'Would have moderated: ' . $commentdata['comment_content'] );.
Tail the log for a week. If the false-positive rate is acceptable, flip from logging to blocking.
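The "zero legitimate samples" check in the workflow above can be sketched as a small function that runs a candidate pattern over known-good comments and returns the ones it would have caught. The name te_false_positives is hypothetical, and the sketch assumes you have already loaded the exported comments into an array keyed by comment ID.

```php
<?php
// Hypothetical helper: returns the approved comments (ID => content) that
// the candidate pattern would have matched, i.e. its false positives.
function te_false_positives( string $pattern, array $approved_comments ): array {
    $hits = array();

    foreach ( $approved_comments as $id => $content ) {
        if ( 1 === preg_match( $pattern, (string) $content ) ) {
            $hits[ $id ] = $content;
        }
    }

    return $hits;
}
```

An empty result against a few hundred approved comments is a reasonable bar before moving on to the log-only stage.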

For broader pattern reference I keep the regex cheat sheet and URL matching reference open in tabs while iterating.

Plugin alternatives

Regex moderation handles most low-effort spam, but it doesn't scale to AI-generated comments that read like real prose. Tier the defense:

Tool	Catches	When to use
Built-in keyword fields	Literal substring + keyword spam	Always; it's free and built into core
`pre_comment_approved` filter	Custom logic (rate limits, geo, signed-in users)	When the regex fields aren't expressive enough
Akismet (Automattic)	Statistical / ML spam, including AI-generated	Default for any non-trivial WordPress site; free for personal use
Antispam Bee	Honeypot, BBCode, language heuristics	GDPR-friendly Akismet alternative
Cloudflare Turnstile / hCaptcha	Bot signatures, no user friction	When bots are submitting comments faster than humans could type

Akismet's hosted classification API is what catches the modern "this comment praises your article in fluent English but the URL is a casino affiliate" pattern. Regex can't beat that, but regex IS what catches the 1990s-style payload spam that still makes up most of the queue.

For administrative recovery if a bad rule locks out genuine commenters, see how to change a WordPress password for getting back into admin to disable the filter, and how to increase the PHP memory limit if a runaway regex pattern is timing out PHP on long comment bodies.

What to do next

If you're building a comment-moderation system, the regex cluster on this site covers the patterns you'll reach for most often:

Regex cheat sheet: single-page PCRE reference for quick lookups while writing moderation rules.
Regex word boundaries: \b is the single biggest accuracy upgrade for keyword lists.
Regex match email address: for filtering by author email patterns.
Regex match URL: for limiting URL counts or matching link-injection payloads.

WordPress-side, two adjacent references:

Change a WordPress password: admin recovery if a bad rule locks you out.
WordPress wp_insert_post memory deep dive: if you're importing historical comments at scale.

FAQ

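Does WordPress actually treat each line in 'Comment Moderation' as a regex?

No. WordPress does wrap each line as #line#i and run preg_match(), but it escapes the line with preg_quote() first, and it tests the author name, email, URL, body, IP, and user agent one field at a time.

Plain words work because a substring needs no special characters, so viagra matches the substring. Anything that looks like a regex (\b, character classes, alternation) is escaped and matched as literal text. To have a pattern evaluated as PCRE, run it in a pre_comment_approved filter.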

What's the difference between 'Comment Moderation' and 'Disallowed Comment Keys'?

Comment Moderation holds matches in the moderation queue for manual review. The comment exists in wp_comments with comment_approved = 0 and shows up in /wp-admin/edit-comments.php?comment_status=moderated.

Disallowed Comment Keys sends matches to the Trash with comment_approved = 'trash', or to the spam folder with comment_approved = 'spam' if EMPTY_TRASH_DAYS is set to 0. Use this for unambiguous keywords where you're confident you won't false-positive a real reader.

When did 'Comment Blacklist' get renamed to 'Disallowed Comment Keys'?

WordPress 5.5 (August 2020), as part of a wider effort to replace blacklist/whitelist terminology with allowlist/blocklist. The underlying option key was renamed from blacklist_keys to disallowed_keys in the same release, with backwards-compat reads from the old key.

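Can I use lookaheads and lookbehinds in the moderation fields?

Not in the Settings fields: preg_quote() escapes the parentheses, ?, =, ! and < that those constructs rely on. Inside a pre_comment_approved filter, yes: PHP's PCRE engine supports lookaheads (?=...), negative lookaheads (?!...), lookbehinds (?<=...), and negative lookbehinds (?<!...).

Watch out for the delimiter though: if you wrap your pattern in #...#i and it contains a literal #, escape it as \# or preg_match() will raise a warning and return false, which your if () quietly treats as "no match".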
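A malformed pattern makes preg_match() return false rather than 0 or 1, and a plain if () reads that as "no match". One way to make the failure visible in your own filter code is a thin wrapper that separates "broken pattern" from "did not match". This is a sketch; te_safe_match is a hypothetical name.

```php
<?php
// Hypothetical wrapper: true = matched, false = no match,
// null = the pattern itself failed (bad delimiter, unknown modifier, ...).
function te_safe_match( string $pattern, string $subject ): ?bool {
    // @ hides the compile warning; the false return value is what we act on.
    $result = @preg_match( $pattern, $subject );

    if ( false === $result ) {
        return null;
    }

    return 1 === $result;
}
```

In a filter callback you might log the null case with error_log() and return the incoming $approved unchanged, so a typo in a pattern never silently disables a rule.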

How do I whitelist a regular commenter so they bypass moderation?

Two options. The simplest: mark them as a known commenter via Settings → Discussion → Before a comment appears, ticking "Comment author must have a previously approved comment". After their first approved comment, subsequent ones auto-approve.

The programmatic option: return 1 from pre_comment_approved for matches on the author email or a logged-in user role. See the "Force approve" example above.

Why is my regex matching legitimate comments?

Most common cause: missing word boundaries. cialis as a raw pattern matches specialist, socialist, specialise. Wrap with \b: \bcialis\b.

Second cause: case sensitivity. WordPress adds the i flag automatically for the Settings fields, but your custom pre_comment_approved filter doesn't; include i after your closing delimiter: #pattern#i.

Third: for keywords in the Settings fields, testing only against the comment body when WordPress also checks the author name, email, URL, IP, and user agent. A keyword that never appears in the body can still match one of the other five fields, so test against each of them.

Should I use regex moderation or Akismet?

Both. Regex is free and catches the high-volume payload spam (URL flooding, BBCode injection, known phrase lists). Akismet's hosted ML catches what regex can't: coherent AI-generated comments with subtle affiliate URLs.

Order of evaluation in core: built-in keyword fields → pre_comment_approved callbacks in priority order (your filter and any plugin's, Akismet included) → final database write. Each layer can override the verdict of the one before it.

Where should the pre_comment_approved code live?

wp-content/mu-plugins/comment-moderation.php is the right home. Must-use plugins load automatically on every request, can't be deactivated from the wp-admin Plugins screen, and survive theme switches.

Avoid functions.php for moderation logic: it ties anti-spam to the theme, so a theme switch silently disables your filters. Avoid a regular plugin slot if you're worried about accidental deactivation.

Ishan Karunaratne

