wordfence_lh URLs in Google Search Console: The Right Fix

Those ?wordfence_lh=1&hid=... URLs piling up in Google Search Console come from one specific Wordfence setting: Live Traffic in "All Traffic" mode (lh is short for log human). Each one is a tracking beacon Wordfence embeds in your pages to tell humans apart from bots. They are already served with a noindex header, so they are not hurting your rankings, and the fix is not robots.txt, not redirects, and not a noindex plugin. The fix is switching one option: Wordfence → All Options → Live Traffic Options → Traffic logging mode → SECURITY ONLY.

That is the short answer. The rest of this article shows where the URLs actually come from (with the plugin code and a live test install), the one check you should run before changing anything, and why the robots.txt advice you will find on most other pages handles this exactly backwards.

A typical Search Console report looks like this: thousands of URLs, all on your homepage, each with a unique 32-character hex hid, sitting under "Crawled - currently not indexed":

Figure: Google Search Console URL list showing many example-site.com URLs with wordfence_lh=1 and unique hid parameters, each crawled on a different date. Recreated Page Indexing view with a placeholder domain: every URL is the homepage plus a unique hid value.

Figure: Google Search Console "Crawled - currently not indexed" status card, the report where wordfence_lh URLs accumulate. For these URLs, this status is Google working as intended, not a problem to validate.

Where the URLs come from: Live Traffic's human check

Wordfence's Live Traffic feature logs visits at the server level, including bots that never execute JavaScript. When you set its logging mode to "All Traffic", Wordfence needs a way to tell a human browser from a bot that just fetches HTML. It solves that with a beacon: every time a logged-out visitor loads a page, the plugin records the hit in its wp_wfhits database table, then injects a small script into the page <head> carrying a URL that encodes that exact hit.

This is the relevant code in Wordfence 8.2.2, lib/wordfenceClass.php (the function is hooked to wp_head whenever Live Traffic is in All Traffic mode):

php

public static function wfLogHumanHeader(){
    // ...
    self::$hitID = self::getLog()->logHit();
    if (self::$hitID) {
        $URL = home_url('/?wordfence_lh=1&hid=' . wfUtils::encrypt(self::$hitID));
        $URL = addslashes(preg_replace('/^https?:/i', '', $URL));
        #Load as external script async so we don't slow page down.
        echo <<<HTML
<script type="text/javascript">
(function(url){
    // attaches listeners for mousemove, scroll, keydown, click...
    // on the FIRST human interaction, loads url + '&r=' + Math.random()
    // as an async <script>, which marks the hit as human
})('$URL');
</script>
HTML;
    }
}

Three things in those few lines explain the entire Search Console mess:

hid is your hit ID, encrypted. Every pageview inserts a row into wp_wfhits and embeds that row's encrypted ID in the HTML. Two visitors loading the same page get two different URLs.
The URL sits in your HTML source. Google does not need to click anything. Googlebot fetches your page, parses URL-shaped strings out of the inline script, and queues them for crawling. Every Googlebot fetch of any page mints a fresh one.
The script only fires on human interaction. The listeners wait for a mouse move, scroll, keypress, or click before loading the URL. A human typically triggers it within moments of the page loading; a bot that only fetches the HTML never does. That is the whole point: the request to ?wordfence_lh=1 is the "a human is here" signal.

You can watch the URLs being minted on any affected site. Here are three requests to the homepage of a clean WordPress install (Docker, Wordfence 8.2.2, Live Traffic set to All Traffic). Same page, three different URLs:

Figure: Terminal showing three curl requests to the same WordPress homepage, each returning a different wordfence_lh hid value in the HTML source. Three loads of the same homepage embed three brand-new wordfence_lh URLs; a site doing 1,000 pageviews a day mints 1,000 of these daily.
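To reproduce the three-loads test on your own site, here is a minimal sketch. It assumes the hid is a run of hex characters, as in the examples in this article, and that the beacon URL appears verbatim in the page source. The function name extract_beacon_hids and the example-site.com domain are placeholders, not anything Wordfence ships.

```shell
# Print every wordfence_lh beacon hid found in HTML read from stdin,
# one per line. Prints nothing if the page carries no beacon.
# Assumption: the hid consists only of hex characters.
extract_beacon_hids() {
  grep -oE 'wordfence_lh=1&hid=[0-9A-Fa-f]+' | sed 's/.*hid=//'
}

# Usage (placeholder domain): load the same page three times and compare.
#   for i in 1 2 3; do
#     curl -s "https://example-site.com/" | extract_beacon_hids
#   done
```

Three different values from three loads is the behavior described above. If a page cache sits in front of the site, repeated loads may return the same cached HTML, and therefore the same hid.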

And the round trip the beacon exists for: the hid decrypts to a row ID in wp_wfhits, and requesting the URL flips that row's jsRun column from 0 to 1, which is how the visit gets reclassified from bot to human:

Figure: Terminal showing the hid parameter decrypting to row 31 in wp_wfhits, and that row's jsRun column flipping to 1 after the wordfence_lh URL is requested. Fetching the beacon URL sets jsRun=1 on exactly that row: visit confirmed human.

So the URLs are not malware, not an attack, and not an infinite-loop bug. They are a deliberate measurement mechanism that happens to leave fingerprints all over your HTML, and Google diligently collects them. If you run server-side log forensics, you will recognize the design trade-off: Wordfence wants bot-vs-human data that JavaScript analytics can't see, and this beacon is the price.

First, check what your URLs actually return

Before touching any setting, spend sixty seconds confirming which of two situations you are in. Pick any wordfence_lh URL from your Search Console report and request its headers:

bash

curl -sI "https://example-site.com/?wordfence_lh=1&hid=2E1A7B0C9D4F36A8512B90C7E4D31F60"

The healthy case (Wordfence active, working as designed) looks like this:

Figure: Terminal showing curl -sI output for a wordfence_lh URL. The healthy response: HTTP 200, Content-Length 0 (empty body), and X-Robots-Tag: noindex. Google may crawl these, but it is explicitly told never to index them.

Wordfence intercepts any front-end request with ?wordfence_lh before WordPress renders a page and answers with an empty 200, Content-Length: 0, and crucially X-Robots-Tag: noindex. Here is the response side in the plugin source (ajax_lh_callback() in lib/wordfenceClass.php):

php

header('Content-type: text/javascript');
header("Connection: close");
header("Content-Length: 0");
header("X-Robots-Tag: noindex");

This interception runs whenever Wordfence is active, even after you disable Live Traffic. So in the healthy case, every one of those thousands of URLs is a blank, explicitly non-indexable response. Google files them under one of two excluded statuses: "Crawled - currently not indexed" (it fetched the URL and chose not to index the empty response), or, when it has read the header, "Excluded by 'noindex' tag" (shown in the URL Inspection tool as "URL marked 'noindex'"). Both mean the same thing in practice: Google crawled the URL and is deliberately keeping it out of the index, which is exactly what the empty noindex response asks for.

The broken case is when that same curl shows your homepage being served instead (a text/html response with no X-Robots-Tag header), or a redirect to your search page. That usually means a 404-redirect plugin or a theme's "redirect unknown URLs to home" feature grabbed the request first, though a caching layer or CDN that strips the query string, or Wordfence being inactive or bypassed, can produce the same result. Now the beacon URLs are real, indexable duplicates of your homepage, and some of them will get indexed. If your check shows full page content instead of an empty response, the parameter URLs are a symptom and the redirect behavior is the actual SEO problem, the same class of issue I cover in catching 404s with custom routing. Fix the blanket redirect, then continue below.

One caveat on the check itself: curl -sI sends a HEAD request, while Googlebot fetches with GET, and a cache or CDN can answer the two differently. If the HEAD result looks off, repeat it as a GET (curl -s -D - -o /dev/null "...") and cross-check the URL in Search Console's URL Inspection tool, which shows exactly what Google last received.
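If you want to run the check across several URLs from the report, the healthy-versus-broken decision can be scripted. This is a rough sketch rather than a definitive test: classify_beacon_response is a hypothetical helper, and it assumes that a 200 status plus an X-Robots-Tag header containing noindex is enough to call the response healthy.

```shell
# Classify a beacon response from its raw HTTP headers (read from stdin).
# Prints "healthy" for a 200 that carries X-Robots-Tag: noindex,
# and "broken" for anything else (redirects, plain HTML pages, errors).
classify_beacon_response() {
  local headers status
  headers=$(tr -d '\r')                        # strip CRs from CRLF line endings
  status=$(printf '%s\n' "$headers" | awk 'NR==1 {print $2}')
  if [ "$status" = "200" ] &&
     printf '%s\n' "$headers" | grep -qiE '^x-robots-tag:.*noindex'; then
    echo "healthy"
  else
    echo "broken"
  fi
}

# Usage (placeholder URL), sent as a GET with only the headers printed:
#   curl -s -D - -o /dev/null "https://example-site.com/?wordfence_lh=1&hid=..." \
#     | classify_beacon_response
```

The sketch does not follow redirects, so a 301 or 302 on the first response is reported as broken, which matches the broken case described above.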

The fix: set Live Traffic to Security Only

New URLs stop being created the moment Live Traffic stops logging all traffic. In wp-admin go to Wordfence → All Options, scroll to Live Traffic Options under Tool Options (or open Tools → Live Traffic and expand the options panel), and switch Traffic logging mode from ALL TRAFFIC to SECURITY ONLY. Save.

Figure: Wordfence Live Traffic Options panel with the Traffic logging mode control showing SECURITY ONLY and ALL TRAFFIC buttons, ALL TRAFFIC currently selected. This is the one setting that matters: ALL TRAFFIC mints a beacon URL per pageview; SECURITY ONLY mints none.

In Security Only mode, Wordfence still logs logins, blocks, and attacks. What you lose is the all-visitor firehose view:

Figure: Wordfence Live Traffic feed on a test install showing page visits with type, IP address, response code, and human or bot classification dots. What All Traffic mode buys you: every hit, bot or human, in real time. On my test install the green dots are confirmed humans, courtesy of the wordfence_lh beacon.

Honestly assess whether you ever look at that screen. Most sites have this data in GA4 (humans) and server access logs (bots) already, with better retention. Wordfence themselves recommend Security Only, and it has been the default for new installations since Wordfence 7.1.18 in December 2018. If your site is showing this symptom today, either the install predates that default or someone flipped the mode on purpose, often "temporarily", during an incident.

Two related notes:

Hard-disable in code. If you manage many sites or want the setting change-proof, define the constant in wp-config.php. Wordfence checks it before the UI option, so the UI toggle stops mattering. It earns a place in a hardened wp-config template:

php

define('WORDFENCE_DISABLE_LIVE_TRAFFIC', true);

Some hosts already do this. On WP Engine, for example, Live Traffic is force-disabled in the plugin code itself (WF_IS_WP_ENGINE), which is why you will never see this issue there regardless of the setting.

Switching the mode stops new URLs. The thousands already in Search Console fade out on their own, which brings us to what not to do while you wait.

What not to do (and why)

Every forum thread on this topic collects the same four suggestions. Three of them range from useless to actively counterproductive. The reasoning matters more than the verdicts, so here are both:

| Proposed action | Verdict | Why |
| --- | --- | --- |
| Switch Live Traffic to Security Only | Do this | Stops URL creation at the source. Nothing new for Google to find. |
| Block `?wordfence_lh` in robots.txt | Don't (with one exception below) | The URLs already serve `X-Robots-Tag: noindex`. A robots.txt block hides that header from Google. |
| 301-redirect the parameter URLs to the homepage | Don't | While Wordfence is active these are functional endpoints, and redirecting thousands of one-off URLs just gives Google a new crawl-and-recrawl job. No visitor's browser requests these URLs again anyway; each `hid` is used once. |
| Add noindex via an SEO plugin or "Validate Fix" in GSC | Pointless | The noindex is already there at the HTTP layer (run the curl check above). And "Validate Fix" is for clearing a reported error; an excluded status like this is not an error, so validation just churns without changing anything. |

The robots.txt one deserves the full explanation, because it is the most commonly repeated advice and it contains a genuine trap. Google's own documentation is unambiguous: for noindex to work, the page must not be blocked by robots.txt, because a blocked crawler never sees the noindex rule. Block the parameter in robots.txt and you replace "Google crawls these occasionally, sees noindex, drops them" with "Google can never again check what these URLs are." Any URL Google already knows about can then linger, and if some were indexed during a broken-case episode (see the curl check above), they can stay indexed (Google may keep them on the strength of external links), now labeled "Indexed, though blocked by robots.txt". You will have converted a self-cleaning report into a stuck one.

The one legitimate exception: genuinely crawl-budget-constrained sites. Google's guidance on faceted and parameter URLs does recommend robots.txt disallows (Disallow: /*?*wordfence_lh=) to stop crawl waste on URLs you want neither crawled nor indexed. If you run a site at the scale where crawl budget is real (more on that threshold below) and your logs show Googlebot hammering these beacons, the disallow is defensible after you have switched off All Traffic mode and confirmed nothing got indexed. For everyone else it is a trap with no upside. If you do end up editing robots.txt, test the pattern before deploying; dnschkr.com's robots.txt checker parses your live file and shows what a given URL resolves to, which catches the classic wildcard-pattern typos.
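For a quick local sanity check of the pattern, the wildcard rules can be approximated in a few lines of shell. This is a sketch under simplifying assumptions, not a full robots.txt parser: robots_pattern_matches is a hypothetical helper that handles a single Disallow pattern, treats * as any run of characters and everything else as literal, and ignores Allow rules, user-agent groups, and longest-match precedence.

```shell
# Succeed (exit 0) if a URL path plus query string is matched by one
# robots.txt Disallow pattern. Matching starts at the beginning of the path.
#   *  matches any run of characters
#   $  at the end of the pattern anchors the end of the URL
#   everything else, including ?, is matched literally
robots_pattern_matches() {
  local pattern path regex
  pattern=$1
  path=$2
  # Escape regex metacharacters (but not * or $), then turn * into .*
  regex=$(printf '%s' "$pattern" | sed -e 's/[][\.^+?(){}|]/\\&/g' -e 's/\*/.*/g')
  printf '%s\n' "$path" | grep -qE "^$regex"
}

# Usage:
#   robots_pattern_matches '/*?*wordfence_lh=' '/?wordfence_lh=1&hid=ABC' \
#     && echo "blocked" || echo "allowed"
```

The point of checking locally is the literal question mark: a pattern written without it, or with the wildcard in the wrong place, can silently block far more than the beacon URLs.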

Is this hurting your SEO or crawl budget?

Almost certainly not, and this is the part the panic-flavored articles skip. Two separate concerns get mixed together:

Indexing: the URLs carry X-Robots-Tag: noindex and an empty body. Google will not rank them, will not show them in search results, and will not count them as duplicate content. "Crawled - currently not indexed" in the Page Indexing report is an excluded status, and Google's documentation explicitly notes that not every URL on a site should be expected to be indexed. Google's John Mueller has said the same in plainer words: it is normal that not all pages on a site get indexed. The nuance worth keeping is that a real page of yours stuck in this status can be a quality signal worth investigating, but a contentless, noindex-headed beacon URL sitting there is the system working as designed, not a problem to chase. (For what that status and its "Discovered" sibling actually mean, and why re-requesting indexing does nothing, see Discovered vs Crawled - currently not indexed.) There is no penalty attached to having excluded URLs in that report.

Crawl budget: Google's crawl budget guidance puts numbers on who needs to care: sites with over a million unique pages whose content changes about weekly, or over ten thousand pages that change daily. If that is not you, Google states plainly that keeping your sitemap current is adequate and crawl budget is not your problem. Googlebot crawling a few thousand empty beacon responses, each a sub-kilobyte 200, does not displace the crawling of your real content on a normal-sized site. On a genuinely huge site, it is one more reason to switch the mode off, and possibly the one case for the robots.txt disallow above.
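To see whether Googlebot is actually spending requests on these beacons, you can count them in your access log. The sketch below assumes the common "combined" log format (timestamp in the fourth whitespace-separated field, user agent in the line) and trusts the user-agent string, so spoofed Googlebot traffic is not filtered out. The function name count_googlebot_beacon_hits and the log path in the usage line are placeholders.

```shell
# Count Googlebot requests to wordfence_lh URLs per day.
# Reads a combined-format access log on stdin and prints "<day> <count>"
# lines, where <day> looks like 10/Oct/2024.
count_googlebot_beacon_hits() {
  grep 'wordfence_lh=' | grep -i 'googlebot' \
    | awk '{ day = substr($4, 2, 11); hits[day]++ }
           END { for (d in hits) print d, hits[d] }' \
    | sort
}

# Usage (placeholder path):
#   count_googlebot_beacon_hits < /var/log/nginx/access.log
```

A handful of hits per day on a normal-sized site is the harmless case described above. The output is sorted as plain text, so days from different months will not necessarily appear in calendar order.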

What these URLs do cost you is signal-to-noise in Search Console. A report with 4,000 beacon URLs in it is a report where you will miss the five real pages that dropped out of the index. That alone justifies the fix, no SEO catastrophe required. If your report also has ?replytocom URLs from comment threads, they are the same kind of benign WordPress noise, handled the same way.

How long until they disappear from Search Console?

Set expectations correctly: weeks to a few months, thinning out progressively. After you switch to Security Only, no new beacon URLs enter your HTML, so Google discovers no new ones. The already-known URLs get recrawled at a declining rate (each recrawl finds the same empty noindex response, which demotes its priority), then age out of the report. There is no one-click button that meaningfully speeds up the permanent drop:

URL Removals tool: temporary (about six months of hiding), designed for urgent takedowns, and pointless for URLs that are not indexed in the first place.
Validate Fix: does not help here. "Crawled - currently not indexed" is an excluded status, not a fixable error, so the validation just churns.
The old URL Parameters tool: retired by Google in 2022. Older advice that mentions configuring wordfence_lh there is no longer actionable.

Patience is the mechanism. The queue drains on its own once the source is off.

If you removed Wordfence and the URLs are still there

Two flavors of this. First, simply seeing old URLs in the report after uninstalling is the same aging-out process described above; reports lag reality by months.

Second, the older parameter: before the short wordfence_lh form, the same mechanism used the longer ?wordfence_logHuman=1&hid=... parameter; Wordfence switched to the shorter wordfence_lh name in 6.3.20 (October 2017). Sites that ran Wordfence years ago can still find the old form in reports and crawl data. Same beacon, same answer.

There is one real behavioral change to know about: with Wordfence deactivated, nothing intercepts the parameter anymore, so https://example-site.com/?wordfence_lh=1&hid=ABC serves your actual homepage with a query string attached: a 200 with full content. Whether that becomes a duplicate-content nuisance depends on your canonical tag. WordPress core only outputs rel="canonical" on singular posts and pages, so a blog-index homepage gets none from core (a static front page is a page, so it does get one). In practice, virtually every SEO plugin (Yoast, Rank Math, SEOPress, and the rest; the same plugins whose title and meta output you can override) adds a homepage canonical, which points the parameter variants back at your homepage so that Google resolves them as duplicates of it. If you run no SEO plugin at all, check your homepage source for a canonical tag before uninstalling Wordfence while these URLs are still circulating.
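The canonical check can be done from the command line. A minimal sketch, assuming the tag is written on a single line of the HTML; has_canonical_tag is a hypothetical helper and example-site.com is a placeholder.

```shell
# Succeed (exit 0) if HTML read from stdin contains a <link> tag with
# rel="canonical" (single or double quotes, any attribute order).
# Assumption: the whole tag sits on one line of the source.
has_canonical_tag() {
  grep -qiE "<link[^>]*rel=[\"']canonical[\"']"
}

# Usage (placeholder domain):
#   curl -s "https://example-site.com/" | has_canonical_tag \
#     && echo "canonical present" || echo "no canonical found"
```

This only confirms that a canonical tag exists. It is still worth reading the href by eye to confirm it points at the clean homepage URL rather than echoing the query string.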

For completeness: if you want to reproduce any of this safely, the entire test setup in this article is a disposable WordPress in Docker with Wordfence installed and Live Traffic flipped to All Traffic. The beacon appears on the first logged-out page load.

What is the wordfence_lh URL parameter?

It is a tracking beacon created by the Wordfence security plugin's Live Traffic feature in All Traffic mode. Each pageview logs a hit in Wordfence's database and embeds a unique URL (wordfence_lh=1 plus an encrypted hit ID called hid) in the page HTML. When a visitor interacts with the page, their browser requests that URL, which tells Wordfence the visit was human rather than a bot.

Are wordfence_lh URLs a sign my site was hacked?

No. They are generated by Wordfence itself, not by malware. The URLs return empty responses with a noindex header. If your curl check shows them returning actual page content, the cause is still not a hack; it is usually a 404-redirect plugin or theme feature redirecting unknown URLs to your homepage.

Should I block wordfence_lh in robots.txt?

Not as the primary fix. The URLs already serve X-Robots-Tag: noindex, and a robots.txt block prevents Google from seeing that header, which can leave already-indexed URLs stuck. Switch Wordfence's Traffic logging mode to Security Only instead. A robots.txt disallow is only worth considering on very large sites with a measured crawl-budget problem, after the setting change.

Do wordfence_lh URLs hurt my Google rankings?

No. They are served with a noindex header and an empty body, so Google excludes them from the index. "Crawled - currently not indexed" is an excluded status, not an error or a penalty. The practical cost is clutter in Search Console reports and, on million-page sites, some wasted crawl. Stopping new ones is still worth doing for report hygiene.

How do I stop Wordfence from creating these URLs?

In wp-admin, go to Wordfence, then All Options, then Live Traffic Options, and set Traffic logging mode to SECURITY ONLY. To enforce it in code, add define('WORDFENCE_DISABLE_LIVE_TRAFFIC', true); to wp-config.php. New beacon URLs stop immediately; the ones already in Search Console age out over the following weeks to months.

Ishan Karunaratne