TechEarl

replytocom URLs in Google Search Console: Why They Are Harmless

Thousands of ?replytocom= URLs in Search Console come from WordPress comment reply links. Why they are already nofollow and canonicalized, and why blocking them in robots.txt is the one fix that backfires.

Ishan Karunaratne · 9 min read

Those ?replytocom=12345 URLs filling up your Google Search Console report are WordPress talking to itself. Every threaded-comment Reply link carries one as a no-JavaScript fallback, and Googlebot can still crawl it. They are already rel="nofollow", already canonicalized back to the post they belong to, and they land in Search Console's "not indexed" reasons (usually "Alternate page with proper canonical tag") exactly as they should. The fix that half the internet recommends, blocking ?replytocom in robots.txt, is the one move that can actually get them stuck in the index.

That is the short answer: on a normal site these URLs are harmless and the right action is usually nothing. The rest of this article shows where they come from in WordPress core, how to confirm in sixty seconds that yours are handled, and why each of the four "fixes" you will find on forum threads ranges from redundant to counterproductive.

Turn on threaded (nested) comments in Settings → Discussion and WordPress renders a Reply link under every comment.

[Screenshot: Settings → Discussion → Other comment settings, with the 'Enable threaded (nested) comments' checkbox ticked and the number of levels set to 5. That checkbox is the toggle that turns every comment's Reply link into a ?replytocom URL.]

That Reply link's href is the post permalink with a ?replytocom=<comment_id> parameter bolted on. This is the code in core, get_comment_reply_link() in wp-includes/comment-template.php:

php
esc_url(
    add_query_arg(
        array(
            'replytocom'      => $comment->comment_ID,
            'unapproved'      => false,
            'moderation-hash' => false,
        ),
        $permalink
    )
) . '#' . $args['respond_id']

So a post with 50 comments and threading enabled renders up to 50 distinct https://example.com/the-post/?replytocom=N#respond URLs, one per Reply button. Every one is a crawlable variant of the same post.
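To make that arithmetic concrete, here is a minimal sketch in JavaScript (runnable in Node) that builds the same shape of href for a batch of comment IDs. replyLinkHref is a hypothetical helper written for this illustration, not a WordPress function; the permalink and the comment IDs are made up, and "respond" is assumed as the ID of the comment form.

```javascript
// Hypothetical helper for illustration only: it mimics the shape of the href
// that core builds, it is not WordPress code.
function replyLinkHref(permalink, commentId, respondId = "respond") {
  const url = new URL(permalink);
  url.searchParams.set("replytocom", String(commentId));
  url.hash = respondId; // fragment that jumps to the comment form
  return url.toString();
}

// Assumed data: one post with 50 comments whose IDs run from 101 to 150.
const permalink = "https://example.com/the-post/";
const commentIds = Array.from({ length: 50 }, (_, i) => 101 + i);

const hrefs = commentIds.map((id) => replyLinkHref(permalink, id));
const distinctUrls = new Set(hrefs);

console.log(hrefs[0]);          // https://example.com/the-post/?replytocom=101#respond
console.log(distinctUrls.size); // 50
```

Fifty comments, fifty different strings, one post behind all of them.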

The parameter is a fallback, not the real mechanism. The same <a> also carries data-* attributes, and when comment-reply.js is loaded (standard themes enqueue it whenever threaded comments are on) the JavaScript reads those attributes, moves the comment form into place under the comment you clicked, and cancels the navigation. This is the relevant part of comment-reply.js:

javascript
follow = window.addComment.moveForm( commId, parentId, respondId, postId, replyTo );
if ( false === follow ) {
    event.preventDefault();
}

A human with JavaScript never lands on the ?replytocom URL. The form just slides into position. But the href is still sitting in the static HTML, and Googlebot does not run that click handler. It sees a link, it queues the link. That is the whole story: the parameter exists so the Reply button still works without JavaScript, and the cost is a pile of crawlable URLs that only a crawler ever visits.

See them in your own page source

You do not have to take my word for it. On any WordPress post with a few threaded comments, fetch the page and grep for the parameter. Here is a clean install in Docker (see running WordPress in Docker for the setup):

[Screenshot: terminal output of curl on a WordPress post piped to grep replytocom. Each Reply link in the page source is the same post URL plus a unique ?replytocom value, and every one already carries rel=nofollow.]

Note the rel="nofollow" on each link. That matters for one of the "fixes" below.
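If you would rather check this programmatically than by eye, the following is a rough sketch in JavaScript (Node). findReplyLinks is a hypothetical helper, the regular expressions are a shortcut rather than a real HTML parser, and the sample markup is an assumption modeled on the reply links described above, not output copied from a live site.

```javascript
// Hypothetical helper: lists reply-link hrefs found in raw HTML and whether
// each anchor carries rel="nofollow". Regex-based, so treat it as a quick check.
function findReplyLinks(html) {
  const links = [];
  for (const [tag] of html.matchAll(/<a\b[^>]*>/gi)) {
    const href = /\shref=(["'])(.*?)\1/i.exec(tag);
    // Skip anchors that have no href or whose href has no replytocom parameter.
    if (!href || !/[?&;]replytocom=\d+/.test(href[2])) continue;
    const rel = /\srel=(["'])(.*?)\1/i.exec(tag);
    const relTokens = rel ? rel[2].toLowerCase().split(/\s+/) : [];
    links.push({ href: href[2], nofollow: relTokens.includes("nofollow") });
  }
  return links;
}

// Assumed sample markup: two reply links and one ordinary link.
const sampleHtml = `
<a rel="nofollow" class="comment-reply-link" href="https://example.com/the-post/?replytocom=2#respond" data-commentid="2">Reply</a>
<a rel="nofollow" class="comment-reply-link" href="https://example.com/the-post/?replytocom=3#respond" data-commentid="3">Reply</a>
<a href="https://example.com/about/">About</a>
`;

for (const link of findReplyLinks(sampleHtml)) {
  console.log(link.nofollow ? "nofollow" : "FOLLOWED", link.href);
}
```

To run it against a real post, save the page source to a file and pass its contents in place of sampleHtml.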

They are already nofollow, and already canonical

Two things in WordPress core handle these URLs before you touch anything.

First, the links are nofollow out of the box. Core renders the reply link with rel="nofollow" by default (you can see it in the page source above). The popular advice to "add nofollow to your reply links to fix replytocom" is describing something WordPress already does for you. It was also never a real de-duplication mechanism, because nofollow is a crawl hint, not a canonical signal.

Second, the post canonicalizes the variants back to itself. WordPress emits a self-referencing <link rel="canonical"> on singular views, via rel_canonical():

php
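function rel_canonical() {
    // Only singular views (posts, pages, attachments) get a canonical tag.
    if ( ! is_singular() ) {
        return;
    }
    $id = get_queried_object_id();
    // Bail if no post could be resolved for this request.
    if ( 0 === $id ) {
        return;
    }
    // Built from the post's permalink, so query arguments like replytocom are not carried over.
    $url = wp_get_canonical_url( $id );
    if ( ! empty( $url ) ) {
        echo '<link rel="canonical" href="' . esc_url( $url ) . '" />' . "\n";
    }
}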

A ?replytocom URL is still a singular post view, so it gets the same canonical tag pointing at the clean permalink. Google reads it, recognizes the parameter URL as a duplicate of the post, and consolidates the two. That is why these URLs do not compete with your real post for rankings, and why they pile up in the "not indexed" reasons rather than getting indexed.

In the Page Indexing report a correctly canonicalized variant normally shows up under "Alternate page with proper canonical tag" (Google saw the canonical and folded the variant into the post). Before Google settles that, the same URL can sit under "Crawled - currently not indexed" (it fetched the URL and has not indexed it). Both are "not indexed" reasons, not errors, and there is no penalty for having URLs sit in them. For the full breakdown of what those statuses mean and why repeatedly hitting "Validate fix" does nothing for them, see Discovered vs Crawled - currently not indexed.

The check to run first

Before you "fix" anything, confirm yours are actually being canonicalized. Pick one ?replytocom URL from your report and look at what it returns:

bash
curl -s "https://example.com/some-post/?replytocom=42" | grep -i '<link rel="canonical"'

If you get back a canonical tag pointing at the clean post URL (https://example.com/some-post/), Google has everything it needs and these URLs are handled. That is the normal result on any post with an SEO plugin (Yoast, Rank Math, SEOPress) or on plain core for singular posts.
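The grep above assumes the tag is written as <link rel="canonical" with the attributes in that order. As a slightly more forgiving sketch, the JavaScript below (Node) pulls the canonical out of the HTML regardless of attribute order and compares it with the URL minus its replytocom parameter. canonicalHref, withoutReplyToCom and isConsolidated are hypothetical names for this example, and the HTML string is a stand-in for whatever your own page returns.

```javascript
// Hypothetical helper: returns the href of the first <link rel="canonical">, or null.
// Regex-based and unaware of HTML entities, so treat it as a quick check.
function canonicalHref(html) {
  for (const [tag] of html.matchAll(/<link\b[^>]*>/gi)) {
    if (!/\srel=(["'])canonical\1/i.test(tag)) continue;
    const href = /\shref=(["'])(.*?)\1/i.exec(tag);
    return href ? href[2] : null;
  }
  return null;
}

// Hypothetical helper: the URL with replytocom and the fragment removed.
function withoutReplyToCom(variantUrl) {
  const url = new URL(variantUrl);
  url.searchParams.delete("replytocom");
  url.hash = "";
  return url.toString();
}

// True when the page's canonical points at the clean version of the variant URL.
function isConsolidated(variantUrl, html) {
  const canonical = canonicalHref(html);
  if (canonical === null) return false;
  // Resolve relative canonicals against the variant before comparing.
  return new URL(canonical, variantUrl).toString() === withoutReplyToCom(variantUrl);
}

// Assumed response body for https://example.com/some-post/?replytocom=42
const html = '<head><link rel="canonical" href="https://example.com/some-post/" /></head>';
console.log(isConsolidated("https://example.com/some-post/?replytocom=42", html)); // true
```

A false result is the cue to read the next paragraph; a true result means there is nothing to fix.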

The main problem case is a site that emits no canonical at all on the page. On modern WordPress that basically means you have disabled core's rel_canonical and run no SEO plugin. Robots-blocking the URL, or a plugin emitting a conflicting canonical, can cause trouble too, but those cases are rarer. Without a canonical, the parameter URLs are real, unconsolidated duplicates. The fix there is to restore a canonical, not to chase the parameter. The same SEO plugins whose title and meta output you can override all add the canonical automatically.

What not to do (and why)

Every forum thread about replytocom collects the same four suggestions. The reasoning matters more than the verdicts:

| Proposed action | Verdict | Why |
| --- | --- | --- |
| Block ?replytocom in robots.txt (Disallow: /*?replytocom) | Don't | A blocked URL cannot be crawled, so Google never sees the canonical that would consolidate it. Robots-blocked URLs can still be indexed URL-only if linked, and you convert a self-cleaning report into a stuck one. |
| Strip the parameter with a functions.php snippet or an SEO-plugin toggle | Usually pointless | Core already nofollows and canonicalizes the links. Stripping them removes the no-JavaScript reply fallback for the small number of users and crawlers without JS, to solve a non-problem. |
| Add rel="nofollow" to the reply links | Redundant | Core already adds it. You would be re-doing WordPress's default. |
| Configure replytocom in Google's URL Parameters tool | Obsolete | Google retired the URL Parameters tool in 2022. Any guide pointing you there predates its removal. |

The robots.txt one is worth spelling out because it is the most repeated and the most backwards. Google's own documentation is explicit: for a page-level signal like canonical or noindex to be honored, the URL must not be blocked by robots.txt, because a blocked crawler never reads the page. Block ?replytocom and you take URLs that Google was quietly folding into your posts and freeze them in an ambiguous state where it can no longer check what they are. If any were indexed before you added the rule, they can stay indexed, now labeled "Indexed, though blocked by robots.txt". You have made the report worse, not better. If you do edit robots.txt for other reasons, test the pattern against a real URL before you ship it. The same robots.txt trap, and the same right answer, applies to Wordfence's wordfence_lh beacon URLs if those are cluttering your report too.
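For the "test the pattern against a real URL" step, here is a small sketch in JavaScript (Node). It approximates the wildcard rules Google documents for robots.txt paths: * matches any run of characters, a trailing $ anchors the end, and everything else is a prefix match on the path plus query string. robotsPatternToRegExp and patternMatches are hypothetical helpers, and the sketch deliberately ignores Allow/Disallow precedence, user-agent groups and percent-encoding, so it only answers "does this one pattern match this one URL".

```javascript
// Hypothetical helper: turns one robots.txt path pattern into a RegExp.
// Simplified on purpose; not a full robots.txt parser.
function robotsPatternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")) // escape literal text
    .join(".*"); // each * becomes "any run of characters"
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// Matches the pattern against the URL's path plus query string.
function patternMatches(pattern, url) {
  const { pathname, search } = new URL(url);
  return robotsPatternToRegExp(pattern).test(pathname + search);
}

const pattern = "/*?replytocom";
console.log(patternMatches(pattern, "https://example.com/some-post/?replytocom=42")); // true
console.log(patternMatches(pattern, "https://example.com/some-post/"));               // false
console.log(patternMatches(pattern, "https://example.com/?p=7&replytocom=42"));       // false
```

The third case is the kind of surprise a test catches: when replytocom is not the first parameter, the literal ?replytocom never appears in the URL, so the pattern does not match.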

If you genuinely have thousands indexed

Comment-heavy sites that predate the canonical era sometimes did get large numbers of replytocom URLs into the index, and a few owners blamed ranking drops on it. If that is your situation, the lever is the same as for any duplicate-content cleanup: make sure the canonical is present (it is, on modern core plus any SEO plugin), let Google recrawl and consolidate, and be patient. Recrawling and folding thousands of URLs takes weeks to months, and there is no button that forces it. Disabling threaded comments removes the links entirely, but you lose nested replies, which is usually a worse trade than letting the canonical do its job.

The parameter has been a known wart for over a decade. There is an open WordPress core ticket, #22889 "Reconsider no-JS ?replytocom= links", debating whether to drop the no-JS fallback. It has stayed open for years without a fix: dropping the fallback would break the no-JavaScript reply flow, and these links are harmless enough that it has never been worth that trade. That is the right read for your site too: harmless, handled, leave them be.

Tags: WordPress, replytocom, Google Search Console, WordPress SEO, Comments, Crawl Budget, Canonical, robots.txt

Ishan Karunaratne

Software Systems Architect · Senior Software Engineer · Engineering Leadership

Software systems architect and senior software engineer with more than two decades designing, building, and running production software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Now a CTO, though what I write here is drawn from the full arc of that work, across architecture, engineering, and operations, not any single job.
