WordPress Table of Contents (No Plugin): Build One From Your Headings

A table of contents is two small jobs: give every heading in the post a stable id, then print a list of links pointing at those ids. You do not need a plugin for that. You need one the_content filter that parses the rendered HTML, adds the ids, builds the nav, and returns the content with the nav prepended. Here is the whole thing, ready to drop into a must-use plugin:

php

add_filter( 'the_content', 'te_table_of_contents', 20 );

function te_table_of_contents( $content ) {
    if ( ! is_singular() || ! in_the_loop() || ! is_main_query() ) {
        return $content;
    }

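    // Wrap the fragment in one container: with LIBXML_HTML_NOIMPLIED libxml
    // treats the first element as the document root, so unwrapped
    // multi-element content can come back mis-nested.
    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );
    $dom->loadHTML(
        '<?xml encoding="utf-8"?><div>' . $content . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath    = new DOMXPath( $dom );
    $headings = $xpath->query( '//h2 | //h3 | //h4' );

    if ( $headings->length < 2 ) {
        return $content;
    }

    $items = array();
    $seen  = array();

    foreach ( $headings as $heading ) {
        $text = trim( $heading->textContent );
        $slug = sanitize_title( $text );

        // Headings that sanitize to nothing (emoji only, punctuation only) are skipped.
        if ( '' === $slug ) {
            continue;
        }
        // Second and later uses of a slug get -2, -3, ... appended.
        if ( isset( $seen[ $slug ] ) ) {
            $slug .= '-' . ++$seen[ $slug ];
        } else {
            $seen[ $slug ] = 1;
        }

        $heading->setAttribute( 'id', $slug );

        $items[] = sprintf(
            '<li class="te-toc-%s"><a href="#%s">%s</a></li>',
            esc_attr( $heading->nodeName ),
            esc_attr( $slug ),
            esc_html( $text )
        );
    }

    // Serialise only the wrapper's children, which leaves the wrapper itself
    // and the encoding prefix out of the returned HTML.
    $body = '';
    foreach ( $dom->getElementsByTagName( 'div' )->item( 0 )->childNodes as $child ) {
        $body .= $dom->saveHTML( $child );
    }

    $nav = '<nav class="te-toc" aria-label="Table of contents"><p class="te-toc-title">On this page</p><ul>'
         . implode( '', $items )
         . '</ul></nav>';

    return $nav . $body;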
}

That is the technique. Everything below is the reasoning, the caching, and the gotchas, because the snippet above re-parses the HTML on every page view, which you do not want on a busy post.

Why DOMDocument and not a regex

The obvious shortcut is to match headings with a regex: something like preg_replace_callback over /<h([2-4])>(.*?)<\/h\1>/. It works on the post you tested it on, and then it breaks the first time a heading carries an attribute (<h2 class="wp-block-heading">), wraps inline markup (<h3>Using <code>jq</code></h3>), or contains a stray > in an attribute value. HTML is not a regular language, and hand-rolled patterns over real post content fail in ways that are tedious to chase.
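To make that failure concrete, here is a small sketch that runs the naive pattern and the DOM approach over the same fragment. The fragment and the variable names are invented for illustration; only the pattern comes from the paragraph above.

```php
<?php
// Illustrative fragment: one heading with an attribute, one with inline markup.
$html = '<h2 class="wp-block-heading">Setup</h2><p>Text</p><h3>Using <code>jq</code></h3>';

// The naive pattern only matches headings that carry no attributes.
preg_match_all( '/<h([2-4])>(.*?)<\/h\1>/', $html, $matches );
$regex_texts = $matches[2];

// The DOM route: parse once, then ask XPath for the headings.
$dom      = new DOMDocument();
$previous = libxml_use_internal_errors( true );
$dom->loadHTML( '<?xml encoding="utf-8"?>' . $html );
libxml_clear_errors();
libxml_use_internal_errors( $previous );

$dom_texts = array();
$xpath     = new DOMXPath( $dom );
foreach ( $xpath->query( '//h2 | //h3 | //h4' ) as $node ) {
    $dom_texts[] = trim( $node->textContent );
}
```

On this fragment the regex misses the h2 entirely and hands back the h3 with its inner `<code>` markup still in place, while the XPath query returns both headings as plain text.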

DOMDocument parses the markup into a tree and lets DOMXPath ask for //h2 | //h3 | //h4 directly, regardless of attributes or nesting. The textContent of each node gives me the heading text with inner tags already flattened, which is exactly what I want for the slug and the link label. Two details make it behave with WordPress content:

The '<?xml encoding="utf-8"?>' prefix forces UTF-8 so accented headings do not turn into mojibake. loadHTML() otherwise assumes Latin-1.
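LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD stops libxml adding <html><body> and a doctype around the fragment. One catch: with NOIMPLIED libxml treats the first element it meets as the document root and can nest later siblings inside it, and saveHTML() on the whole document echoes the encoding prefix back. Wrapping the content in a single container element and serialising that container's children avoids both problems.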

libxml_use_internal_errors( true ) swallows the warnings libxml raises on the HTML5 it does not fully understand. The parse still succeeds; you are just silencing noise.
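A minimal sketch of that load-and-save round trip, pulled out on its own so it can be tried without WordPress. te_round_trip() is a hypothetical name, and the single wrapper div is an assumption about how you choose to contain the fragment, not something libxml requires by that name.

```php
<?php
/**
 * Hypothetical helper: parse an HTML fragment and serialise it back.
 * The wrapper <div> exists only to give libxml a single root element.
 */
function te_round_trip( $html ) {
    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );
    $dom->loadHTML(
        '<?xml encoding="utf-8"?><div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous ); // Restore the caller's setting.

    // The wrapper is the first <div> in document order.
    $wrapper = $dom->getElementsByTagName( 'div' )->item( 0 );

    $out = '';
    foreach ( $wrapper->childNodes as $child ) {
        $out .= $dom->saveHTML( $child );
    }

    return $out;
}
```

Serialising node by node means neither the wrapper nor the encoding prefix appears in the result. libxml may still normalise whitespace between block elements, so compare output loosely rather than byte for byte.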

How the id injection and nav build work together

The loop does both jobs in one pass. For each heading it derives a slug from the text with sanitize_title() (the same function WordPress uses for post slugs, so the ids look native), sets that as the heading's id, and pushes a matching <li> with an <a href="#slug"> into the nav array. The headings are mutated in place on the DOM tree, so when I call saveHTML() at the end, the returned body already carries the ids. The nav is assembled from the same loop, which guarantees every link has a target and every target has a link.

The $seen map handles duplicate headings. Two sections both titled "Notes" would otherwise produce two id="notes" attributes, and an anchor can only land on the first. Appending -2, -3 keeps every id unique, which is both valid HTML and the difference between a working anchor and one that silently jumps to the wrong place.
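The same idea as a standalone function, in case you want to test it away from the filter. te_unique_slug() is a hypothetical helper, and it goes one small step further than the loop in the snippet: it keeps counting until the candidate is free, which also covers a heading whose natural slug already looks like a suffixed one (a section literally titled "Notes 2").

```php
<?php
/**
 * Hypothetical helper: return a slug that has not been handed out yet.
 *
 * @param string $slug Base slug, for example the output of sanitize_title().
 * @param array  $seen Slugs already used, passed by reference and updated.
 */
function te_unique_slug( $slug, array &$seen ) {
    $candidate = $slug;
    $n         = 1;

    // Try slug, slug-2, slug-3, ... until one is unused.
    while ( isset( $seen[ $candidate ] ) ) {
        $candidate = $slug . '-' . ++$n;
    }

    $seen[ $candidate ] = true;

    return $candidate;
}

$seen = array();
$ids  = array();
foreach ( array( 'notes', 'notes-2', 'notes', 'notes' ) as $slug ) {
    $ids[] = te_unique_slug( $slug, $seen );
}
// $ids is now array( 'notes', 'notes-2', 'notes-3', 'notes-4' ).
```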

I scope the filter to is_singular() && in_the_loop() && is_main_query() so it only fires on the actual post body, not on excerpts, widgets, or a second loop in the sidebar that also runs content through the_content. Priority 20 runs the filter after WordPress's own content formatting (do_blocks at 9, wpautop at the default 10, do_shortcode at 11), so I am parsing the final HTML the visitor will see.

Smooth scrolling with CSS, not JavaScript

There is no JavaScript in this. Clicking an anchor link is a native browser jump; you make it glide with one CSS rule instead of a scroll library:

css

html {
    scroll-behavior: smooth;
}

/* Stop a sticky header from covering the heading you jumped to. */
:target {
    scroll-margin-top: 6rem;
}

.te-toc {
    margin: 1.5rem 0;
    padding: 1rem 1.25rem;
    border: 1px solid #e2e2e2;
    border-radius: 8px;
    background: #fafafa;
}
.te-toc ul { margin: 0.5rem 0 0; padding-left: 1.25rem; }
.te-toc-h3 { margin-left: 1rem; }
.te-toc-h4 { margin-left: 2rem; }

scroll-behavior: smooth is the entire animation. The one that people forget is scroll-margin-top: if your theme has a sticky header, the anchor jump parks the target heading flush against the top of the viewport, where the fixed header sits on top of it. scroll-margin-top reserves that gap so the heading lands below the header instead of behind it. The te-toc-h3 / te-toc-h4 margins indent sub-headings so the list reads as an outline.

If you would rather respect a reader's reduced-motion setting, wrap the smooth scroll in a media query so it only animates for people who have not asked the OS to stop animations.

Only render with two or more headings, and cache the result

Two refinements turn the snippet from a demo into something you would ship.

First, the guard you already saw: if ( $headings->length < 2 ). A table of contents on a post with one heading (or none) is clutter. Rendering only when there are two or more headings means the nav appears exactly where it earns its place and stays out of the way on short posts.

Second, caching. The DOM parse is cheap per call but pointless to repeat: a published post's content does not change between views, so parsing it on every request is wasted work on a popular page. Store the generated output in a transient keyed by post ID, and bust that transient when the post is saved:

php

add_filter( 'the_content', 'te_table_of_contents', 20 );

function te_table_of_contents( $content ) {
    if ( ! is_singular() || ! in_the_loop() || ! is_main_query() ) {
        return $content;
    }

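    // Skip the cache for previews and password-protected posts: a cached copy
    // is served to every visitor, whatever they are allowed to see.
    $post = get_post();
    if ( ! $post || is_preview() || '' !== $post->post_password ) {
        return te_build_toc( $content );
    }

    $post_id = $post->ID;
    $cached  = get_transient( 'te_toc_' . $post_id );
    if ( false !== $cached ) {
        return $cached;
    }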

    $output = te_build_toc( $content );
    set_transient( 'te_toc_' . $post_id, $output, WEEK_IN_SECONDS );

    return $output;
}

// Bust the cache whenever the post is saved.
add_action( 'save_post', 'te_clear_toc_cache' );

function te_clear_toc_cache( $post_id ) {
    delete_transient( 'te_toc_' . $post_id );
}

te_build_toc() is the body of the first snippet (the parse, the loop, the nav assembly), refactored out so the cached and uncached paths share it. get_transient() returns false on a miss, which is why the check is false !== $cached: a legitimately empty string would otherwise look like a miss forever. The WEEK_IN_SECONDS expiry is a backstop; save_post is what actually keeps it fresh, firing on every publish and update so an edited post regenerates its TOC on the next view.
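One assumption baked into that key is worth stating: it is the post ID alone. As far as I can tell, a post split into pages with <!--nextpage-->, or one whose content is changed by another filter before priority 20, would get whichever copy was cached first. A possible variation, not part of the snippet above, is to fold a hash of the incoming content into the key. te_toc_cache_key() is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: build a transient name from the post ID and a hash
 * of the content being filtered, so different content never shares a key.
 */
function te_toc_cache_key( $post_id, $content ) {
    // 'te_toc_' + ID + '_' + 32 hex characters stays well inside the
    // 172-character limit on transient names.
    return 'te_toc_' . (int) $post_id . '_' . md5( $content );
}

$key_page_one = te_toc_cache_key( 42, '<h2>Intro</h2><p>Page one</p>' );
$key_page_two = te_toc_cache_key( 42, '<h2>Details</h2><p>Page two</p>' );
```

The trade-off is that the save_post handler can no longer delete the entry by name, because it does not know the hash. Edited content simply hashes to a new key and the old entry is left to expire, so a shorter expiry than a week may be sensible with this variant.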

One caveat on save_post: it also fires for autosaves and revisions. If you want to be strict, bail out when wp_is_post_autosave() or wp_is_post_revision() is true before deleting, though deleting a transient that will simply be regenerated is cheap enough that I usually do not bother.
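If you do want the strict version, a sketch might look like this. te_clear_toc_cache_strict() is a hypothetical name; wp_is_post_autosave(), wp_is_post_revision(), and delete_transient() are the WordPress functions named above. The function_exists() guard is my own addition so the file can be loaded outside WordPress without registering anything.

```php
<?php
/**
 * Clear the cached TOC for a post, ignoring autosaves and revisions.
 *
 * @param int $post_id ID passed in by the save_post action.
 * @return bool True when a delete was attempted and succeeded.
 */
function te_clear_toc_cache_strict( $post_id ) {
    // Both checks return the parent post ID for an autosave or revision,
    // and false for a regular post.
    if ( wp_is_post_autosave( $post_id ) || wp_is_post_revision( $post_id ) ) {
        return false;
    }

    return delete_transient( 'te_toc_' . $post_id );
}

// Register the hook only when WordPress is loaded.
if ( function_exists( 'add_action' ) ) {
    add_action( 'save_post', 'te_clear_toc_cache_strict' );
}
```

Use it in place of te_clear_toc_cache(), not alongside it, or every save will run both handlers.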

Put it in a must-use plugin

I never keep content filters in functions.php. Theme code vanishes the moment you switch themes, and a table of contents quietly disappearing after a theme change is a confusing bug to track down later. Drop the whole thing into wp-content/mu-plugins/te-table-of-contents.php with a proper header:

php

<?php
/**
 * Plugin Name: TE Table of Contents
 * Plugin URI:  https://techearl.com/wordpress-table-of-contents-no-plugin
 * Description: Builds a table of contents from a post's h2-h4 headings, injects heading ids, and caches the output per post.
 * Version:     1.0.0
 * Author:      Ishan Karunaratne
 * Author URI:  https://techearl.com
 * License:     GPL-2.0-or-later
 * Text Domain: te-toc
 */

if ( ! defined( 'ABSPATH' ) ) {
    exit;
}

// te_table_of_contents(), te_build_toc(), te_clear_toc_cache()
// and the add_filter / add_action calls from the snippets above go here.

Must-use plugins in mu-plugins/ load automatically, before regular plugins, and survive theme switches. The ABSPATH guard stops the file executing if someone hits it directly. The CSS goes in your theme, or enqueue it from the same file with wp_enqueue_style if you want the plugin fully self-contained.
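For the self-contained route, one way to do it is to keep the CSS as a string in the plugin and attach it as an inline style. This is a sketch under a few assumptions: te_toc_css() and te_toc_enqueue_css() are hypothetical names, the 'te-toc' handle is arbitrary, and the rules are the ones from the CSS section with the smooth scroll gated behind the reduced-motion media query mentioned there.

```php
<?php
/**
 * Hypothetical helper: the TOC stylesheet as a string.
 */
function te_toc_css() {
    return <<<'CSS'
@media (prefers-reduced-motion: no-preference) {
    html { scroll-behavior: smooth; }
}
:target { scroll-margin-top: 6rem; }
.te-toc { margin: 1.5rem 0; padding: 1rem 1.25rem; border: 1px solid #e2e2e2; border-radius: 8px; background: #fafafa; }
.te-toc ul { margin: 0.5rem 0 0; padding-left: 1.25rem; }
.te-toc-h3 { margin-left: 1rem; }
.te-toc-h4 { margin-left: 2rem; }
CSS;
}

/**
 * Hypothetical callback: register a source-less handle and hang the CSS on it.
 */
function te_toc_enqueue_css() {
    // A handle with no source file exists only to carry the inline CSS.
    wp_register_style( 'te-toc', false );
    wp_enqueue_style( 'te-toc' );
    wp_add_inline_style( 'te-toc', te_toc_css() );
}

// Register the hook only when WordPress is loaded.
if ( function_exists( 'add_action' ) ) {
    add_action( 'wp_enqueue_scripts', 'te_toc_enqueue_css' );
}
```

This prints the rules on every front-end page. If that bothers you, check is_singular() inside the callback before enqueueing.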

A note on schema: a table of contents does not need any. There is no FAQPage, HowTo, or sitelinks-search markup that makes a TOC eligible for a richer result, and bolting schema onto it just adds noise. Google builds "jump to" links from your heading ids and on-page anchors on its own. Clean heading ids and working anchors are the only signal that matters here; keep it simple.

If you want the nav to fold away on mobile, wrap it in a <details> element with a <summary> of "On this page". That is native, JavaScript-free collapse-and-expand, and it inherits the smooth scroll for free.
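A sketch of that variant as a drop-in replacement for the nav assembly. te_toc_details_nav() is a hypothetical helper; it assumes $items holds the already-escaped <li> strings built in the loop, and it renders collapsed by default on every screen size.

```php
<?php
/**
 * Hypothetical helper: wrap the TOC list in <details>/<summary>.
 *
 * @param string[] $items Pre-escaped <li> elements from the heading loop.
 */
function te_toc_details_nav( array $items ) {
    return '<nav class="te-toc" aria-label="Table of contents"><details>'
         . '<summary class="te-toc-title">On this page</summary><ul>'
         . implode( '', $items )
         . '</ul></details></nav>';
}

$nav = te_toc_details_nav(
    array( '<li class="te-toc-h2"><a href="#setup">Setup</a></li>' )
);
```

Add the open attribute to <details> if you would rather it start expanded. Collapsing it only on narrow screens needs more than this markup alone.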

Verify it worked

Two checks, both by hand.

View-source on a post with several headings and search for id=. Every <h2>, <h3>, and <h4> in the body should now carry an id that matches the slug in its nav link. If a heading has no id, its textContent probably sanitized to an empty slug (a heading that is only an emoji or punctuation), which the continue skips on purpose.

Then click each link in the rendered nav. The page should glide to the matching heading, and on a theme with a sticky header the heading should land below the header, not tucked behind it. If it lands behind the header, your scroll-margin-top is too small for that header's height; bump it. If a link jumps to the wrong section, you have a duplicate-id collision the $seen map should have caught, so confirm you kept that block when you refactored.
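If you would rather not click through a long post, the first check can be scripted. This is a rough sketch, assuming you paste in or fetch the rendered HTML yourself; te_toc_missing_targets() is a hypothetical helper that lists every nav link whose target id does not exist on the page.

```php
<?php
/**
 * Hypothetical helper: return the hrefs (without '#') of TOC links that
 * point at an id not present anywhere in the given HTML.
 */
function te_toc_missing_targets( $html ) {
    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );
    $dom->loadHTML( '<?xml encoding="utf-8"?>' . $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $dom );

    // Collect every id on the page.
    $ids = array();
    foreach ( $xpath->query( '//*[@id]' ) as $node ) {
        $ids[ $node->getAttribute( 'id' ) ] = true;
    }

    // Walk the in-page links inside the te-toc nav.
    $links   = $xpath->query(
        '//nav[contains(concat(" ", normalize-space(@class), " "), " te-toc ")]//a[starts-with(@href, "#")]'
    );
    $missing = array();
    foreach ( $links as $link ) {
        $target = substr( $link->getAttribute( 'href' ), 1 );
        if ( ! isset( $ids[ $target ] ) ) {
            $missing[] = $target;
        }
    }

    return $missing;
}

$page    = '<nav class="te-toc"><ul><li><a href="#setup">Setup</a></li>'
         . '<li><a href="#usage">Usage</a></li></ul></nav>'
         . '<h2 id="setup">Setup</h2><h2>Usage</h2>';
$missing = te_toc_missing_targets( $page );
// $missing is array( 'usage' ): the second heading never received its id.
```

An empty array means every link has a target. It says nothing about the sticky-header offset, so the click test is still worth doing once.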

FAQ

Why use DOMDocument instead of a regex to find the headings?

HTML is not a regular language, so a regex over real post content breaks as soon as a heading carries an attribute like class="wp-block-heading", wraps inline markup such as <h3>Using <code>jq</code></h3>, or contains an unexpected character. It works on your test post and fails in production.

DOMDocument parses the markup into a tree and lets DOMXPath query //h2 | //h3 | //h4 reliably, and each node's textContent gives you clean heading text with inner tags flattened.

What priority should the the_content filter use?

Use priority 20, which runs after WordPress's own content formatting (do_blocks at 9, wpautop at the default 10, do_shortcode at 11). You want to parse the final HTML the visitor sees, with shortcodes already turned into markup, not the raw editor content.

Scope it with is_singular(), in_the_loop(), and is_main_query() so it only touches the post body and not excerpts or a sidebar loop that also runs through the_content.

How do I avoid duplicate heading ids?

Keep a map of slugs you have already used. When sanitize_title() produces a slug you have seen before (two sections both titled "Notes"), append -2, -3 and so on so each id stays unique.

Duplicate ids are invalid HTML, and worse, an anchor link can only ever land on the first matching id, so the second link silently jumps to the wrong place. The uniqueness check is what makes every link land where it should.

Do I need JavaScript for the smooth scrolling?

No. scroll-behavior: smooth on the html element animates every in-page anchor jump natively, with no script. Pair it with scroll-margin-top on the target so a sticky header does not cover the heading you jumped to.

If you want to respect a reader's reduced-motion preference, gate the smooth scroll behind a prefers-reduced-motion media query.

Why cache the table of contents in a transient?

A published post's content does not change between views, so re-parsing it with DOMDocument on every request is wasted work on a popular page. Store the generated output in a transient keyed by post ID, read from it on a cache hit, and regenerate only on a miss.

Hook delete_transient() to save_post so editing the post clears the cache and the next view rebuilds it. Check the cache with false !== $cached, since get_transient() returns false on a miss and a legitimately empty value would otherwise look like a permanent miss.

Ishan Karunaratne