If your WordPress site has hundreds of thousands of URLs and the sitemap is being built on every request (core's /wp-sitemap.xml or an SEO plugin assembling it on the fly), it will be slow and it will sometimes time out. The fix is to stop generating it per request and write static sitemap files to disk with a WP-CLI command, chunked to the 50,000-URL limit, with a sitemap index on top, regenerated on a schedule. Here is the command that does it:
wp te sitemap generate --dir=/var/www/html/sitemaps --base-url=https://example.com
That walks every published post in fixed-size batches, writes sitemap-1.xml, sitemap-2.xml, and so on, then writes a sitemap_index.xml that points at all of them. The rest of this article is the command's implementation, why the per-request approach falls over at scale, and how to serve and schedule the static files.
Why dynamic sitemaps choke at scale
WordPress core added XML Sitemaps in 5.5 (August 2020). They are fine for a normal site: core paginates at 2,000 URLs per file by default (wp_sitemaps_get_max_urls()), and a request to /wp-sitemap-posts-post-1.xml runs a bounded query and returns. SEO plugins do roughly the same thing, often with their own caching layer.
The problem starts when the URL count climbs into the hundreds of thousands and the sitemap is regenerated at request time. Every hit to a sitemap URL becomes a database query, an object hydration pass, and an XML serialization, all inside the PHP request that has to finish before max_execution_time. A few things go wrong together:
- Search engine crawlers fetch sitemaps aggressively. Googlebot, Bingbot, and the rest will pull every sitemap in your index, sometimes in parallel, sometimes repeatedly. Each fetch is real DB and CPU work if it is computed live.
- Cold caches hurt the most. The first request after a cache purge or a deploy pays the full cost. On a big site that can blow past the PHP timeout and return a partial or 504 response, which a crawler then records as a broken sitemap.
- The work is redundant. The post set barely changes minute to minute, but a per-request sitemap recomputes the same XML for every fetch. You are paying to produce identical bytes over and over.
A static file has none of that cost at read time. Nginx or Apache hands back a flat .xml off disk in microseconds, no PHP, no MySQL. The expensive generation runs once, out of band, on your schedule rather than the crawler's.
The 50,000-URL / 50MB limit and why you chunk
The sitemap protocol is strict about size. From sitemaps.org: a single sitemap file may contain no more than 50,000 URLs and must be no larger than 50MB (52,428,800 bytes) uncompressed. The same ceilings apply to a sitemap index: no more than 50,000 sitemaps listed, and 50MB.
So a site with 600,000 URLs cannot live in one file. You split the URLs across multiple sitemap files, each holding at most 50,000 entries, and write a sitemap index that lists every chunk. The index is what you submit to search engines; they read it and fetch each child sitemap.
50,000 is the hard ceiling, not a recommendation. In practice I chunk smaller (often 25,000-45,000) so I stay clear of the 50MB byte limit too. A URL with a long path plus <lastmod> runs well under a kilobyte, so 50,000 of them is nowhere near 50MB, but if you add <image:image> blocks or long query strings the byte budget tightens fast. Pick a MAX_URLS_PER_FILE you are comfortable with and let the code do the math.
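The arithmetic behind picking a chunk size can be sketched in a few lines of plain PHP. The helper names below are hypothetical (they are not part of the plugin), and the per-entry figure ignores the small XML header and footer in each file, so treat it as a rough ceiling rather than an exact budget.

```php
<?php
// Protocol ceiling from sitemaps.org: 50MB uncompressed per file.
const TE_DEMO_MAX_BYTES = 52428800;

// Hypothetical helper: how many chunk files a given URL count needs.
function te_demo_chunk_count( int $total_urls, int $max_per_file ): int {
	return (int) ceil( $total_urls / $max_per_file );
}

// Hypothetical helper: average bytes each <url> entry may use before a
// full chunk would exceed the 50MB limit (header/footer bytes ignored).
function te_demo_bytes_per_entry( int $max_per_file ): int {
	return intdiv( TE_DEMO_MAX_BYTES, $max_per_file );
}

echo te_demo_chunk_count( 600000, 50000 ), "\n"; // 12 files at the protocol ceiling
echo te_demo_chunk_count( 600000, 45000 ), "\n"; // 14 files at 45,000 per file
echo te_demo_bytes_per_entry( 45000 ), "\n";     // 1165 bytes per entry
```

A plain <loc> plus <lastmod> entry sits far below that per-entry budget; it only starts to matter once each entry carries image blocks or very long URLs.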
The WP-CLI command
The command registers under the te namespace and exposes a generate subcommand, so the full invocation is wp te sitemap generate. The class method approach is the standard WP-CLI pattern: register a class with WP_CLI::add_command() and its public methods become subcommands.
<?php
/**
 * Plugin Name: TE Static Sitemap
 * Plugin URI: https://techearl.com/wordpress-static-xml-sitemap-wp-cli
 * Description: WP-CLI command that writes static, chunked XML sitemap files plus an index to disk for large WordPress sites.
 * Version: 1.0.0
 * Author: Ishan Karunaratne
 * Author URI: https://techearl.com
 * License: GPL-2.0-or-later
 * Text Domain: te-static-sitemap
 */
if ( ! defined( 'ABSPATH' ) ) {
	exit;
}
// Declared before the WP-CLI guard: top-level const statements execute at
// runtime, so placing them after the early return would leave them undefined
// when te_generate_sitemap() is called outside WP-CLI (for example by WP-Cron).
const TE_SITEMAP_MAX_URLS   = 45000;
const TE_SITEMAP_BATCH_SIZE = 5000;
if ( ! defined( 'WP_CLI' ) || ! WP_CLI ) {
	return;
}
class TE_Sitemap_Command {
	/**
	 * Generate static sitemap files and an index.
	 *
	 * ## OPTIONS
	 *
	 * [--dir=<path>]
	 * : Directory to write the files into. Must be writable and web-served. Defaults to ABSPATH.
	 *
	 * [--base-url=<url>]
	 * : Site base URL. Defaults to home_url().
	 *
	 * @when after_wp_load
	 */
	public function generate( $args, $assoc_args ) {
		$dir      = rtrim( $assoc_args['dir'] ?? ABSPATH, '/' );
		$base_url = rtrim( $assoc_args['base-url'] ?? home_url(), '/' );
		// WP_CLI::error() prints the message and exits, so nothing below runs on failure.
		if ( ! is_dir( $dir ) || ! is_writable( $dir ) ) {
			WP_CLI::error( "Directory not writable: {$dir}" );
		}
		$files = te_generate_sitemap( $dir, $base_url );
		WP_CLI::success(
			sprintf( 'Wrote %d sitemap file(s) and a sitemap_index.xml.', count( $files ) )
		);
	}
}
WP_CLI::add_command( 'te sitemap', 'TE_Sitemap_Command' );
The work itself lives in te_generate_sitemap(), kept as a plain function so it can also be called from a scheduled event (more on that below). It pages through post IDs, opens a new chunk file each time it crosses TE_SITEMAP_MAX_URLS, and writes the index at the end:
function te_generate_sitemap( $dir, $base_url ) {
	$files      = array();
	$file_index = 1;
	$in_file    = 0;
	$handle     = null;
	$paged      = 1;
	// Opens the next chunk file and writes the XML prolog and opening <urlset> tag.
	$open_file = static function () use ( &$handle, &$file_index, &$files, $dir ) {
		$name   = "sitemap-{$file_index}.xml";
		$path   = "{$dir}/{$name}";
		$handle = fopen( $path, 'w' );
		if ( false === $handle ) {
			throw new RuntimeException( "Cannot open {$path} for writing." );
		}
		fwrite( $handle, '<?xml version="1.0" encoding="UTF-8"?>' . "\n" );
		fwrite( $handle, '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n" );
		$files[] = $name;
	};
	// Writes the closing tag and releases the current file handle.
	$close_file = static function () use ( &$handle ) {
		if ( $handle ) {
			fwrite( $handle, '</urlset>' . "\n" );
			fclose( $handle );
			$handle = null;
		}
	};
	$open_file();
	do {
		// Fetch one batch of post IDs only; see the memory notes below for each flag.
		$query = new WP_Query( array(
			'post_type'              => 'post',
			'post_status'            => 'publish',
			'posts_per_page'         => TE_SITEMAP_BATCH_SIZE,
			'paged'                  => $paged,
			'fields'                 => 'ids',
			'no_found_rows'          => true,
			'update_post_meta_cache' => false,
			'update_post_term_cache' => false,
			'orderby'                => 'ID',
			'order'                  => 'ASC',
		) );
		// An empty batch means every post has been written.
		if ( empty( $query->posts ) ) {
			break;
		}
		foreach ( $query->posts as $post_id ) {
			// Roll over to a new chunk file once the current one is full.
			if ( $in_file >= TE_SITEMAP_MAX_URLS ) {
				$close_file();
				$file_index++;
				$in_file = 0;
				$open_file();
			}
			$loc     = esc_url( get_permalink( $post_id ) );
			$lastmod = get_post_modified_time( 'c', true, $post_id );
			fwrite(
				$handle,
				"  <url><loc>{$loc}</loc><lastmod>{$lastmod}</lastmod></url>\n"
			);
			$in_file++;
		}
		$paged++;
		// Release the batch and the object cache so memory stays flat.
		unset( $query );
		wp_cache_flush();
	} while ( true );
	$close_file();
	te_write_sitemap_index( $dir, $base_url, $files );
	return $files;
}
And the index writer, which lists every chunk file:
function te_write_sitemap_index( $dir, $base_url, $files ) {
	$now    = gmdate( 'c' );
	$path   = "{$dir}/sitemap_index.xml";
	$handle = fopen( $path, 'w' );
	if ( false === $handle ) {
		throw new RuntimeException( "Cannot open {$path} for writing." );
	}
	fwrite( $handle, '<?xml version="1.0" encoding="UTF-8"?>' . "\n" );
	fwrite( $handle, '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n" );
	foreach ( $files as $name ) {
		// Assumes the chunk files are served from {$base_url}/sitemaps/.
		// Change this path if the output directory maps to a different public URL.
		$loc = esc_url( "{$base_url}/sitemaps/{$name}" );
		fwrite( $handle, "  <sitemap><loc>{$loc}</loc><lastmod>{$now}</lastmod></sitemap>\n" );
	}
	fwrite( $handle, '</sitemapindex>' . "\n" );
	fclose( $handle );
}
This is deliberately a posts-only example. A real large site usually has pages, products, and custom post types too: extend the WP_Query to loop over each post_type you need (and terms, if you index category and tag archives), keeping the same batch-and-chunk structure. Building your own WP-CLI command around this is the natural way to package it; see my notes on writing a custom WP-CLI command for the registration and argument-parsing details.
Batching and memory on a huge dataset
The thing that breaks naive sitemap scripts on big sites is memory, not speed. Load 600,000 full WP_Post objects into PHP and you are out of memory long before you finish. Every flag in that WP_Query is there to keep the footprint flat:
- 'fields' => 'ids' returns an array of integers, not hydrated post objects. You only need the ID to call get_permalink() and get_post_modified_time().
- 'no_found_rows' => true skips the SQL_CALC_FOUND_ROWS pass. You are paginating with paged and stopping when a batch comes back empty, so you never need the total count, and that count is one of the most expensive parts of a WP_Query on a large table.
- 'update_post_meta_cache' => false and 'update_post_term_cache' => false stop WordPress eagerly priming the meta and term caches for every post in the batch. You are not reading meta or terms here, so priming them is a pure waste of time and memory.
- unset( $query ) drops the batch's objects so PHP can reclaim them.
- wp_cache_flush() at the end of each batch is the important one. WordPress accumulates objects in its in-memory cache as you query, and over hundreds of batches that growth is what tips you into the memory limit. Flushing per batch keeps usage bounded no matter how many posts you have. One caveat: if the site runs a persistent object cache (Redis or Memcached), wp_cache_flush() clears that shared cache for the live site too. On WordPress 6.0 and later, wp_cache_flush_runtime() is meant to clear only the in-memory runtime cache, where the object cache backend supports it.
The result is constant memory regardless of dataset size: you hold one batch (here 5,000 IDs) at a time, stream each URL straight to the open file handle with fwrite(), and never build the whole XML string in memory. A 600,000-URL site processes in the same memory envelope as a 6,000-URL one, it just takes longer.
If even a single batch is heavy, drop TE_SITEMAP_BATCH_SIZE. Smaller batches mean more queries but lower peak memory. On a constrained box I have run this at 1,000 and it is fine; the file handle does the streaming, so batch size only affects how much you hold in RAM at once, not the output.
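To check the streaming and rollover behaviour without a WordPress install, here is a stripped-down sketch in plain PHP. Everything in it is hypothetical scaffolding: te_demo_urls() stands in for the batched WP_Query loop, te_demo_write_chunks() mirrors the chunking logic, and the temp directory and tiny limit of 3 exist only so the result is easy to inspect. Unlike the plugin function, it opens the first file lazily and omits <lastmod>.

```php
<?php
// Hypothetical stand-in for the batched WP_Query loop: yields one URL at a
// time so the full list never exists in memory.
function te_demo_urls( int $count ): Generator {
	for ( $i = 1; $i <= $count; $i++ ) {
		yield "https://example.com/post-{$i}/";
	}
}

// Streams URLs into chunk files of at most $max entries each and returns
// the file names written.
function te_demo_write_chunks( iterable $urls, string $dir, int $max ): array {
	$files  = array();
	$handle = null;
	$count  = 0;

	foreach ( $urls as $url ) {
		// Open the first file, or roll over once the current one is full.
		if ( null === $handle || $count >= $max ) {
			if ( null !== $handle ) {
				fwrite( $handle, "</urlset>\n" );
				fclose( $handle );
			}
			$name   = 'sitemap-' . ( count( $files ) + 1 ) . '.xml';
			$handle = fopen( "{$dir}/{$name}", 'w' );
			if ( false === $handle ) {
				throw new RuntimeException( "Cannot open {$dir}/{$name}" );
			}
			fwrite( $handle, '<?xml version="1.0" encoding="UTF-8"?>' . "\n" );
			fwrite( $handle, '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n" );
			$files[] = $name;
			$count   = 0;
		}
		// Escape for XML, then write the entry straight to disk.
		$loc = htmlspecialchars( $url, ENT_XML1 | ENT_QUOTES, 'UTF-8' );
		fwrite( $handle, "  <url><loc>{$loc}</loc></url>\n" );
		$count++;
	}

	if ( null !== $handle ) {
		fwrite( $handle, "</urlset>\n" );
		fclose( $handle );
	}
	return $files;
}

// Usage: 7 URLs with a limit of 3 per file.
$dir = sys_get_temp_dir() . '/te-sitemap-demo-' . uniqid();
mkdir( $dir );
$files = te_demo_write_chunks( te_demo_urls( 7 ), $dir, 3 );
echo implode( ', ', $files ), "\n";
```

With 7 URLs and a limit of 3, this writes three files holding 3, 3, and 1 entries. Only one URL string and one open file handle are live at any moment, which is the property the batch settings above are protecting.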
Serving the files and pointing crawlers at them
Once the files exist on disk, you serve them like any other static asset and tell search engines where the index is.
If you write them under the web root (for example /var/www/html/sitemaps/), they are already reachable at https://example.com/sitemaps/sitemap_index.xml with no extra config: the web server hands back the flat file, PHP never runs. Then:
- Point robots.txt at the index so any crawler discovers it:
  Sitemap: https://example.com/sitemaps/sitemap_index.xml
- Submit the index URL in Google Search Console and Bing Webmaster Tools. You submit the one index file; the crawler reads it and fetches each child sitemap itself.
- Turn off the dynamic sitemap so you do not have two competing sources. For core, disable it with the wp_sitemaps_enabled filter returning false; for an SEO plugin, switch its XML sitemap feature off. You want exactly one sitemap of record, and on a large site that is the static index.
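For core, the switch-off is a one-line filter. A minimal sketch, assuming it lives in a small plugin or must-use plugin file (where you put it is my assumption, not something the setup above dictates):

```php
<?php
// Turn off WordPress core's dynamic /wp-sitemap.xml so the static index is
// the only sitemap of record. __return_false is a core helper callback.
add_filter( 'wp_sitemaps_enabled', '__return_false' );
```

After adding it, a request to /wp-sitemap.xml should no longer return the core sitemap; confirm that before relying on the static files alone.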
If you would rather keep the canonical /sitemap_index.xml path at the root instead of under /sitemaps/, write the index file to the web root directly, or add a small rewrite mapping the path to the static file. Just make sure the rewrite resolves to the flat file and does not fall through to WordPress's PHP, otherwise you are back to a dynamic response.
Scheduling regeneration
Static files are a snapshot, so they go stale as you publish. Regenerate them on a cadence that matches how often your content changes. Two ways:
System cron (preferred for large sites). A real cron entry calling WP-CLI runs reliably regardless of site traffic, which WP-Cron does not, since WP-Cron only fires when someone hits the site. A nightly run:
0 3 * * * cd /var/www/html && wp te sitemap generate --dir=/var/www/html/sitemaps --base-url=https://example.com >> /var/log/te-sitemap.log 2>&1
That is the most robust option on a busy production box: the generation runs once a night on the server's schedule, completely decoupled from request traffic.
WP-Cron, if you cannot add a system cron entry. Schedule a recurring event that calls the same te_generate_sitemap() function the CLI command uses:
add_action( 'init', function () {
	// Schedule the daily rebuild once; later requests see it already queued.
	if ( ! wp_next_scheduled( 'te_sitemap_rebuild' ) ) {
		wp_schedule_event( time(), 'daily', 'te_sitemap_rebuild' );
	}
} );
add_action( 'te_sitemap_rebuild', function () {
	// The index writer emits chunk URLs under {home_url}/sitemaps/, so write to
	// the matching directory under the web root and create it if it is missing.
	$dir = ABSPATH . 'sitemaps';
	if ( ! wp_mkdir_p( $dir ) ) {
		return;
	}
	te_generate_sitemap( $dir, home_url() );
} );
Be honest about WP-Cron's limits on a heavy job, though: it runs inside a web request and is subject to the same PHP timeout that made the per-request sitemap a problem in the first place. For a site big enough to need static sitemaps, a real system cron is the right tool. If you do go the WP-Cron route, my walkthrough on scheduling a recurring task with WP-Cron covers wp_schedule_event() and the gotchas in detail.
Verify it worked

Run the command and confirm the files landed:
wp te sitemap generate --dir=/var/www/html/sitemaps --base-url=https://example.com
You should see something like Success: Wrote 13 sitemap file(s) and a sitemap_index.xml. Then list the directory and fetch one chunk to eyeball the XML:
ls -la /var/www/html/sitemaps/
curl -s https://example.com/sitemaps/sitemap-1.xml | head -n 20
curl -s https://example.com/sitemaps/sitemap_index.xml
Check three things: each chunk opens with the <urlset> declaration and contains <url><loc> entries, no single file exceeds 50,000 URLs (grep -c '<loc>' sitemap-1.xml), and the index lists every chunk you generated. If you are serving from the web root, both curl calls should return the file in milliseconds with no PHP involvement. Confirm in your access log that the request did not hit index.php.
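If you would rather script the limit checks than eyeball them, something like the following plain-PHP sketch works. The function name and the script itself are hypothetical (not part of the plugin); it counts <loc> entries line by line and compares the count and file size to the protocol limits.

```php
<?php
// Hypothetical checker: counts <loc> entries and measures file size against
// the sitemap protocol limits (50,000 URLs, 52,428,800 bytes uncompressed).
function te_demo_check_sitemap( string $path, int $max_urls = 50000, int $max_bytes = 52428800 ): array {
	$handle = fopen( $path, 'r' );
	if ( false === $handle ) {
		throw new RuntimeException( "Cannot read {$path}" );
	}
	$urls = 0;
	// Read line by line so a large file is never loaded whole.
	while ( false !== ( $line = fgets( $handle ) ) ) {
		$urls += substr_count( $line, '<loc>' );
	}
	fclose( $handle );
	$bytes = filesize( $path );

	return array(
		'urls'  => $urls,
		'bytes' => $bytes,
		'ok'    => $urls <= $max_urls && $bytes <= $max_bytes,
	);
}

// Usage: check every chunk in the directory given as the first CLI argument
// (falls back to the current directory).
$target = $argv[1] ?? '.';
foreach ( glob( $target . '/sitemap-*.xml' ) ?: array() as $file ) {
	$r = te_demo_check_sitemap( $file );
	printf(
		"%s: %d URLs, %d bytes, %s\n",
		basename( $file ),
		$r['urls'],
		$r['bytes'],
		$r['ok'] ? 'OK' : 'OVER LIMIT'
	);
}
```

Saved under a name of your choosing, it would be run as php check-sitemaps.php /var/www/html/sitemaps. The glob pattern matches the chunk files only, not sitemap_index.xml.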
See also
- Write a Custom WP-CLI Command: the full pattern behind wp te sitemap generate, from registering a command class to parsing positional and associative arguments
- Schedule a Recurring Task with WP-Cron: how wp_schedule_event() works, the traffic-dependent firing gotcha, and when to swap WP-Cron for a real system cron
- How to Optimize WooCommerce: the database is almost always the slow path on a large store, the same reason a per-request sitemap hurts at scale
- Clean Up wp_head in WordPress: trimming the other default front-end output WordPress prints on every page