TechEarl

Set WordPress Featured Images From a Spreadsheet of URLs

A sheet of post_id and image_url, a WP-CLI command that sideloads each URL into the media library and sets it as the post's featured image. Change-only, dry-run by default, and it skips posts that already have a thumbnail so re-running never duplicates.

Ishan Karunaratne⏱️ 12 min readUpdated
Share thisCopied
Set WordPress featured images from a spreadsheet of URLs: a WP-CLI command that sideloads each image into the media library and sets it as the post thumbnail, idempotent.

Setting WordPress featured images from a spreadsheet of URLs comes down to three steps per row: sideload the image URL into the media library to get an attachment ID, set that attachment as the post's thumbnail, and record the source URL so a second run skips the post instead of downloading the file again. A WP-CLI command does all three. The sheet has a post_id column and an image_url column; the command reads it, fetches each URL onto the server, attaches it, and sets the featured image, writing nothing unless you pass --live.

This is the same pull model the rest of this cluster uses, with one extra moving part: instead of writing a string into post meta, WordPress has to go and fetch a binary file from the open internet, store it as a real attachment, and link it. That fetch is what makes featured images the trickiest of these jobs to make idempotent, and it is where almost everyone gets bitten.

The sheet

One row per post. The join key is post_id; the value to apply is the image URL.

Reading the sheet is the part already covered elsewhere: authenticate as a service account, call spreadsheets.values.get, drop the header row. The command below is self-contained; te_sheet_token() (which mints a JWT and exchanges it for an access token) and te_sheet_read() are the same service-account helpers used across the cluster, explained in depth in bulk-updating custom fields from a Google Sheet, and the CSV-vs-API trade-off (whether you even need a service account, or whether a published-to-web CSV is enough) is in reading a Google Sheet in PHP. I will not re-explain those here; this article is about what happens to each row once you have it.

The function that does the work: media_sideload_image

The whole download-attach-set dance is two function calls. media_sideload_image() fetches a URL, stores it as an attachment, and (with the right fourth argument) hands you back the attachment ID. set_post_thumbnail() then links that attachment as the post's featured image.

php
$attachment_id = media_sideload_image( $url, $post_id, $desc, 'id' );
if ( ! is_wp_error( $attachment_id ) ) {
	set_post_thumbnail( $post_id, $attachment_id );
}

The fourth argument is the one people miss. media_sideload_image() defaults to returning an HTML <img> tag, which is useless here. Pass 'id' and you get the integer attachment ID, which is exactly what set_post_thumbnail() wants. (The other accepted values are 'src' for the attachment URL and 'html', the default, for the <img> tag, but 'id' is the one for this job.) It returns a WP_Error on a failed download, so always guard with is_wp_error() before you set the thumbnail.

The gotcha that breaks it in WP-CLI: load the admin includes first

Here is the one thing that will stop this dead. media_sideload_image() lives in wp-admin/includes/media.php, and that file (plus the two it depends on) is only loaded when WordPress is rendering an admin page. In a WP-CLI command, a cron callback, or any front-end context, those files are not loaded, so the function is undefined and you get a fatal Call to undefined function media_sideload_image().

You have to require them yourself before the first call:

php
require_once ABSPATH . 'wp-admin/includes/media.php';
require_once ABSPATH . 'wp-admin/includes/file.php';
require_once ABSPATH . 'wp-admin/includes/image.php';

All three matter. media.php declares media_sideload_image(); file.php provides download_url() (which media_sideload_image calls to fetch the file to a temp path); image.php provides wp_read_image_metadata() and the thumbnail-generation helpers that run when the attachment is created. Drop any of them and you either get an undefined-function fatal or a half-built attachment with no generated sizes. Require all three once, at the top of the command, and you are clear. This is the single most common reason a sideload script that works when pasted into a plugin admin page falls over the moment you move it to cron.

The real footgun: idempotency, or you will download everything twice

media_sideload_image() has no memory. Call it twice for the same post and the same URL and it will happily download the file again, create a second attachment, and re-point the thumbnail at the duplicate. The first attachment is now an orphan, your uploads directory has two copies, and your wp_postmeta is fatter for nothing. Run a 4,000-row sheet twice by accident and you have 4,000 orphaned attachments and a doubled media library.

So the command must guard the sideload, and there are two complementary checks:

  1. Skip posts that already have a featured image. has_post_thumbnail( $post_id ) is true the moment a thumbnail is set, so a row whose post is already done costs one cheap function call and no download. This is the right default for a "fill in the missing ones" job.
  2. Track the source URL in post meta and skip when it is unchanged. Store the URL you sideloaded in a meta key (I use te_featured_src) and compare the sheet's URL against it. If they match, the current featured image already came from that exact URL, so there is nothing to do. This is what you want when the sheet is a living source of truth and a thumbnail might legitimately need replacing because the URL in the sheet changed, while still never re-downloading an unchanged one.

Using both, the logic per row is: if the stored source equals the sheet URL, skip (nothing changed). Otherwise sideload, set the thumbnail, and record the new source. That makes the command change-only and idempotent: re-running it right after a successful run downloads nothing and writes nothing, which is the same discipline as the safe batch updater, applied to a binary fetch instead of a meta string.

The command

This registers wp te-sheet featured. It reads the sheet, applies the change-only rule, and writes nothing (no downloads, no thumbnails) unless you pass --live. The dry run is the default precisely because the live run touches the filesystem and the network.

php
<?php
/**
 * Plugin Name: TE Sheet Featured Images
 * Plugin URI:  https://techearl.com/wordpress-set-featured-images-from-spreadsheet
 * Description: WP-CLI command that sideloads image URLs from a Google Sheet and sets each as a post's featured image. Change-only and idempotent.
 * Version:     1.0.0
 * Author:      Ishan Karunaratne
 * Author URI:  https://techearl.com
 */

defined( 'ABSPATH' ) || exit;
if ( ! defined( 'WP_CLI' ) || ! WP_CLI ) { return; }

// SSRF guard. media_sideload_image() fetches the URL server-side, and WordPress
// does NOT block private/internal addresses by default, so an arbitrary URL from
// a sheet anyone can edit is a server-side request forgery vector. Pin the hosts
// you actually serve images from.
const TE_IMG_HOSTS = array( 'cdn.example.com', 'assets.example.com' );

function te_host_allowed( string $src ): bool {
	$parts = wp_parse_url( $src );
	return 'https' === ( $parts['scheme'] ?? '' )
		&& in_array( strtolower( $parts['host'] ?? '' ), TE_IMG_HOSTS, true );
}

/** Base64url-encode (JWT segments are unpadded and use -_ instead of +/). */
function te_b64url( $s ) { return rtrim( strtr( base64_encode( $s ), '+/', '-_' ), '=' ); }

/**
 * Mint a short-lived service-account access token (JWT bearer grant).
 * Scope defaults to read-only; pass the read-write Sheets scope to write back.
 */
function te_sheet_token( array $key, string $scope = 'https://www.googleapis.com/auth/spreadsheets.readonly' ) {
	$now    = time();
	$header = te_b64url( wp_json_encode( array( 'alg' => 'RS256', 'typ' => 'JWT' ) ) );
	$claims = te_b64url( wp_json_encode( array(
		'iss'   => $key['client_email'],
		'scope' => $scope,
		'aud'   => 'https://oauth2.googleapis.com/token',
		'iat'   => $now,
		'exp'   => $now + 3600,
	) ) );
	$sig = '';
	openssl_sign( "{$header}.{$claims}", $sig, $key['private_key'], 'sha256WithRSAEncryption' );
	$jwt = "{$header}.{$claims}." . te_b64url( $sig );
	$res = wp_remote_post( 'https://oauth2.googleapis.com/token', array(
		'body' => array(
			'grant_type' => 'urn:ietf:params:oauth:grant-type:jwt-bearer',
			'assertion'  => $jwt,
		),
	) );
	return json_decode( wp_remote_retrieve_body( $res ), true )['access_token'] ?? '';
}

/** Read a sheet range over the Sheets API. Returns the raw rows, header included. */
function te_sheet_read( string $sheet_id, string $range, string $token ): array {
	$url = sprintf( 'https://sheets.googleapis.com/v4/spreadsheets/%s/values/%s', rawurlencode( $sheet_id ), rawurlencode( $range ) );
	$res = wp_remote_get( $url, array( 'headers' => array( 'Authorization' => "Bearer {$token}" ) ) );
	return json_decode( wp_remote_retrieve_body( $res ), true )['values'] ?? array();
}

class TE_Sheet_Featured_Command {

	/**
	 * Sideload image URLs from a sheet and set each as a post's featured image.
	 *
	 * ## OPTIONS
	 * --sheet=<id>   : Spreadsheet ID.
	 * --key=<path>   : Path to the service-account JSON key.
	 * [--range=<a1>] : Sheet range. Default: Sheet1!A:B
	 * [--force]      : Replace even posts that already have a featured image.
	 * [--live]       : Actually download and set. Without it, dry run.
	 */
	public function featured( $args, $assoc ) {
		// Required only in CLI/cron context: these admin files are not auto-loaded.
		require_once ABSPATH . 'wp-admin/includes/media.php';
		require_once ABSPATH . 'wp-admin/includes/file.php';
		require_once ABSPATH . 'wp-admin/includes/image.php';

		$key   = json_decode( file_get_contents( $assoc['key'] ), true );
		$range = $assoc['range'] ?? 'Sheet1!A:B';
		$force = isset( $assoc['force'] );
		$live  = isset( $assoc['live'] );

		$token = te_sheet_token( $key );
		if ( ! $token ) { WP_CLI::error( 'Could not get an access token.' ); }

		$rows = te_sheet_read( $assoc['sheet'], $range, $token );
		array_shift( $rows ); // drop the header row

		$changes = array();
		foreach ( $rows as $row ) {
			$post_id = (int) ( $row[0] ?? 0 );
			$src     = trim( (string) ( $row[1] ?? '' ) );
			if ( ! $post_id || '' === $src || ! get_post( $post_id ) ) { continue; }

			// SSRF guard: refuse any URL not on the trusted-host allowlist.
			if ( ! te_host_allowed( $src ) ) {
				WP_CLI::warning( sprintf( 'Post %d: refused non-allowlisted image host (%s)', $post_id, $src ) );
				continue;
			}

			// Idempotency guard: same source already applied means nothing to do.
			$applied = (string) get_post_meta( $post_id, 'te_featured_src', true );
			if ( $applied === $src ) { continue; }

			// Default to a fill-the-gaps job: leave existing thumbnails alone unless --force.
			$status = 'set from sheet';
			if ( ! $force && has_post_thumbnail( $post_id ) ) {
				continue;
			}
			if ( has_post_thumbnail( $post_id ) ) { $status = 'replaced (--force)'; }

			$changes[] = array( 'post_id' => $post_id, 'image_url' => $src, 'action' => $status );

			if ( $live ) {
				$attachment_id = media_sideload_image( $src, $post_id, null, 'id' );
				if ( is_wp_error( $attachment_id ) ) {
					WP_CLI::warning( sprintf( 'Post %d: %s', $post_id, $attachment_id->get_error_message() ) );
					array_pop( $changes );
					continue;
				}
				set_post_thumbnail( $post_id, $attachment_id );
				update_post_meta( $post_id, 'te_featured_src', $src );
			}
		}

		if ( empty( $changes ) ) { WP_CLI::success( 'Every post already has its featured image from the sheet.' ); return; }
		WP_CLI\Utils\format_items( 'table', $changes, array( 'post_id', 'image_url', 'action' ) );
		$mode = $live ? 'set' : 'WOULD set (dry run)';
		WP_CLI::success( sprintf( '%d featured images %s from the sheet.', count( $changes ), $mode ) );
	}
}
WP_CLI::add_command( 'te-sheet', 'TE_Sheet_Featured_Command' );

A few things worth pointing at. The null third argument to media_sideload_image() is the description (it defaults the attachment's title and alt from the filename, which is usually fine; pass a real string if your sheet has an alt-text column). The array_pop() on a download error keeps the dry-run count honest by un-recording a row that failed to fetch. And because the source-URL guard runs before the has_post_thumbnail() check, a changed URL in the sheet is the one case that gets past an existing thumbnail without --force, which is the behaviour you want when the sheet is authoritative.

Run it

Dry run first, every time. It reads the sheet, decides what it would do per row, and prints the plan without touching the network or the media library:

A terminal running wp te-sheet featured as a dry run: a table lists posts 5001 to 5004 with their image URLs and an action of set from sheet, while two already-thumbnailed posts are absent because they were skipped, ending with a success line saying four featured images would be set from the sheet
The dry run: four posts whose featured image would be sideloaded from the sheet's URL, with posts that already have a thumbnail skipped. The live run with --live performs the downloads and sets each thumbnail.

Posts already carrying a thumbnail do not appear in the table at all (they were skipped before any work), and re-running after a successful --live pass shows the "already has its featured image" success line with an empty plan, because every te_featured_src now matches the sheet. That is idempotency you can see.

Where the files actually go, and why they are normal attachments

A sideloaded image is not a special second-class thing. media_sideload_image() downloads the file into wp-content/uploads/, inserts an attachment post into wp_posts, records the metadata in wp_postmeta, and generates the registered thumbnail sizes, exactly as if you had dragged the file into the media library by hand. So everything you already do with attachments still applies:

  • The image shows up in Media in wp-admin, croppable and editable like any upload.
  • If you change your theme's image sizes later, wp media regenerate rebuilds the sideloaded files' thumbnails along with everything else.
  • The attachment is linked to its post as the thumbnail via the _thumbnail_id meta key that set_post_thumbnail() writes, which is the same key the editor's Featured Image panel sets.

One consequence worth knowing: the fetch is server-side. Your WordPress server has to be able to reach each image_url, so URLs behind a VPN, an IP allowlist, or localhost will fail with a download error (which the command logs and moves past). That same server-side fetch is a security boundary, which is why the command refuses any URL not on the explicit TE_IMG_HOSTS allowlist. media_sideload_image() does not block private or internal addresses on its own (WordPress only validates URLs when a request opts into reject_unsafe_urls, and this path does not), so without the allowlist a sheet that a content editor can change could point your server at http://169.254.169.254/ (cloud instance metadata) or an internal-only admin service. That is a classic SSRF, and pinning the source to the handful of hosts your images really come from closes it. Keep the allowlist tight. And because each download blocks while it pulls a full-size image over the network and then generates several resized copies, a few thousand rows is real wall-clock time and real memory. For jobs that big, batch them: process the sheet in chunks, or lean on the WP-CLI memory and batching practices in running bulk scripts with WP-CLI at scale so a long run does not exhaust PHP's memory limit partway through.

If the "posts" are WooCommerce products and you want to set the product image (and maybe a gallery), do not write _thumbnail_id or set_post_thumbnail() directly. Go through the CRUD layer so Woo's own caches and lookups stay consistent. The setters take attachment IDs, so you still sideload the same way; you just hand the IDs to the product object instead:

php
$product = wc_get_product( $post_id ); // or wc_get_product_id_by_sku( $sku )

$main_id = media_sideload_image( $main_url, $post_id, null, 'id' );
if ( ! is_wp_error( $main_id ) ) {
	$product->set_image_id( $main_id );
}

$gallery_ids = array();
foreach ( $gallery_urls as $g_url ) {
	$g_id = media_sideload_image( $g_url, $post_id, null, 'id' );
	if ( ! is_wp_error( $g_id ) ) { $gallery_ids[] = $g_id; }
}
if ( $gallery_ids ) { $product->set_gallery_image_ids( $gallery_ids ); }

$product->save();

A natural sheet for this has a sku join column, a main_image_url, and a pipe-separated gallery_image_urls cell. Resolve the SKU to a product ID with wc_get_product_id_by_sku(), then the same idempotency guard (te_featured_src for the main image) keeps re-runs free.

Push or pull, and where this fits

This is a pull: WordPress reaches out to Google on its own schedule, reads the sheet, and applies it, which is the right shape for a hands-off cron job that keeps featured images in step with a content team's spreadsheet. The mirror image is the push model, where the sheet POSTs into a WordPress endpoint when someone clicks a button, better when a human decides the moment. Whichever direction you run, the featured-image step is the same two calls and the same idempotency guard; only the trigger and the credentials differ. Put wp te-sheet featured on a nightly cron and a column of URLs in a sheet becomes a maintained source of truth for every post's cover image, with no one logging into wp-admin to upload anything.

Sources

Authoritative references this article was fact-checked against.

TagsWordPressGoogle SheetsWP-CLIFeatured ImageMedia LibraryPHP

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Software Systems Architect · Senior Software Engineer · Engineering Leadership

Software systems architect and senior software engineer with more than two decades designing, building, and running production software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Now a CTO, though what I write here is drawn from the full arc of that work, across architecture, engineering, and operations, not any single job.

Keep reading

Related posts