TechEarl

Update a WordPress Business Directory From a Google Sheet

Keep a directory of thousands of listings in sync with a Google Sheet a non-dev maintains: a WP-CLI pull command that reconciles each listing change-only, parses structured hours, and flips permanently-closed places to a closed status instead of deleting them.

Ishan Karunaratne⏱️ 11 min readUpdated
Share thisCopied
Sync a WordPress business directory of thousands of listings to a Google Sheet: a change-only WP-CLI pull command, structured hours, and a closed status that keeps the page for SEO.

Updating a WordPress business directory from a Google Sheet comes down to four moves: read the sheet with a service account, match each row to a listing post by a stable ID column, apply only the fields that actually changed, and flip a permanently_closed row to a closed status rather than deleting the page. The whole thing is a WP-CLI command that runs change-only and dry-run-by-default, so it is safe to put on a nightly cron against thousands of listings without anyone watching it.

This is the directory-shaped version of the service-account pull: the same JWT-signed read, but applied to a custom post type with a handful of real-world fields (phone, address, hours, website, closed-state) where a non-developer owns the data in a sheet and WordPress reconciles to it. The two things that make a directory different from a flat custom-field update are scale (thousands of rows, so memory discipline matters) and the closed-listing rule (a business shutting down is not a row to delete, it is a status to set).

The data model

A directory listing is a custom post type. I register listing with the fields a place actually has: phone, address, website, an hours structure, and the closed-state flag. Whether those live in ACF or native meta is a real decision (covered in ACF fields vs native post meta); for the pull it barely matters, because I am reading and writing them by meta key either way.

php
function te_register_listing_cpt() {
	register_post_type( 'listing', array(
		'label'        => 'Listings',
		'public'       => true,
		'show_in_rest' => true,
		'has_archive'  => true,
		'rewrite'      => array( 'slug' => 'listings' ),
		'supports'     => array( 'title', 'editor', 'thumbnail', 'custom-fields' ),
	) );

	// The open/closed state lives in a taxonomy so the theme can badge and
	// query it. The pull command sets the `open` / `closed` terms below, so the
	// taxonomy has to exist for those writes to land.
	register_taxonomy( 'listing_status', 'listing', array(
		'label'        => 'Listing status',
		'public'       => true,
		'hierarchical' => false,
		'show_in_rest' => true,
	) );
}
add_action( 'init', 'te_register_listing_cpt' );

The one rule that saves you later: the stable join key is a listing_id (or the post slug), never the business name. Names get re-typed, get a "LLC" appended, get a typo fixed in the sheet, and any of those would orphan the row from its post if you matched on the name. Give every listing an immutable ID column in the sheet and store it as meta on the post so the match is exact.

The sheet

One row per listing. The first column is the join key; the rest are the values to apply. The non-developer who maintains the directory only ever touches this sheet.

listing_idnamephoneaddresshourspermanently_closed
L-1001Bluebird Diner(319) 555-0142330 E Market St, Iowa City, IAMon-Fri 7:00-15:00; Sat-Sun 8:00-14:00
L-1002Prairie Lights Books(319) 555-018815 S Dubuque St, Iowa City, IAMon-Sun 9:00-21:00
L-1003Hamburg Inn No. 2(319) 555-0120214 N Linn St, Iowa City, IAMon-Sun 6:00-22:00yes

permanently_closed is left blank for an open business and set to yes when it shuts down. Everything else is a plain value the command writes onto the matching listing.

Authenticating and reading the sheet

WordPress is the API client here, so it needs an identity. A service account is the right one for server-to-server access: create it in the Google Cloud console, enable the Sheets API, download its JSON key, and share the sheet with the account's email (read-only is enough). The key lives outside the web root and is referenced by path, never committed.

The token is a signed JWT exchanged for an access token, which PHP mints with openssl_sign and the WordPress HTTP API. The plugin is self-contained: te_sheet_token() and te_sheet_read() (the same service-account helpers used across the sheet cluster) are included at the top of the command file below, so te_directory_rows() just calls them. If a published-to-the-web CSV would do instead of the API, that trade-off is weighed in reading a Google Sheet in PHP: CSV vs the API. The read itself is one spreadsheets.values.get call:

php
/** Fetch every row of the directory sheet as a 2D array, header included. */
function te_directory_rows( array $key, $sheet_id, $range ) {
	$token = te_sheet_token( $key ); // JWT -> short-lived access token
	if ( ! $token ) { return array(); }

	return te_sheet_read( $sheet_id, $range, $token );
}

Parsing structured hours

Hours are the one field that is not a flat string. "Mon-Fri 7:00-15:00; Sat-Sun 8:00-14:00" is readable for the person editing the sheet, but a template needs it structured: per-day open and close times so it can render a weekly table, compute "open now," and emit LocalBusiness schema. So the command parses that human string into a normalized array and stores it as a single JSON meta value (one read, one write, instead of fourteen per-day keys).

php
/** Turn "Mon-Fri 7:00-15:00; Sat-Sun 8:00-14:00" into a per-day open/close map. */
function te_parse_hours( $raw ) {
	$days  = array( 'mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun' );
	$hours = array();
	foreach ( $days as $d ) { $hours[ $d ] = null; } // null = closed that day

	foreach ( explode( ';', $raw ) as $clause ) {
		$clause = trim( $clause );
		if ( '' === $clause ) { continue; }
		// "Mon-Fri 7:00-15:00"  ->  range of days + open-close
		if ( ! preg_match( '/^(\w{3})(?:-(\w{3}))?\s+(\d{1,2}:\d{2})-(\d{1,2}:\d{2})$/', $clause, $m ) ) { continue; }
		$from = array_search( strtolower( $m[1] ), $days, true );
		$to   = '' !== $m[2] ? array_search( strtolower( $m[2] ), $days, true ) : $from;
		if ( false === $from || false === $to ) { continue; }
		for ( $i = $from; $i <= $to; $i++ ) {
			$hours[ $days[ $i ] ] = array( 'open' => $m[3], 'close' => $m[4] );
		}
	}
	return $hours;
}

Storing hours as JSON keeps the diff honest: the command compares the freshly-parsed array against the stored one and only writes when the schedule genuinely changed, which is most of the point of running change-only. If your template prefers fourteen individual fields, parse the same way and write per-day keys instead; the parsing is identical, only the storage shape differs.

Closed, not deleted

When a business shuts down, the sheet sets permanently_closed to yes. The wrong move is wp_delete_post: the listing has earned its URL, its inbound links, and its rankings, and a hard 404 throws all of that away. The right move is a status change that keeps the page live but marks it closed. I do two things on a closed row:

  1. Set a permanently_closed meta flag to 1 so the template can render a "Permanently closed" banner and drop the place from "open now" filters.
  2. Move it to a closed term (a taxonomy the archive and template can badge and exclude from the main directory listing), or set post_status to draft if it should leave the public archive entirely.

A flag plus a term is usually better than draft, because a closed business is still a page people search for ("did Hamburg Inn close?") and you want that query to land on your listing, clearly labelled, not on a competitor. Keep the page, badge it, and let it keep ranking. Re-geocoding is a separate concern: if an address changed, the lat/long behind a map embed is now stale, so flag address changes for a geocoding pass (I do not implement that here, but the command can log which rows moved).

The command

This registers wp te-directory pull. It reads the sheet, resolves each listing_id to a post, applies the fields change-only (reusing the dry-run and backup discipline from the safe batch updater), and writes nothing unless you pass --live. At directory scale the part that matters is the loop: resolve IDs once, process in batches, and flush the object cache each batch so memory stays flat across thousands of rows. That batching pattern is the same harness from running bulk scripts with WP-CLI.

php
<?php
/**
 * Plugin Name: TE Directory Sync
 * Plugin URI:  https://techearl.com/wordpress-business-directory-update-google-sheet
 * Description: WP-CLI command that reconciles a WordPress listing directory to a Google Sheet, change-only.
 * Version:     1.0.0
 * Author:      Ishan Karunaratne
 * Author URI:  https://techearl.com
 * License:     GPL-2.0-or-later
 * Text Domain: te-directory-sync
 */

defined( 'ABSPATH' ) || exit;
if ( ! defined( 'WP_CLI' ) || ! WP_CLI ) { return; }

/** Base64url-encode (JWT segments are unpadded and use -_ instead of +/). */
function te_b64url( $s ) { return rtrim( strtr( base64_encode( $s ), '+/', '-_' ), '=' ); }

/**
 * Mint a short-lived service-account access token (JWT bearer grant).
 * Scope defaults to read-only; pass the read-write Sheets scope to write back.
 */
function te_sheet_token( array $key, string $scope = 'https://www.googleapis.com/auth/spreadsheets.readonly' ) {
	$now    = time();
	$header = te_b64url( wp_json_encode( array( 'alg' => 'RS256', 'typ' => 'JWT' ) ) );
	$claims = te_b64url( wp_json_encode( array(
		'iss'   => $key['client_email'],
		'scope' => $scope,
		'aud'   => 'https://oauth2.googleapis.com/token',
		'iat'   => $now,
		'exp'   => $now + 3600,
	) ) );
	$sig = '';
	openssl_sign( "{$header}.{$claims}", $sig, $key['private_key'], 'sha256WithRSAEncryption' );
	$jwt = "{$header}.{$claims}." . te_b64url( $sig );
	$res = wp_remote_post( 'https://oauth2.googleapis.com/token', array(
		'body' => array(
			'grant_type' => 'urn:ietf:params:oauth:grant-type:jwt-bearer',
			'assertion'  => $jwt,
		),
	) );
	return json_decode( wp_remote_retrieve_body( $res ), true )['access_token'] ?? '';
}

/** Read a sheet range over the Sheets API. Returns the raw rows, header included. */
function te_sheet_read( string $sheet_id, string $range, string $token ): array {
	$url = sprintf( 'https://sheets.googleapis.com/v4/spreadsheets/%s/values/%s', rawurlencode( $sheet_id ), rawurlencode( $range ) );
	$res = wp_remote_get( $url, array( 'headers' => array( 'Authorization' => "Bearer {$token}" ) ) );
	return json_decode( wp_remote_retrieve_body( $res ), true )['values'] ?? array();
}

class TE_Directory_Sync_Command {

	const BATCH = 200;

	/**
	 * Reconcile the listing directory to the directory sheet.
	 *
	 * ## OPTIONS
	 * --sheet=<id>  : Spreadsheet ID.
	 * --key=<path>  : Path to the service-account JSON key.
	 * [--range=<a1>]: Sheet range. Default: Listings!A:F
	 * [--live]      : Actually write. Without it, dry run.
	 */
	public function pull( $args, $assoc ) {
		$key   = json_decode( file_get_contents( $assoc['key'] ), true );
		$range = $assoc['range'] ?? 'Listings!A:F';
		$live  = isset( $assoc['live'] );

		$rows = te_directory_rows( $key, $assoc['sheet'], $range );
		if ( empty( $rows ) ) { WP_CLI::error( 'Read no rows from the sheet.' ); }
		array_shift( $rows ); // header

		// Resolve all listing_id -> post_id once, so the loop is a hash lookup.
		$index   = $this->build_listing_index();
		$changes = array();
		$batch   = 0;

		foreach ( $rows as $row ) {
			$listing_id = trim( $row[0] ?? '' );
			if ( '' === $listing_id || ! isset( $index[ $listing_id ] ) ) { continue; }
			$post_id = $index[ $listing_id ];

			$delta = $this->diff_listing( $post_id, $row );
			if ( empty( $delta ) ) { continue; } // change-only: nothing differs

			$changes[] = array(
				'listing_id' => $listing_id,
				'post_id'    => $post_id,
				'changed'    => implode( ', ', array_keys( $delta ) ),
			);

			if ( $live ) { $this->apply( $post_id, $delta ); }

			if ( ++$batch % self::BATCH === 0 ) {
				wp_cache_flush(); // keep memory flat across thousands of rows
			}
		}

		if ( empty( $changes ) ) { WP_CLI::success( 'Directory already matches the sheet.' ); return; }
		WP_CLI\Utils\format_items( 'table', $changes, array( 'listing_id', 'post_id', 'changed' ) );
		$mode = $live ? 'updated' : 'WOULD update (dry run)';
		WP_CLI::success( sprintf( '%d listings %s from the sheet.', count( $changes ), $mode ) );
	}

	/** One query: every listing_id meta -> its post ID. */
	private function build_listing_index() {
		global $wpdb;
		$rows  = $wpdb->get_results(
			"SELECT pm.meta_value AS listing_id, pm.post_id
			   FROM {$wpdb->postmeta} pm
			   JOIN {$wpdb->posts} p ON p.ID = pm.post_id
			  WHERE pm.meta_key = 'listing_id' AND p.post_type = 'listing'"
		);
		$index = array();
		foreach ( $rows as $r ) { $index[ $r->listing_id ] = (int) $r->post_id; }
		return $index;
	}

	/** Compare the sheet row to the stored listing; return only what differs. */
	private function diff_listing( $post_id, $row ) {
		// A non-developer owns the sheet, so treat every cell as untrusted: sanitize
		// on write here, and escape on output in the theme (esc_html/esc_attr).
		$want = array(
			'phone'              => sanitize_text_field( $row[2] ?? '' ),
			'address'            => sanitize_text_field( $row[3] ?? '' ),
			'hours'              => wp_json_encode( te_parse_hours( $row[4] ?? '' ) ),
			'permanently_closed' => ( strtolower( trim( $row[5] ?? '' ) ) === 'yes' ) ? '1' : '',
		);

		$delta = array();
		foreach ( $want as $field => $value ) {
			$current = (string) get_post_meta( $post_id, $field, true );
			if ( $current !== $value ) { $delta[ $field ] = $value; }
		}
		return $delta;
	}

	/** Write the changed fields, and handle the open/closed status transition. */
	private function apply( $post_id, array $delta ) {
		foreach ( $delta as $field => $value ) {
			update_post_meta( $post_id, $field, $value );
		}

		// Status transition only when the closed-state actually flipped.
		if ( array_key_exists( 'permanently_closed', $delta ) ) {
			if ( '1' === $delta['permanently_closed'] ) {
				wp_set_object_terms( $post_id, 'closed', 'listing_status' );
			} else {
				wp_set_object_terms( $post_id, 'open', 'listing_status' );
			}
		}
	}
}
WP_CLI::add_command( 'te-directory', 'TE_Directory_Sync_Command' );

Two scale notes worth calling out. The index is built with one SQL query rather than a WP_Query per row, so resolving ten thousand listing IDs is a single round-trip. And update_post_meta is the write path here; if these are ACF fields, swap to update_field so ACF's own caching and field references stay correct (again, the ACF vs native meta reasoning). If the directory spans several sites (a city-per-site network), the same command runs per site from one sheet using the multi-site fan-out pattern.

Run it

Dry run first, always. It reads the sheet, resolves every listing, diffs each field, and prints exactly which listings would change and on which columns, writing nothing:

bash
wp te-directory pull --sheet=1AbC...XyZ --key=/srv/secrets/directory-sa.json
A terminal dry-run table of directory listings whose phone, hours, or closed-status would change from the sheet, one row per listing with its listing_id, post ID, and the changed columns, across many listings, ending in a success line that N listings would update
The directory pull as a dry run: one row per changed listing, showing its listing_id, post ID, and exactly which fields differ from the sheet. Re-running after the live pass shows nothing, because it is change-only.

The changed column tells you what would move on each listing: a new phone number, an updated hours string, a flip to permanently_closed. A listing already matching the sheet never appears, so re-running right after a live pass reports "Directory already matches the sheet." Once the dry run looks right, add --live, then put the command on a nightly cron and the sheet becomes the directory's source of truth.

Idempotent and change-only by design

The command is safe to run on a schedule because every write is gated on a diff. Reading the current value, comparing to the sheet, and writing only the difference means a second run right after the first does nothing at all: no churn in wp_postmeta, no spurious modified_date bumps, no cache invalidation for listings that did not move. That is the whole reason it can sit on a nightly cron unattended against thousands of rows. The same discipline (read, compare, write only the delta, dry-run by default, back up first) is the backbone of every reconcile job in this cluster, written up in full in the safe batch updater.

The one piece I deliberately left out is geocoding. When a sheet row changes an address, the listing's stored coordinates are now stale, and a map embed will point at the old spot. Re-geocoding on every address change is a real follow-up, but it is a paid API call with its own rate limits, so I keep it out of the reconcile loop and instead have the command flag which listings moved. A separate, slower job can re-geocode just those.

Sources

Authoritative references this article was fact-checked against.

TagsWordPressGoogle SheetsWP-CLICustom Post TypeDirectoryACFPHP

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Software Systems Architect · Senior Software Engineer · Engineering Leadership

Software systems architect and senior software engineer with more than two decades designing, building, and running production software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Now a CTO, though what I write here is drawn from the full arc of that work, across architecture, engineering, and operations, not any single job.

Keep reading

Related posts