TechEarl

Fix: "Error Establishing a Database Connection" Stuck in Cache

Your database recovered but visitors still see the error page. A cache stored it because the error returned HTTP 200. Make db-error.php return 503 with no-store, and standard caches (WP Super Cache, Varnish, nginx, Cloudflare) stop storing it.

Ishan Karunaratne · 15 min read

If your database recovered but the site still shows "Error establishing a database connection," the error page is stuck in a cache, and the fix is one file: make wp-content/db-error.php return a 503 status with a Cache-Control: no-store header.

Here is what happened. Your database went down for a minute, you fixed it, and the site is back. Except it is not: visitors still see the error even though the database is healthy and you can load the admin fine. Clearing your browser cache does nothing, because the stale page is not in your browser. It is in a cache between you and WordPress, and it got there because the error response came back as an ordinary HTTP 200 OK with no cache-control headers, which is exactly what a cache is built to keep.

First, confirm it is a cached error

Before you change anything, find out whether the database is failing right now or you are being served a stale copy of an old failure. Request the page and read the status line:

bash
curl -sI https://example.com/ | grep -iE '^HTTP|cache'
  • HTTP/1.1 500 or 503 (or HTTP/2 500 or 503) means the database is failing live. This is a real outage, not a cache problem, so start with the guide on how to fix "Error establishing a database connection", which walks through the actual causes (wrong credentials, a stopped database, connection limits, corrupted tables).
  • HTTP/1.1 200 OK (or HTTP/2 200) on a page that still shows the error means the database is fine and a cache is serving you a stored error. That is the case this article fixes.
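
If you would rather script that check than read headers by eye, here is a minimal sketch. It assumes the error page contains core's default wording; a custom db-error.php will say something else, so set ERROR_PATTERN to a phrase from your own page. The helper name classify_response is hypothetical, not part of any tool.

```bash
# classify_response STATUS BODY
# Hypothetical helper: tell a live outage apart from a cached error page.
# ERROR_PATTERN is an assumption; use a phrase that appears on your error page.
ERROR_PATTERN="${ERROR_PATTERN:-Error establishing a database connection}"

classify_response() {
    local status="$1" body="$2"
    case "$status" in
        5??) echo "live-outage"; return ;;   # the origin is failing right now
    esac
    if [ "$status" = "200" ] && printf '%s' "$body" | grep -qiF "$ERROR_PATTERN"; then
        echo "cached-error"                  # 200 plus error text: a cache kept it
    else
        echo "ok"
    fi
}

# Usage against a real site (not run here):
#   status=$(curl -s -o /tmp/page.html -w '%{http_code}' https://example.com/)
#   classify_response "$status" "$(cat /tmp/page.html)"
```

The text match is a heuristic: it only recognizes the phrase you give it, so treat "ok" as "no known error text found", not as proof the page is healthy.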

If it is the cached case, a private window or a browser-cache clear will not help, because the copy lives upstream. Purge in this order and re-test after each step:

  1. Your page-cache plugin (WP Super Cache, W3TC, WP Rocket, LiteSpeed).
  2. The server cache (Varnish, nginx fastcgi_cache).
  3. The CDN (Cloudflare and similar).

The browser is never the culprit here. Once you have cleared the stuck copy, the rest of this article stops it from coming back.

Why does a database error get stuck in your cache?

A cache stores a response when two things are true: the status code is one it considers cacheable (a 200 OK is the obvious one), and nothing in the headers tells it not to. A normal homepage meets both conditions, which is the whole point of a page cache. The trap is that a transient database failure can also come back as a bare 200 OK with no anti-cache headers, and at that point the cache cannot tell the difference between your real homepage and a "we are broken" page. It stores the broken one and serves it to everyone until the entry expires. The fix, later, is to break both conditions at once: an error status the cache will not keep, plus a header that says do not store this.
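
Those two conditions can be written down as a toy decision function. This is a deliberately simplified model, not how any particular cache is implemented: the list of cacheable statuses is an assumption (real caches accept more codes), and would_store is a hypothetical name.

```bash
# would_store STATUS CACHE_CONTROL
# Simplified model of a shared cache deciding whether to keep a response.
would_store() {
    local status="$1" cache_control
    cache_control=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')

    # Condition 1: the status must be one the cache considers cacheable.
    # (Assumed short list for illustration.)
    case "$status" in
        200|301|302) ;;
        *) echo "skip"; return ;;
    esac

    # Condition 2: nothing in the headers forbids storing it.
    case "$cache_control" in
        *no-store*|*no-cache*|*private*) echo "skip" ;;
        *) echo "store" ;;
    esac
}

would_store 200 ""                      # bare 200 error page: store
would_store 503 "no-store, no-cache"    # error status plus no-store: skip
```

A bare 200 passes both checks, which is the whole problem. A 503 with no-store fails both, so it stays out even if one of the two checks has been overridden in your configuration.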

Modern WordPress core actually gets this right on its own. When the database is unreachable and you have no custom error page, core renders its built-in "Database Error" screen with a 500 status and aggressive no-cache headers:

[Screenshot: curl showing WordPress core returns HTTP 500 with no-cache headers for the database error, while a custom db-error.php returns HTTP 200.]

That 500 plus Cache-Control: no-store is deliberate. The WordPress reference for dead_db() spells it out: the DB error "sets the HTTP status header to 500 to try to prevent search engines from caching the message. Custom DB messages should do the same."

And there is the trap. The moment you drop a custom wp-content/db-error.php into place (the nice on-brand "we will be right back" page that every tutorial tells you to add), you take over the whole response. Core loads your file and gets out of the way:

php
// wp-includes/class-wpdb.php — what core does when the DB is unreachable
if ( file_exists( WP_CONTENT_DIR . '/db-error.php' ) ) {
    require_once WP_CONTENT_DIR . '/db-error.php';
    die();
}

It does not set a status code for you. PHP's default is 200 OK. So a custom error page that does not call http_response_code() ships a friendly-looking maintenance screen with a 200 on it, which is precisely the thing a cache loves to keep. That is the second line in the screenshot above, and it is the root cause of almost every "the database is fine but the error won't go away" report. The database engine underneath does not matter here: MySQL or MariaDB, the response and the failure mode are identical.
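
A quick way to audit an existing drop-in is to look for a status call in the file. The sketch below is a text heuristic under stated assumptions: it only recognizes http_response_code() and a raw header('HTTP/...') line, it cannot tell a commented-out call from a live one, and dropin_sets_status is a hypothetical helper name.

```bash
# dropin_sets_status FILE
# Heuristic: does this db-error.php appear to set an HTTP status itself?
dropin_sets_status() {
    local file="$1"
    [ -f "$file" ] || { echo "no-dropin"; return; }   # core handles the error itself
    if grep -Eq 'http_response_code[[:space:]]*\(|header[[:space:]]*\([[:space:]]*.HTTP/' "$file"; then
        echo "sets-status"
    else
        echo "defaults-to-200"                        # PHP will send 200 OK
    fi
}

# Usage from the WordPress root (not run here):
#   dropin_sets_status wp-content/db-error.php
```

A "sets-status" result still deserves a look at which status it sets; the check only tells you the file is not silently relying on PHP's default.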

Reproducing it, so you actually believe it

It is worth seeing this happen rather than taking my word for it. I put a stock WordPress behind Varnish (any edge or full-page cache that stores plain 200 HTML behaves the same way), installed the friendly db-error.php above, then stopped the database and made one request:

[Screenshot: Varnish returns X-Cache MISS for the database error while the database is down, then X-Cache HIT serving the same cached error after the database is back up.]
Caption: X-Cache HIT after recovery: the cached 200 error outlives the outage and is served while the database is healthy.

The first request while the database is down is a MISS: the cache fetches the 200 error from the origin and stores it. I bring the database back. The next request is a HIT, and Varnish cheerfully serves the cached "we will be right back" page even though the origin is now perfectly healthy. The site is "down" with a working database, and nothing on the origin will fix it until that cache entry expires or you purge it.
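
To watch the same MISS-then-HIT sequence on your own stack, it helps to boil each response down to its status and cache verdict. A minimal sketch, assuming your cache adds an X-Cache header the way my test setup does (Varnish does not send one unless your VCL sets it); summarize_headers is a hypothetical helper.

```bash
# summarize_headers
# Reads raw response headers on stdin and prints "<status> <x-cache>".
# A dash means the value was not present.
summarize_headers() {
    awk '
        BEGIN { status = "-"; xcache = "-" }
        { sub(/\r$/, "") }                         # strip the CR from CRLF endings
        /^HTTP\// { status = $2 }                  # last status line wins
        tolower($1) == "x-cache:" { xcache = toupper($2) }
        END { print status, xcache }
    '
}

# Usage (not run here): once while the database is stopped, once after recovery.
#   curl -sI https://example.com/ | summarize_headers
```

On a poisoned setup you would expect the first call to report a 200 with MISS and the second a 200 with HIT, even though the database is healthy by then.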

One important clarification, because people reflexively blame whichever caching plugin they happen to run: this is not really a plugin bug, and it is not specific to any one of them. A page-cache plugin (WP Super Cache, W3 Total Cache, WP Rocket, LiteSpeed Cache) will not cache a connect-time database failure, because the failure calls dead_db() before WordPress fires init, so the plugin never arms its caching for that request. Those plugins also tend to protect you during an outage by serving their existing cached copy of a page, which needs no database at all. The pages that get poisoned are the ones whose cache was cold at the moment the database blipped, and the layer that stores the bad copy is almost always an HTTP-level cache that has no idea what WordPress is doing internally: Varnish, an nginx fastcgi_cache, or a CDN with full-page caching turned on. They see a cacheable 200 with no anti-cache headers, and they keep it.

The fix: make db-error.php return 503

Break both of the conditions that let a cache keep the page: give the response an error status the cache will not store, and add a no-store header that says so explicitly. A 503 Service Unavailable is the right status for a transient outage, and a Retry-After header tells crawlers and uptime monitors it is temporary, not a real removal. Replace your wp-content/db-error.php with this:

php
<?php
// wp-content/db-error.php
// Served automatically by WordPress when the database is unreachable.
// The status and the no-store header together tell every cache layer not to keep this.
http_response_code( 503 );
header( 'Retry-After: 30' );
header( 'Cache-Control: no-store, no-cache, must-revalidate, max-age=0' );
?><!doctype html>
<html lang="en">
<head><meta charset="utf-8"><title>We will be right back</title></head>
<body style="font-family:sans-serif;text-align:center;padding-top:15%">
  <h1>We will be right back</h1>
  <p>The site is briefly unavailable. Please try again in a moment.</p>
</body>
</html>

Now the same outage produces a 503 with no-store that no sane cache will keep:

[Screenshot: With db-error.php returning 503 and no-store, Varnish returns X-Cache MISS on every request during the outage and never caches the error.]
Caption: Every request during the outage is a MISS. The 503 plus no-store header is never cached, so recovery is instant.

Every request during the outage is a MISS. Varnish will not keep this response on two counts: a 503 is not a cacheable status, and the Cache-Control: no-store header rules it out a second time. The moment the database recovers, the next request is the real page, with no stale error lingering. WordPress core uses 500 with the same no-cache headers. That belt-and-suspenders pairing, an error status plus a no-store header, is what reliably keeps caches out, whether a given layer keys on the status code, on the headers, or on both. I reach for 503 plus Retry-After because it is the most honest description of a temporary outage.

In short: a db-error.php that returns 503 with Cache-Control: no-store stops every HTTP-level cache from storing the WordPress database error.
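
To confirm the drop-in really sends all three signals, you can check the response headers while the database is stopped, ideally on a staging copy. This is a sketch under that assumption; check_error_headers is a hypothetical helper that reads curl -sI output and only inspects the status line, Cache-Control, and Retry-After.

```bash
# check_error_headers
# Reads raw response headers on stdin; prints "ok" or "fail:" plus what is missing.
check_error_headers() {
    local headers status problems=""
    headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')

    status=$(printf '%s\n' "$headers" | awk '/^http\// { s = $2 } END { print s }')
    [ "$status" = "503" ] || problems="$problems status=${status:-none}"

    printf '%s\n' "$headers" | grep -q '^cache-control:.*no-store' \
        || problems="$problems missing-no-store"
    printf '%s\n' "$headers" | grep -q '^retry-after:' \
        || problems="$problems missing-retry-after"

    if [ -z "$problems" ]; then echo "ok"; else echo "fail:$problems"; fi
}

# Usage while the database is down (not run here):
#   curl -sI https://staging.example.com/ | check_error_headers
```

The staging hostname is a placeholder. Run the same check through each cache layer you operate, since a layer that rewrites headers can undo what the origin sent.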

There is an SEO reason to get the status right, not just an availability one. A cached 200 error page can be crawled and indexed as your homepage, or flagged as a soft 404, which quietly tanks the URL. A 503 with Retry-After tells search engines the outage is temporary and to come back, so the ranking survives the blip.

If you do not need a branded page at all, the zero-config option is to delete wp-content/db-error.php entirely. With no drop-in, modern WordPress core handles the outage itself with a 500 and no-store headers, which is already cache-safe. The custom file is only worth keeping if you want an on-brand message, and then it is your job to set the status.

Then purge the copy you already poisoned. The fix stops new bad entries, but the one that is currently stuck will sit there serving 200s until you clear it. So the fix is two steps:

  1. Replace wp-content/db-error.php with the version above (503 + no-store).
  2. Purge the cache entry that is already stuck, or the old 200 keeps serving until it expires.

Flush your page-cache plugin, run the CDN's purge, or delete the on-disk cache directory. Run the delete from your WordPress root so a mis-paste cannot match the wrong path:

bash
# WP Super Cache / on-disk page cache — run from your WordPress root
cd /var/www/html && rm -rf wp-content/cache/supercache/*

# Cloudflare — purge just the affected URL (least privilege beats purge_everything)
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CF_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"files":["https://example.com/"]}'

Prevent it at every cache layer

The 503 drop-in is the single fix that protects every layer at once, because all of them honor either the error status or the no-store header (and it sends both). But it helps to understand how each one behaves, especially if you run a custom configuration that overrides the defaults.

Page-cache plugins (WP Super Cache, W3 Total Cache, WP Rocket, LiteSpeed Cache). All of these cache successful page responses, and the mechanism is identical across them: none will keep a 503, and as covered above none even reach their caching stage on a connect-time failure. The 503 drop-in covers all four at once. The per-plugin part is mostly about where the stuck copy lives and how to purge it:

  • WP Super Cache: nothing to configure for this. Confirm you are not shipping a custom db-error.php that returns 200, then flush a stuck copy with Settings > WP Super Cache > Delete Cache (the files live under wp-content/cache/supercache/).
  • W3 Total Cache: check Page Cache for any option that caches non-200 responses and leave it off, then Performance > Purge All Caches. In disk-enhanced mode the static copies sit under wp-content/cache/page_enhanced/, which is where a poisoned page hides.
  • WP Rocket: Rocket caches pages for logged-out visitors through its own page cache, which (like the others) does not run on a connect-time failure. Clear it from the admin bar or Settings > WP Rocket > Clear Cache.
  • LiteSpeed Cache: the cache lives at the LiteSpeed or OpenLiteSpeed server level, so purge through the plugin (Toolbox > Purge All) rather than deleting files. The drop-in's 503 keeps LSCache out the same way.

The one cross-plugin setting to check is any "cache 404s and other non-200 responses" option; leave it off for error statuses.

nginx fastcgi_cache / proxy_cache. This is a classic source of the bug, because a lot of tutorials configure caching to store everything. Cache only successful and redirect responses, and give errors a zero or tiny TTL:

nginx
# Cache 200/301/302 for a while; do not hold onto errors.
fastcgi_cache_valid 200 301 302 10m;
fastcgi_cache_valid 500 502 503 504 0;

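If you use proxy_cache in front of an HTTP backend (Apache, or another nginx that talks to PHP-FPM), the equivalent proxy_cache_valid rules apply. Check fastcgi_cache_use_stale too: if it lists http_500 or http_503, nginx answers a failing backend with whatever expired entry it still holds for that URL. If that entry is a poisoned error page, it keeps being served past its TTL, which is the same stuck-error symptom by another route. Keep error codes out of that directive unless you are certain an error page can never enter the cache.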
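
To find those use_stale entries in a configuration spread across many include files, you can filter the full config dump. A minimal sketch: list_use_stale_5xx is a hypothetical helper, and it only matches directives written on a single line.

```bash
# list_use_stale_5xx
# Reads nginx config text on stdin and prints, with line numbers, every
# fastcgi_cache_use_stale / proxy_cache_use_stale line that lists a 5xx code.
list_use_stale_5xx() {
    grep -nE '^[[:space:]]*(fastcgi|proxy)_cache_use_stale[[:space:]].*http_(500|502|503|504)' || true
}

# Usage (not run here): nginx -T prints the whole effective configuration.
#   nginx -T 2>/dev/null | list_use_stale_5xx
```

No output means no single-line directive lists a 5xx code. The line numbers refer to the dump, not to the individual files, so search the reported text to locate the source.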

Varnish. The built-in VCL already declines to cache responses that carry Cache-Control: no-store, no-cache, or private, and it does not treat a 503 as cacheable in the first place, so a stock Varnish is safe on both counts. The way people break this is by writing a custom vcl_backend_response that force-sets a TTL on everything, which overrides that status default. The no-store header from the drop-in still protects you there, but make the intent explicit and gate on the status too:

vcl
sub vcl_backend_response {
    if ( beresp.status >= 500 ) {
        set beresp.uncacheable = true;
        return (deliver);
    }
    set beresp.ttl = 1h;
}

Cloudflare and other CDNs. Out of the box Cloudflare does not cache HTML at all, so most sites are never exposed here. You get bitten only when you have turned on a "Cache Everything" page rule or Automatic Platform Optimization (APO), which cache HTML at the edge. Even then, Cloudflare respects origin Cache-Control and does not cache 5xx responses by default, so the 503 drop-in keeps you safe. Two settings undo that safety, though: an "Edge Cache TTL" or status-code Cache Rule that assigns a TTL to error responses, and "Origin Cache Control" turned off, which makes Cloudflare ignore your no-store. If you run either, make sure error statuses stay out of the cached set. Worth knowing: Cloudflare's "Always Online" is the opposite kind of feature, and a useful one. It serves the last known-good copy when your origin is unreachable, which is the right way to ride out an outage, not the cause of this problem.

A dead object cache is a different failure (Redis, Memcached)

Everything above is about the database connection failing at boot. That failure trips dead_db() before WordPress finishes loading, which is exactly why a page-cache plugin like WP Super Cache does not store it. A persistent object cache fails differently, and the distinction matters. If you run Redis or Memcached through a wp-content/object-cache.php drop-in and that backend goes away, the drop-in usually throws later in the request, after init, and WordPress ends the request with wp_die(). By that point your page-cache plugin is fully armed, so unlike the database case, a page cache can store the result.

The symptom is identical (a stuck error page), but the trigger is the object cache, not the database, and the message is usually "Error establishing a Redis connection." The fix follows the same principle: make the failure return a non-200 status with no-store so nothing keeps it. Most Redis drop-ins can be configured to fail with a 500 instead of degrading into a cacheable-looking page; if yours cannot, fronting it with the same status-aware cache rules above is your backstop.

Do not just hide the outage

The 503 page stops a transient failure from getting immortalized, which is the user-facing emergency. But a database that drops mid-traffic is telling you something, and the drop-in does nothing about the cause. Once the bleeding has stopped, go find out why the connection died:

  • max_connections. If you are hitting the ceiling under load, new connections are refused and WordPress falls straight into the error path. Compare your real concurrency against the limit.
  • The OOM killer. A MySQL or MariaDB process that got killed for memory will show up in dmesg -T | grep -i oom. That is a sizing or a runaway-query problem, not a caching one.
  • Deploys that restart the database. If your deployment restarts MySQL, the restart window is exactly when an uncached request can capture the error. The 503 page makes that window harmless instead of sticky.
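
For the max_connections check, the comparison that matters is the peak the server has seen against the configured limit. A minimal sketch: conn_usage_pct is a hypothetical helper, and the commented mysql calls assume a client that can log in without prompting (for example through ~/.my.cnf).

```bash
# conn_usage_pct PEAK LIMIT
# Prints the peak connection count as an integer percentage of the limit.
conn_usage_pct() {
    local peak="$1" limit="$2"
    [ "$limit" -gt 0 ] 2>/dev/null || { echo "invalid limit" >&2; return 1; }
    echo $(( peak * 100 / limit ))
}

# Usage against a live server (not run here):
#   limit=$(mysql -N -B -e "SELECT @@max_connections")
#   peak=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'" | awk '{print $2}')
#   echo "peak used $(conn_usage_pct "$peak" "$limit")% of max_connections"

conn_usage_pct 140 151   # prints 92
```

A peak at or near 100% since the last restart is a strong hint that refused connections, not a crashed server, put visitors on the error path.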

If your real symptom is the opposite, a site that shows "briefly unavailable for scheduled maintenance" and never comes back, that is a different drop-in (the .maintenance file) with its own fix, which I covered in the WordPress maintenance-mode stuck page.



