TechEarl

Using AI to Help Manage WordPress Infrastructure

AI plus SSH plus the standard server toolkit (systemd, nginx, fail2ban, certbot) accelerates infrastructure work without replacing sysadmin judgment. This covers log triage, config audits, deploy verification, and security review, plus what to never delegate.

Ishan Karunaratne · 6 min read

AI plus SSH plus the standard server toolkit (systemd, nginx, fail2ban, certbot, journalctl) accelerates WordPress infrastructure work in the same way it accelerates application development. Log triage that took twenty minutes takes three. Config audits that I used to skip because they were too time-consuming now run weekly. Deploy verification is automated where it used to be manual. The throughput lift is real. So is the list of things I will not delegate. Here is the canonical infrastructure workflow.

The setup: SSH + the agent + your existing tooling

Nothing exotic. The agent (Claude Code, in my case) runs in your local terminal. It can shell out to ssh user@host for any host in your ~/.ssh/config. From there it runs the same commands you would run: journalctl, systemctl, nginx -t, tail, grep, wp-cli, df, htop, etc.

The key constraint: limit what the SSH user can do. The agent inherits the privileges of whatever account it SSHes as. I run an unprivileged user with no sudo for AI-driven operations on production. For tasks that need elevated privileges, I run them manually after the agent surfaces the intent.

bash
# ~/.ssh/config
Host prod-web
    HostName 198.51.100.10
    # an account with read access to logs and configs, no sudo
    # (comment kept on its own line: older OpenSSH releases reject trailing comments)
    User read-only
    IdentityFile ~/.ssh/prod-readonly-ed25519

The agent SSHes as read-only. It can read whatever logs and configs that account has been granted access to; it cannot modify anything.
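One way to sanity-check that constraint before handing the host to an agent is to inspect the output of `id` for the account (for example from `ssh prod-web id`). The sketch below is only an illustration: the helper names `parse_id` and `looks_unprivileged` and the `PRIVILEGED_GROUPS` list are assumptions, and group membership is not the only way an account can get sudo, so a direct entry in sudoers would not be caught here.

```python
import re

# Group names that commonly grant root via sudo. This list is an assumption;
# adjust it to whatever your distribution and sudoers files actually use.
PRIVILEGED_GROUPS = {"root", "sudo", "wheel", "admin"}


def parse_id(id_output: str):
    """Return (uid, set_of_group_names) parsed from one line of `id` output."""
    uid_match = re.search(r"\buid=(\d+)", id_output)
    if uid_match is None:
        raise ValueError("not `id` output: " + id_output.strip())
    groups_match = re.search(r"\bgroups=(\S+)", id_output)
    names = set()
    if groups_match:
        # Each entry looks like 4(adm); keep only the name in parentheses.
        names = set(re.findall(r"\d+\(([^)]+)\)", groups_match.group(1)))
    return int(uid_match.group(1)), names


def looks_unprivileged(id_output: str) -> bool:
    """True when the account is not uid 0 and is in no known privileged group."""
    uid, groups = parse_id(id_output)
    return uid != 0 and not (groups & PRIVILEGED_GROUPS)
```

If `looks_unprivileged` returns False for the account the agent uses, that is a signal to stop and fix the account before running any of the workflows below.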

Workflow 1: structured log triage

The single most-used workflow. Once a week (or after any deploy), I run:

text
SSH to prod-web. Look at:
- the last 1000 lines of /var/log/nginx/access.log
- the last 500 lines of /var/log/nginx/error.log
- the last 500 lines of /var/log/php8.2-fpm.log
- journalctl -u nginx --since "24 hours ago" --no-pager
- journalctl -u php8.2-fpm --since "24 hours ago" --no-pager

Group errors by signature. For each group give me:
- first and last timestamp
- count
- which URL or PHP file is implicated
- a one-sentence hypothesis about cause
- whether this looks worth investigating

Output as a Markdown report I can paste into our ops Slack.

What I get back is the report a junior sysadmin would write after an hour of log reading, delivered in about three minutes. In my experience roughly 70-80% of the hypotheses are right; the wrong ones are usually wrong in instructive ways (for example, the AI did not know about a specific plugin's known issue).

The crucial step: I read the report and follow up on anything that looks suspicious. The AI surfaces; the human decides what to act on.
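To make "group errors by signature" concrete, here is a minimal sketch of the idea applied to nginx error-log lines. It is not what the agent runs internally. It assumes the standard nginx error-log prefix (timestamp, level, pid#tid), and the normalization rules (drop the connection id, drop everything from `, client:` onward, replace digit runs with `N`) are illustrative choices. The names `Group`, `signature`, and `group_errors` are hypothetical.

```python
import re
from dataclasses import dataclass

# Standard nginx error-log prefix: "YYYY/MM/DD HH:MM:SS [level] pid#tid: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] \d+#\d+: (?P<msg>.*)$"
)


@dataclass
class Group:
    signature: str
    count: int
    first: str
    last: str


def signature(msg: str) -> str:
    """Reduce a message to a stable signature by removing per-request detail."""
    msg = re.sub(r"^\*\d+ ", "", msg)        # connection id, e.g. "*1234 "
    msg = re.sub(r", client: .*$", "", msg)  # client, server, request tail
    return re.sub(r"\d+", "N", msg)          # remaining numbers


def group_errors(lines):
    """Group log lines by signature, most frequent first."""
    groups = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # continuation lines or other formats are skipped
        sig = f"[{m['level']}] {signature(m['msg'])}"
        ts = m["ts"]
        g = groups.get(sig)
        if g is None:
            groups[sig] = Group(sig, 1, ts, ts)
        else:
            g.count += 1
            # This timestamp format sorts correctly as plain text.
            g.first = min(g.first, ts)
            g.last = max(g.last, ts)
    return sorted(groups.values(), key=lambda g: -g.count)
```

Each `Group` carries the count and the first and last timestamps the prompt asks for; the hypothesis about cause and the judgment about whether it is worth investigating still come from reading the grouped output.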

Workflow 2: config audits at scale

The "I have not looked at this nginx config in eighteen months and I am sure it has drift" use case.

text
SSH to prod-web. Read /etc/nginx/sites-enabled/*.conf and audit them for:
- Missing security headers (X-Frame-Options, X-Content-Type-Options,
  Referrer-Policy, Strict-Transport-Security, Permissions-Policy).
- HTTP/2 enabled on all SSL listeners.
- Modern TLS only (TLSv1.2 and TLSv1.3, no TLSv1.0/1.1).
- gzip/brotli compression enabled.
- Sensible cache headers on static assets.
- proxy_pass timeouts not set to defaults that mask backend issues.

Output a per-site checklist of what is present and what is missing.
Do not modify anything.

The "do not modify anything" line is non-negotiable. The agent reports; I apply fixes manually after review.

The output is the kind of audit you would pay a consultant to produce. The cost is one prompt; the value is real.
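Two of the checks in that prompt (security headers and legacy TLS) are mechanical enough to sketch in a few lines, which can be handy for spot-checking the agent's report. This is a rough illustration under stated assumptions: it works on the text of a single config file, strips comments naively (a `#` inside a quoted value would be cut off), does not follow `include` directives, and ignores nginx's `add_header` inheritance rules. The function name `audit_nginx_conf` is hypothetical.

```python
import re

# The headers named in the audit prompt above.
REQUIRED_HEADERS = [
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Strict-Transport-Security",
    "Permissions-Policy",
]
LEGACY_PROTOCOLS = {"SSLv3", "TLSv1", "TLSv1.1"}


def strip_comments(conf: str) -> str:
    """Drop everything after '#' on each line (naive, see lead-in)."""
    return "\n".join(line.split("#", 1)[0] for line in conf.splitlines())


def audit_nginx_conf(conf: str) -> dict:
    """Report which required headers are set and which legacy TLS versions are enabled."""
    body = strip_comments(conf)
    found = {
        name.lower()
        for name in re.findall(r"\badd_header\s+([A-Za-z0-9-]+)", body)
    }
    report = {header: header.lower() in found for header in REQUIRED_HEADERS}
    protocols = set()
    for match in re.finditer(r"\bssl_protocols\s+([^;]+);", body):
        protocols.update(match.group(1).split())
    report["legacy_tls"] = sorted(protocols & LEGACY_PROTOCOLS)
    return report
```

A header that only appears in a commented-out line is reported as missing, which is the behavior you want from an audit.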

Workflow 3: deploy verification

After every production deploy, a verification sweep:

text
SSH to prod-web. Verify the deploy that completed at 14:00 UTC:
1. Confirm the WordPress core version matches what we deployed
   (wp core version).
2. Confirm all expected plugins are active and at the expected versions
   (wp plugin list).
3. Confirm the home URL responds with 200 and contains the expected
   <title> tag from our latest deploy.
4. Check the last 100 lines of debug.log for any fatal errors.
5. Check nginx access log for any 5xx responses in the last 15 minutes.

If everything looks good, post the summary to our ops Slack channel.
If anything looks wrong, do NOT take corrective action; surface the
issue and stop.

The agent runs the verification, posts the summary, surfaces issues. I do the corrective work if anything is wrong. The "do not take corrective action" guardrail prevents the agent from making cascading bad decisions in an incident scenario.
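Step 5 of that prompt (5xx responses in the last 15 minutes) can be sketched as a small filter over access-log lines. The example assumes nginx's default `combined` log format; a custom `log_format` would need a different pattern. The function name `recent_5xx` and its parameters are illustrative, not part of any existing tool.

```python
import re
from datetime import datetime, timedelta, timezone

# Combined format: ... [01/May/2024:14:05:00 +0000] "GET / HTTP/1.1" 502 157 ...
ACCESS_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def recent_5xx(lines, now=None, window=timedelta(minutes=15)):
    """Return (timestamp, status, request) for 5xx responses within the window."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - window
    hits = []
    for line in lines:
        m = ACCESS_RE.search(line)
        if m is None or not m["status"].startswith("5"):
            continue
        # %b expects English month abbreviations, as nginx writes them.
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if cutoff <= ts <= now:
            hits.append((ts, m["status"], m["request"]))
    return hits
```

An empty result supports a "deploy looks good" summary; a non-empty one is exactly the kind of finding the prompt tells the agent to surface and stop on.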

Workflow 4: security review on running systems

Periodic security sweeps on production hosts:

text
SSH to prod-web. Run a security review:
1. List all users with shell access (/etc/passwd entries with valid shells).
2. List all sudoers (/etc/sudoers and /etc/sudoers.d/*).
3. List listening ports (ss -tlnp).
4. Check fail2ban status and recent bans (fail2ban-client status).
5. Check ufw or iptables rules.
6. List world-writable files in /var/www and /etc (find with -perm -o+w).
7. Check last login times (last -20).
8. Check certbot certificate expiration dates (certbot certificates).
9. Verify ssh config has PermitRootLogin disabled and PasswordAuthentication
   disabled.

Output a structured report. Flag anything unusual.

This is the kind of review I used to do quarterly and now do monthly because it is fast. It catches things like "we added a contractor's user account for a one-week project six months ago and forgot to remove it."
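Step 1 of that review (users with valid shells) is simple to illustrate. The sketch below parses the text of `/etc/passwd` and returns accounts whose shell is not one of the usual "no login" values. The set `NON_LOGIN_SHELLS` and the function name `shell_users` are assumptions for illustration; system accounts with unusual shells (such as Debian's `sync`) will show up in the result and need a human glance.

```python
# Shells conventionally used to disable interactive login (an assumed list).
NON_LOGIN_SHELLS = {
    "/usr/sbin/nologin",
    "/sbin/nologin",
    "/bin/false",
    "/usr/bin/false",
}


def shell_users(passwd_text: str):
    """Return (name, uid, shell) for every account with a usable login shell."""
    users = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # malformed entry, skip rather than guess
        name, _password, uid, _gid, _gecos, _home, shell = fields
        # Per passwd(5), an empty shell field defaults to /bin/sh.
        shell = shell or "/bin/sh"
        if shell not in NON_LOGIN_SHELLS:
            users.append((name, int(uid), shell))
    return users
```

Comparing this list against the people who should currently have access is how a forgotten contractor account becomes visible.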

Workflow 5: incident triage

When the site is down or degraded, the agent is the first responder for the read-only investigation:

text
The site is showing intermittent 502 errors per our uptime monitor.

1. SSH to prod-web.
2. Check nginx error log for upstream connection errors in the last 30
   minutes.
3. Check php-fpm status and slow log.
4. Check system load (uptime) and memory pressure (free -m).
5. Check disk space (df -h).
6. Check MySQL slow query log for queries in the last 30 minutes.

Propose a hypothesis for what is happening. Do NOT take any corrective
action. Wait for my confirmation before doing anything beyond reading.

In a real incident, the agent surfaces facts. I make the call about what to act on. The agent is fast at gathering evidence; humans are still better at judgment under pressure.
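As one example of the evidence gathering in that prompt, the disk-space check can be reduced to a small parser. This sketch assumes the POSIX output format from `df -P` rather than `df -h`, because fixed columns and plain percentages are easier to parse reliably. The function name `filesystems_over` and the 90% default threshold are illustrative assumptions.

```python
def filesystems_over(df_output: str, threshold: int = 90):
    """Return (mount_point, percent_used) for filesystems at or above threshold.

    Expects the output of `df -P`: a header row, then six columns per line,
    with the fifth column being the capacity as a percentage.
    """
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        # maxsplit=5 keeps a mount point containing spaces in one field
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        percent = int(fields[4].rstrip("%"))
        if percent >= threshold:
            flagged.append((fields[5], percent))
    return flagged
```

A full root or database volume is a common cause of intermittent 502s, so a non-empty result here is a strong candidate for the hypothesis the prompt asks for.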

What to never delegate on infrastructure

A non-exhaustive list:

  • rm -rf of anything, anywhere.
  • DNS record changes without manual verification.
  • Firewall rule changes that could lock you out.
  • SSH config changes on production.
  • Database resets, drops, or imports onto production.
  • Certbot renewals that touch the live SSL chain without staging first.
  • Unattended apt upgrade on production. Read the changes first.
  • systemctl stop on services that are user-facing.
  • git push --force to deployment branches.
  • Anything that requires sudo on production outside of explicitly-scoped maintenance windows.

These rules apply regardless of how good the agent is. The asymmetry is: a bad infrastructure change can take a site down for hours; the time saved by automating the change is measured in minutes. The math does not favor the automation.
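Part of that list can be expressed as a pattern check on any command an agent proposes, as a second line of defense. This is a hypothetical sketch and not a security boundary: pattern matching is easy to evade (for instance `rm --recursive`, or flags placed after the path), so the unprivileged account remains the real control. The rule names, the patterns, and the function `blocked_reason` are all assumptions for illustration.

```python
import re
from typing import Optional

# Each rule pairs a label with a pattern for one item on the list above.
FORBIDDEN = [
    ("recursive rm", re.compile(r"\brm\s+-\w*r", re.IGNORECASE)),
    ("sudo", re.compile(r"\bsudo\b")),
    ("service stop", re.compile(r"\bsystemctl\s+stop\b")),
    ("force push", re.compile(r"\bgit\s+push\b.*(--force|\s-f\b)")),
    (
        "package upgrade",
        re.compile(r"\bapt(-get)?\s+(-\S+\s+)*(upgrade|full-upgrade|dist-upgrade)\b"),
    ),
    ("database reset/drop/import", re.compile(r"\bwp\s+db\s+(reset|drop|import)\b")),
]


def blocked_reason(command: str) -> Optional[str]:
    """Return the label of the first rule the command matches, or None."""
    for label, pattern in FORBIDDEN:
        if pattern.search(command):
            return label
    return None
```

A wrapper could refuse to run any command for which `blocked_reason` returns a label and print the intent for a human to execute manually instead.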

Where this fits in an agency

For solo operators and small agencies running their own infrastructure, this workflow is the difference between "we hire a part-time sysadmin" and "the developer/CTO can handle it." That is a real cost saving and a real cap on agency size.

For agencies on managed hosting (Kinsta, WP Engine, Pressable), the host handles most of the infrastructure work covered above, so the AI-assisted patterns here are less relevant. The trade-off is covered in The Exact Stack I would Use to Run a Small WordPress Agency Today.

For the broader role-by-role view of where AI fits in agency operations, see How Small WordPress Agencies Can Use AI in 2026, by Role. For the WP-CLI side of these patterns (which compose with infrastructure work for full-stack operations), see Using AI with WP-CLI for Faster WordPress Operations.

The pattern is consistent across every infrastructure workflow: the AI accelerates the read-only investigation; the human stays in the loop on the writes. Stick to that discipline and the throughput gain is real and safe.

Tags: WordPress, AI, Infrastructure, Sysadmin, DevOps

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.
