AI plus SSH plus the standard server toolkit (systemd, nginx, fail2ban, certbot, journalctl) accelerates WordPress infrastructure work in the same way it accelerates application development. Log triage that took twenty minutes takes three. Config audits that I used to skip because they were too time-consuming now run weekly. Deploy verification is automated where it used to be manual. The throughput lift is real. So is the list of things I will not delegate. Here is the canonical infrastructure workflow.
Jump to:
- The setup: SSH + the agent + your existing tooling
- Workflow 1: structured log triage
- Workflow 2: config audits at scale
- Workflow 3: deploy verification
- Workflow 4: security review on running systems
- Workflow 5: incident triage
- What to never delegate on infrastructure
- Where this fits in an agency
The setup: SSH + the agent + your existing tooling
Nothing exotic. The agent (Claude Code, in my case) runs in your local terminal. It can shell out to ssh user@host for any host in your ~/.ssh/config. From there it runs the same commands you would run: journalctl, systemctl, nginx -t, tail, grep, wp-cli, df, htop, etc.
The key constraint: limit what the SSH user can do. The agent inherits the privileges of whatever account it SSHes as. I run an unprivileged user with no sudo for AI-driven operations on production. For tasks that need elevated privileges, I run them manually after the agent surfaces the intent.
# ~/.ssh/config
# "read-only" is an account with read access to logs and configs, no sudo
Host prod-web
    HostName 198.51.100.10
    User read-only
    IdentityFile ~/.ssh/prod-readonly-ed25519
The agent SSHes as read-only. It can read the logs and configs that account has been granted; without sudo, it cannot change system configuration or restart services.
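Before trusting the account, it is worth confirming that it really has no obvious path to root. The following is a minimal sketch, not a complete privilege audit: `has_privileged_group` is a hypothetical helper name, and the list of groups it treats as privileged (sudo, wheel, root, admin) is an assumption that varies by distribution. Sudoers rules can also grant rights by user name, which this check does not see.

```shell
# Hypothetical helper: exits 0 if the space-separated group list in $1
# contains a group that commonly grants root access on Linux hosts.
# The group names below are assumptions; adjust them for your distribution.
has_privileged_group() (
  for g in $1; do
    case "$g" in
      sudo|wheel|root|admin) exit 0 ;;
    esac
  done
  exit 1
)

# Example use (runs `id -nG` on the remote host as the SSH user):
#   if has_privileged_group "$(ssh prod-web id -nG)"; then
#     echo "WARNING: the agent account is in a privileged group" >&2
#   fi
```

Running this once after creating the account, and again whenever group membership changes, is a cheap way to catch a "temporary" sudo grant that never got removed.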
Workflow 1: structured log triage
The single most-used workflow. Once a week (or after any deploy), I run:
SSH to prod-web. Look at:
- the last 1000 lines of /var/log/nginx/access.log
- the last 500 lines of /var/log/nginx/error.log
- the last 500 lines of /var/log/php8.2-fpm.log
- journalctl -u nginx --since "24 hours ago" --no-pager
- journalctl -u php8.2-fpm --since "24 hours ago" --no-pager
Group errors by signature. For each group give me:
- first and last timestamp
- count
- which URL or PHP file is implicated
- a one-sentence hypothesis about cause
- whether this looks worth investigating
Output as a Markdown report I can paste into our ops Slack.
What I get back, in three minutes, is the report a junior sysadmin would write after an hour of log reading. The hypotheses are 70-80% right; the wrong ones are usually wrong in instructive ways (for example, the AI did not know about a specific plugin's known issue).
The crucial step: I read the report and follow up on anything that looks suspicious. The AI surfaces; the human decides what to act on.
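To make "group errors by signature" concrete, here is a rough sketch of the same idea as a plain shell filter, useful for spot-checking the agent's counts. It assumes nginx's default error log layout (timestamp, level, `pid#tid:`, optional `*connection-id`, message, then `, client: ...`); `group_by_signature` is a hypothetical helper name, and the normalization rules are a simplification rather than what the agent does internally.

```shell
# Hypothetical helper: collapse nginx error-log lines read from stdin into
# rough signatures and count them, most frequent first.
# Assumed line layout (nginx default):
#   YYYY/MM/DD HH:MM:SS [level] pid#tid: *cid message, client: ..., server: ...
group_by_signature() {
  sed -E \
    -e 's#^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9:]{8} ##' \
    -e 's/[0-9]+#[0-9]+: (\*[0-9]+ )?//' \
    -e 's/, client: .*$//' \
    -e 's/[0-9]+/N/g' |
    sort | uniq -c | sort -rn
}

# Example use:
#   ssh prod-web tail -n 500 /var/log/nginx/error.log | group_by_signature
```

The filter drops the timestamp, process IDs, and everything from the client field onward, then replaces remaining numbers with `N` so that lines differing only in ports or error codes fall into the same group.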
Workflow 2: config audits at scale
The "I have not looked at this nginx config in eighteen months and I am sure it has drift" use case.
SSH to prod-web. Read /etc/nginx/sites-enabled/*.conf and audit them for:
- Missing security headers (X-Frame-Options, X-Content-Type-Options,
Referrer-Policy, Strict-Transport-Security, Permissions-Policy).
- HTTP/2 enabled on all SSL listeners.
- Modern TLS only (TLSv1.2 and TLSv1.3, no TLSv1.0/1.1).
- gzip/brotli compression enabled.
- Sensible cache headers on static assets.
- proxy_pass timeouts not set to defaults that mask backend issues.
Output a per-site checklist of what is present and what is missing.
Do not modify anything.
The "do not modify anything" line is non-negotiable. The agent reports; I apply fixes manually after review.
The output is the kind of audit you would pay a consultant to produce. The cost is one prompt; the value is real.
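The security-header item from the audit prompt can also be cross-checked by hand. The sketch below is an assumption-laden simplification: `missing_security_headers` is a hypothetical helper, it only looks for `add_header` lines in the single file it is given, and it does not follow `include` directives or account for nginx's rule that `add_header` directives are inherited from the enclosing level only when the current level defines none.

```shell
# Hypothetical helper: print each expected security header that has no
# add_header directive in the given nginx config file ($1).
# Limitation: checks one file only; included files and inherited
# add_header directives are not considered.
missing_security_headers() (
  conf="$1"
  for h in X-Frame-Options X-Content-Type-Options Referrer-Policy \
           Strict-Transport-Security Permissions-Policy; do
    grep -Eiq "^[[:space:]]*add_header[[:space:]]+\"?$h\"?[[:space:]]" "$conf" ||
      printf '%s\n' "$h"
  done
)

# Example use, after copying a site config locally:
#   scp prod-web:/etc/nginx/sites-enabled/example.conf /tmp/example.conf
#   missing_security_headers /tmp/example.conf
```

An empty result means every header in the list appears in at least one `add_header` line in that file; it does not prove the header is sent on every response.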
Workflow 3: deploy verification
After every production deploy, a verification sweep:
SSH to prod-web. Verify the deploy that completed at 14:00 UTC:
1. Confirm the WordPress core version matches what we deployed
(wp core version).
2. Confirm all expected plugins are active and at the expected versions
(wp plugin list).
3. Confirm the home URL responds with 200 and contains the expected
<title> tag from our latest deploy.
4. Check the last 100 lines of debug.log for any fatal errors.
5. Check nginx access log for any 5xx responses in the last 15 minutes.
If everything looks good, post the summary to our ops Slack channel.
If anything looks wrong, do NOT take corrective action; surface the
issue and stop.
The agent runs the verification, posts the summary, and surfaces issues. I do the corrective work if anything is wrong. The "do not take corrective action" guardrail prevents the agent from making cascading bad decisions in an incident scenario.
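Step 5 of the verification prompt (5xx responses in the access log) reduces to a one-line filter. This sketch assumes nginx's default `combined` log format, where the status code is the ninth whitespace-separated field; `count_5xx` is a hypothetical helper name. It counts over whatever lines are piped in and does no time filtering of its own.

```shell
# Hypothetical helper: read access-log lines on stdin and print a count per
# 5xx status code, one "status count" pair per line, sorted by status.
# Assumes the default "combined" format, where field 9 is the status code.
count_5xx() {
  awk '$9 ~ /^5[0-9][0-9]$/ { c[$9]++ } END { for (s in c) print s, c[s] }' | sort
}

# Example use (last 1000 requests rather than a strict 15-minute window):
#   ssh prod-web tail -n 1000 /var/log/nginx/access.log | count_5xx
```

No output means no 5xx responses in the sampled lines. A custom `log_format` would move the status field, so the field number is the first thing to check if the counts look wrong.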
Workflow 4: security review on running systems
Periodic security sweeps on production hosts:
SSH to prod-web. Run a security review:
1. List all users with shell access (/etc/passwd entries with valid shells).
2. List all sudoers (/etc/sudoers and /etc/sudoers.d/*).
3. List listening ports (ss -tlnp).
4. Check fail2ban status and recent bans (fail2ban-client status).
5. Check ufw or iptables rules.
6. List world-writable files in /var/www and /etc (find with -perm -o+w).
7. Check last login times (last -20).
8. Check certbot certificate expiration dates (certbot certificates).
9. Verify ssh config has PermitRootLogin disabled and PasswordAuthentication
disabled.
Output a structured report. Flag anything unusual.
This is the kind of review I used to do quarterly and now do monthly because it is fast. It catches things like "we added a contractor's user account for a one-week project six months ago and forgot to remove it."
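Step 1 of the review (users with shell access) is the check most likely to catch the forgotten contractor account, and it needs no elevated privileges because `/etc/passwd` is world-readable. A minimal sketch, assuming that "valid shell" means anything other than a path ending in `nologin` or `false`; `shell_users` is a hypothetical helper name, and the rule will also list service accounts with unusual shells such as `/bin/sync`.

```shell
# Hypothetical helper: print "user:shell" for every passwd entry whose login
# shell is set and does not end in /nologin or /false.
# Reads the files given as arguments, or stdin if none are given.
shell_users() {
  awk -F: '$7 != "" && $7 !~ /\/(nologin|false)$/ { print $1 ":" $7 }' "$@"
}

# Example use:
#   ssh prod-web cat /etc/passwd | shell_users
```

Comparing this output against a short list of accounts you expect to exist turns "flag anything unusual" into a concrete diff.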
Workflow 5: incident triage
When the site is down or degraded, the agent is the first responder for the read-only investigation:
The site is showing intermittent 502 errors per our uptime monitor.
1. SSH to prod-web.
2. Check nginx error log for upstream connection errors in the last 30
minutes.
3. Check php-fpm status and slow log.
4. Check system load (uptime) and memory pressure (free -m).
5. Check disk space (df -h).
6. Check MySQL slow query log for queries in the last 30 minutes.
Propose a hypothesis for what is happening. Do NOT take any corrective
action. Wait for my confirmation before doing anything beyond reading.
In a real incident, the agent surfaces facts. I make the call about what to act on. The agent is fast at gathering evidence; humans are still better at judgment under pressure.
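The disk-space step is one of the few triage checks with a clear threshold, so it can be reduced to a filter over `df` output. This is a sketch under stated assumptions: `disks_over` is a hypothetical helper, the default threshold of 90% is arbitrary, and it expects the POSIX output format of `df -P` (capacity in field 5, mount point in field 6), so mount points containing spaces are not handled.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount point
# and usage of every filesystem at or above the threshold in $1 (default 90).
disks_over() {
  awk -v t="${1:-90}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= t + 0) print $6, use "%"
    }'
}

# Example use:
#   ssh prod-web df -P | disks_over 90
```

A full disk is a common cause of errors that look like application failures, so an empty result here rules out one hypothesis early.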
What to never delegate on infrastructure
A non-exhaustive list:
- rm -rf of anything, anywhere.
- DNS record changes without manual verification.
- Firewall rule changes that could lock you out.
- SSH config changes on production.
- Database resets, drops, or imports onto production.
- Certbot renewals that touch the live SSL chain without staging first.
- apt upgrade unattended on production. Read the changes first.
- systemctl stop on services that are user-facing.
- git push --force to deployment branches.
- Anything that requires sudo on production outside of explicitly scoped maintenance windows.
These rules apply regardless of how good the agent is. The asymmetry is: a bad infrastructure change can take a site down for hours; the time saved by automating the change is measured in minutes. The math does not favor the automation.
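Some of the command-shaped items on the list can be screened mechanically before anything reaches a production shell. The following is a deliberately coarse sketch: `is_never_delegate` is a hypothetical helper, the patterns cover only a handful of the items above, and a substring denylist is easy to evade (for example, by reordering flags). It is a tripwire on top of the unprivileged account, not a substitute for it.

```shell
# Hypothetical helper: return 0 if the command string in $1 matches one of a
# few "never delegate" patterns, 1 otherwise. Coarse substring matching only.
is_never_delegate() {
  case " $1 " in
    *" rm -rf "*|*" rm -fr "*) return 0 ;;
    *" sudo "*) return 0 ;;
    *" systemctl stop "*) return 0 ;;
    *" git push --force"*|*" git push -f "*) return 0 ;;
    *" apt upgrade"*|*" apt-get upgrade"*) return 0 ;;
  esac
  return 1
}

# Example use in a wrapper that reviews a proposed command before running it:
#   if is_never_delegate "$proposed"; then
#     echo "Refusing: run this manually after review" >&2
#   fi
```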
Where this fits in an agency
For solo operators and small agencies running their own infrastructure, this workflow is the difference between "we hire a part-time sysadmin" and "the developer/CTO can handle it." That is a real cost saving and a real cap on agency size.
For agencies on managed hosting (Kinsta, WP Engine, Pressable), the host handles most of the infrastructure work covered above; the AI-assisted patterns here are less relevant. The trade-off is in The Exact Stack I would Use to Run a Small WordPress Agency Today.
For the broader role-by-role view of where AI fits in agency operations, see How Small WordPress Agencies Can Use AI in 2026, by Role. For the WP-CLI side of these patterns (which compose with infrastructure work for full-stack operations), see Using AI with WP-CLI for Faster WordPress Operations.
The pattern is consistent across every infrastructure workflow: the AI accelerates the read-only investigation; the human stays in the loop on the writes. Stick to that discipline and the throughput gain is real and safe.
Sources
Authoritative references this article was fact-checked against.
- Nginx documentation (Nginx): nginx.org
- Certbot (Electronic Frontier Foundation): certbot.eff.org
- Fail2ban (official wiki): fail2ban.org





