The sysadmin-role playbook for AI on WordPress infrastructure is the quietest of the agency-role playbooks. It does not dazzle in demos the way Figma-to-component does. What it does is take the most time-consuming weekly chores (log triage, config audits, backup verification, security sweeps) and compress them from hours to minutes. That compression is what lets a small agency operate without a full-time sysadmin. Here is the playbook in detail.
Jump to:
- The constraint: strict read-only for production AI access
- Workflow 1: weekly log triage
- Workflow 2: post-deploy verification
- Workflow 3: monthly config audits
- Workflow 4: backup verification
- Workflow 5: security sweeps
- Workflow 6: fail2ban rule generation
- Workflow 7: incident triage as a first responder
- What I will not delegate
The constraint: strict read-only for production AI access
The defining constraint for sysadmin AI work is that the agent should have read-only access to production. Not "mostly read-only" or "sudo with confirmation." Literally a Linux user account on production with read permissions on logs, configs, and process state, and zero write capability.
This is the single most important configuration decision. With it, the AI can investigate anything, produce reports, propose changes, and surface incidents, without any risk of breaking production. Without it, every AI invocation has tail-risk exposure that is not worth the convenience.
# On the production server
useradd -m -s /bin/bash -G adm,systemd-journal claude-reader
# Grant read access to nginx and php-fpm logs (already in adm group)
# No sudo, no docker group, no anything else
The agent SSHes as claude-reader. Any change to production goes through a human-driven path (CI deploy, manual ssh with the operator's account, etc.).
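One way to sanity-check the account after creating it is to confirm it holds no privileged groups. This is a minimal sketch, not part of the original setup: the `forbidden_groups` helper and its list of groups (sudo, wheel, docker, root, lxd) are assumptions, so adjust the list to your distribution.

```shell
# Print any privileged groups found in a space-separated group list.
# The set of "forbidden" groups below is an assumption, not a standard.
forbidden_groups() {
  local groups="$1" g
  for g in $groups; do
    case "$g" in
      sudo|wheel|docker|root|lxd) echo "$g" ;;
    esac
  done
}

# Usage on the server (id -nG prints the account's group names):
#   forbidden_groups "$(id -nG claude-reader)"
```

Empty output means the account holds none of the listed groups. Running `sudo -l -U claude-reader` as root is a reasonable second check that no sudoers rule applies to it.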
Workflow 1: weekly log triage
The most-used sysadmin workflow. Covered in Using AI to Help Manage WordPress Infrastructure.
The pattern: every Monday morning (or after any significant deploy), the agent pulls and analyzes the last week of nginx, php-fpm, and systemd journal logs, groups errors by signature, and produces a triage report.
Without AI, this is a 90-120 minute weekly chore that gets skipped half the time. With AI, it is a 5-minute prompt and a 3-minute review of the output. The "always do this weekly" discipline becomes possible.
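The "groups errors by signature" step is roughly what the pipeline below does by hand. It is a sketch under assumptions: the `group_by_signature` name is hypothetical, the timestamp pattern assumes the default nginx error-log format, and the normalization rules (IPs become `<ip>`, digit runs become `<n>`) are one reasonable choice rather than what the agent necessarily does.

```shell
# Collapse similar error lines into one signature and count them.
# Reads log lines on stdin; prints "count signature", most frequent first.
group_by_signature() {
  sed -E \
    -e 's/^[0-9]{4}\/[0-9]{2}\/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} //' \
    -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/<ip>/g' \
    -e 's/[0-9]+/<n>/g' \
    | sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption):
#   group_by_signature < /var/log/nginx/error.log | head -n 20
```

Stripping the timestamp and replacing request-specific numbers is what lets two occurrences of the same failure land in the same bucket.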
Workflow 2: post-deploy verification
After every production deploy:
SSH to prod-web. Verify the deploy that just completed:
1. wp core version on every site in the multisite, confirm matches expected.
2. wp plugin list on every site, confirm no plugins regressed in version.
3. wp option get siteurl on every site, confirm correct URLs.
4. curl -I the home URL of each site, confirm 200 status.
5. Check the last 100 lines of every site's debug.log for fatal errors
since the deploy timestamp.
Output a checklist. Mark any failures clearly. Post the summary to
ops Slack. Do not take any corrective action.
The agent runs the full sweep across all sites; you read the result; you fix anything that needs fixing yourself.
For multisite installations with 50+ sites, this verification is the kind of thing that did not happen reliably before because nobody had the time. With AI it happens after every deploy.
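Step 4 of the prompt above (the HTTP status check) could look roughly like this when scripted. It is a sketch only: `status_line` and `verify_sites` are hypothetical names, the 10-second timeout is an assumption, and it presumes the read-only account can run WP-CLI against the install.

```shell
# Format one checklist line for a site, given its URL and HTTP status code.
status_line() {
  local url="$1" code="$2"
  if [ "$code" = "200" ]; then
    printf 'OK   %s %s\n' "$code" "$url"
  else
    printf 'FAIL %s %s\n' "$code" "$url"
  fi
}

# Walk every site in the multisite and print one checklist line each.
# Read-only: a site listing plus one HEAD request per site, nothing else.
verify_sites() {
  local url code
  wp site list --field=url | while read -r url; do
    code=$(curl -s -o /dev/null -I -w '%{http_code}' --max-time 10 "$url")
    status_line "$url" "$code"
  done
}
```

Anything other than a 200, including a redirect or a timeout (which curl reports as 000), shows up as a FAIL line for a human to look at.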
Workflow 3: monthly config audits
Once a month, sweep through the standard server configs looking for drift:
SSH to prod-web. Audit:
1. /etc/nginx/sites-enabled/*.conf for security header completeness
(X-Frame-Options, X-Content-Type-Options, Referrer-Policy, HSTS,
Permissions-Policy, X-XSS-Protection).
2. /etc/nginx/sites-enabled/*.conf for TLS configuration (TLS 1.2+
only, no weak ciphers).
3. /etc/php/8.2/fpm/php.ini for security-relevant settings (expose_php,
allow_url_fopen, allow_url_include, disable_functions).
4. /etc/ssh/sshd_config for PasswordAuthentication, PermitRootLogin,
PubkeyAuthentication.
5. /etc/fail2ban/jail.local for active jails and ban times.
6. systemctl list-unit-files --state=enabled for any services that should
not be enabled (e.g., legacy ftpd).
Output a structured audit report grouped by severity.
The "monthly audit" used to be a "we will get to it" item. Now it runs reliably because the cost is low.
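The header-completeness check in step 1 of the audit can be approximated with grep. This is a rough sketch: `missing_headers` is a hypothetical helper, it only looks for `add_header` directives in the one file it is given, and it will miss headers set in included snippets.

```shell
# Print each expected security header that has no add_header directive
# in the given nginx config file. Header list mirrors the audit prompt.
missing_headers() {
  local conf="$1" h
  for h in X-Frame-Options X-Content-Type-Options Referrer-Policy \
           Strict-Transport-Security Permissions-Policy X-XSS-Protection; do
    grep -Eqi "^[[:space:]]*add_header[[:space:]]+\"?${h}\"?[[:space:]]" "$conf" \
      || echo "$h"
  done
}

# Usage:
#   for f in /etc/nginx/sites-enabled/*.conf; do
#     echo "== $f"; missing_headers "$f"
#   done
```

A file that prints nothing under its name has all six headers declared; whether their values are sensible still needs a human or the agent to read them.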
Workflow 4: backup verification
The backup question that haunts every sysadmin: "Are the backups actually working and restorable?"
SSH to prod-web. Check our backup status:
1. List the backup files in /backups/ (or wherever they live), with
sizes and timestamps.
2. Verify the most recent backup is less than 25 hours old.
3. Verify the backup file sizes have not shrunk unexpectedly (if today's
backup is < 80% of yesterday's, flag it).
4. Verify there are at least 28 daily backups retained.
5. Confirm the offsite sync (rclone, restic, etc.) ran successfully
today by checking its log.
If anything looks wrong, output a clear summary. If everything is fine,
post a short "backups OK" line to the ops channel.
Separately: produce a one-line command I can run to restore yesterday's
backup to a test server. Do not run the restore.
The "restore command" check is the one that matters most. Backups that are not restorable are not backups. Quarterly, I actually run that command on a test server and verify the restore works end-to-end.
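The freshness and size rules in steps 2 and 3 reduce to two small comparisons. A minimal sketch, assuming GNU `stat` and hypothetical helper names `is_fresh` and `size_ok`:

```shell
# Succeeds if the file was modified within the last max_hours hours.
is_fresh() {
  local file="$1" max_hours="$2" now mtime
  now=$(date +%s)
  mtime=$(stat -c %Y "$file")   # GNU stat: modification time, epoch seconds
  [ $(( now - mtime )) -lt $(( max_hours * 3600 )) ]
}

# Succeeds if today's size (bytes) is at least 80% of yesterday's.
# Integer math avoids needing bc or awk for the percentage.
size_ok() {
  local today="$1" yesterday="$2"
  [ $(( today * 100 )) -ge $(( yesterday * 80 )) ]
}

# Usage (the backup path is an assumption):
#   latest=$(ls -t /backups/* | head -n 1)
#   is_fresh "$latest" 25 || echo "STALE: $latest"
```

The 25-hour window gives a daily job an hour of slack before it is flagged, which matches the threshold in the prompt.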
Workflow 5: security sweeps
Monthly security review on production hosts:
SSH to prod-web. Security sweep:
1. List users with valid shells (/etc/passwd, looking for /bin/bash etc.).
2. List sudoers (/etc/sudoers + /etc/sudoers.d/*).
3. Check authorized_keys for every shell user; flag any keys we do not
recognize.
4. List listening sockets (ss -tlnp).
5. Check ufw or iptables rules.
6. find / -type f -perm -o+w -not -path "/proc/*" -not -path "/sys/*" for
world-writable files.
7. Check certbot certificate expiration (certbot certificates).
8. Check last logins (last -50).
9. Check sudo log (/var/log/auth.log) for sudo invocations in the last
30 days.
Output a structured report. Flag anything unusual.
The sweep catches: contractor accounts that should have been removed, SSH keys added during a project and forgotten, expired certs about to bite, world-writable files from a botched chown command. Every one of those has caused a real incident on a site I have managed; the monthly sweep catches them before they bite.
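Step 1 of the sweep (users with valid shells) is a one-line awk filter over /etc/passwd. A sketch, with `shell_users` as a hypothetical name and the list of non-login shells as an assumption:

```shell
# Print "user shell" for accounts whose shell is not a known non-login
# shell. An empty shell field is printed too, since it falls back to /bin/sh.
shell_users() {
  local passwd="${1:-/etc/passwd}"
  awk -F: '$7 !~ /(nologin|false|sync|shutdown|halt)$/ {print $1 " " $7}' "$passwd"
}

# Usage:
#   shell_users            # reads /etc/passwd
#   shell_users ./passwd   # or a copy pulled for offline review
```

Comparing this list against the accounts you expect is what surfaces the forgotten contractor login.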
Workflow 6: fail2ban rule generation
When a new attack pattern shows up in the access logs, generating a fail2ban filter for it manually takes time. AI is fast at this:
I am seeing this pattern in /var/log/nginx/access.log:
[paste 10-20 representative log lines]
Generate a fail2ban filter (regex + jail config) that bans IPs after
5 matches in 10 minutes for 1 hour. Save the filter to
/tmp/jail-name.conf and the jail addition to /tmp/jail-name-jail.conf.
Do not deploy them; I will review and apply manually.
The output is a working fail2ban filter that you review, drop into /etc/fail2ban/filter.d/, pair with a jail entry in /etc/fail2ban/jail.local, and activate with systemctl reload fail2ban. The whole cycle compresses from 30 minutes to 5.
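For a sense of what the two generated files contain, here is a sketch that writes a candidate filter and jail stanza to a scratch directory. The xmlrpc.php pattern and the `write_candidate` helper are hypothetical illustrations, not taken from real logs; the jail values encode the 5 matches / 10 minutes / 1 hour rule from the prompt.

```shell
# Write a candidate filter and jail stanza to a scratch directory for review.
# The xmlrpc.php failregex is an illustrative assumption only.
write_candidate() {
  local dir="$1" name="$2"
  cat > "$dir/$name.conf" <<'EOF'
[Definition]
failregex = ^<HOST> -.*"(GET|POST) /xmlrpc\.php
ignoreregex =
EOF
  cat > "$dir/$name-jail.conf" <<EOF
[$name]
enabled = true
filter = $name
logpath = /var/log/nginx/access.log
maxretry = 5
findtime = 600
bantime = 3600
EOF
}

# Usage (dry run; fail2ban-regex reports how many log lines the filter hits):
#   write_candidate /tmp nginx-xmlrpc
#   fail2ban-regex /var/log/nginx/access.log /tmp/nginx-xmlrpc.conf
```

Running `fail2ban-regex` against the real access log before installing anything is a cheap way to confirm the regex matches the attack lines and nothing else.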
Workflow 7: incident triage as a first responder
During an outage, the agent is the fast read-only investigator. Covered in detail in Using AI to Help Manage WordPress Infrastructure.
The pattern: you describe the symptom; the agent investigates across nginx, php-fpm, MySQL slow log, system load, disk space, network; the agent proposes a hypothesis; you decide what to act on.
In a real incident the agent saves the 5-10 minutes of "let me ssh in and start looking" cold-start time, which matters when the site is down and customers are watching.
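Disk space is one of the first things that investigation covers, and it is easy to check without write access. A small sketch, assuming POSIX-format `df -P` output and a hypothetical `disk_alerts` helper with a default 90% threshold:

```shell
# Read `df -P` output on stdin and print mount points whose usage is at or
# above the threshold percentage (default 90).
disk_alerts() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)          # "95%" -> "95"
      if (use + 0 >= t) print $6 " " $5
    }'
}

# Usage:
#   df -P | disk_alerts 90
```

Mount points containing spaces would confuse the column split; for a typical web server layout that is not usually a concern.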
What I will not delegate
- Anything that requires sudo on production.
- DNS record changes.
- Firewall rule changes.
- SSH config changes.
- Systemd service stops on user-facing services.
- Certbot renewals on the live SSL chain.
- apt upgrade on production.
- rm of anything in /etc, /var/www, or /home.
- Backup restoration onto production.
- Any action during an active incident that has not been explicitly approved.
The pattern is consistent: AI investigates, AI surfaces, AI proposes; humans approve, humans execute, humans bear the consequences. That asymmetry is the right architecture for infrastructure work.
For the developer-role equivalent, see AI for WordPress Developers: The Playbook. For SEO, see AI for WordPress SEO. For the broader role map, see How Small WordPress Agencies Can Use AI in 2026, by Role.