TechEarl

AI for WordPress Sysadmins: The Playbook

The sysadmin-role playbook for AI on WordPress infrastructure: log triage, deploy verification, config audits, backup checks, security sweeps, fail2ban rule generation, and the strict read-only discipline that keeps production safe.

Ishan Karunaratne · 5 min read

The sysadmin-role playbook for AI on WordPress infrastructure is the quietest of the agency-role playbooks. It does not dazzle in demos the way Figma-to-component does. What it does is take the most time-consuming weekly chores (log triage, config audits, backup verification, security sweeps) and compress them from hours to minutes. That compression is what lets a small agency operate without a full-time sysadmin. Here is the playbook in detail.

The constraint: strict read-only for production AI access

The defining constraint for sysadmin AI work is that the agent should have read-only access to production. Not "mostly read-only" or "sudo with confirmation." It gets a dedicated Linux user account on production with read permissions on logs, configs, and process state, and no write capability outside its own home directory and scratch space such as /tmp.

This is the single most important configuration decision. With it, the AI can investigate anything, produce reports, propose changes, and surface incidents, without any risk of breaking production. Without it, every AI invocation has tail-risk exposure that is not worth the convenience.

bash
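# On the production server (run as root)
useradd -m -s /bin/bash -G adm,systemd-journal claude-reader
# adm: read access to log files owned by group adm (nginx logs on
#   Debian/Ubuntu; verify the php-fpm log ownership on your host)
# systemd-journal: read access to the journal via journalctl
# No sudo, no docker group, nothing else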

The agent SSHes as claude-reader. Any change to production goes through a human-driven path (CI deploy, manual ssh with the operator's account, etc.).
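It is worth confirming that the account really is as limited as intended. A minimal sketch, assuming the group names below are the ones that would grant write or root-equivalent access on your distribution; the function name and the list itself are mine, so adjust both:

```bash
# Print any group from the arguments that implies write or root-equivalent access.
# The risky-group list is an assumption; extend it for your distribution.
risky_groups() {
  for g in "$@"; do
    case "$g" in
      sudo|wheel|admin|docker|lxd|root|www-data) echo "$g" ;;
    esac
  done
}

# Usage on the production host (not run here):
#   risky_groups $(id -nG claude-reader)   # expect no output
#   sudo -l -U claude-reader               # run as root; expect no sudo rights listed
```

Empty output from risky_groups is the passing case. Anything it prints is a group to remove before the agent ever logs in.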

Workflow 1: weekly log triage

The most-used sysadmin workflow. Covered in Using AI to Help Manage WordPress Infrastructure.

The pattern: every Monday morning (or after any significant deploy), the agent pulls and analyzes the last week of nginx, php-fpm, and systemd journal logs, groups errors by signature, and produces a triage report.

Without AI, this is a 90-120 minute weekly chore that gets skipped half the time. With AI, it is a 5-minute prompt and a 3-minute review of the output. The "always do this weekly" discipline becomes possible.
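Grouping by signature mostly means stripping the parts of each line that vary (timestamps, PIDs, connection IDs, client IPs) and counting what is left. A rough sketch for nginx error-log lines, assuming the default error_log format; error_signatures is a name I made up, not part of any tool:

```bash
# Collapse nginx error-log lines into signatures and count them.
# Assumes the default format: "YYYY/MM/DD HH:MM:SS [level] pid#tid: *conn message, client: ..."
error_signatures() {
  sed -E \
    -e 's#^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ##' \
    -e 's/, client: .*$//' \
    -e 's/[0-9]+/N/g' |
    sort | uniq -c | sort -rn
}

# Usage (read-only): error_signatures < /var/log/nginx/error.log | head -20
```

The agent does a more nuanced version of this, but a deterministic count like this one is a useful cross-check on its report.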

Workflow 2: post-deploy verification

After every production deploy:

text
SSH to prod-web. Verify the deploy that just completed:

1. wp core version on every site in the multisite, confirm matches expected.
2. wp plugin list on every site, confirm no plugins regressed in version.
3. wp option get siteurl on every site, confirm correct URLs.
4. curl -I the home URL of each site, confirm 200 status.
5. Check the last 100 lines of every site's debug.log for fatal errors
   since the deploy timestamp.

Output a checklist. Mark any failures clearly. Post the summary to
ops Slack. Do not take any corrective action.

The agent runs the full sweep across all sites; you read the result; you fix anything that needs fixing yourself.

For multisite installations with 50+ sites, this verification is the kind of thing that did not happen reliably before because nobody had the time. With AI it happens after every deploy.
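The "no plugins regressed" check in step 2 is easy to verify independently if you keep a snapshot from before the deploy. A minimal sketch, assuming two CSV snapshots taken with `wp plugin list --fields=name,version --format=csv` and GNU `sort -V` for version ordering; plugin_regressions is a hypothetical helper name:

```bash
# Print plugins whose version went DOWN between two snapshots.
# Each snapshot is the CSV (with header row) from:
#   wp plugin list --fields=name,version --format=csv
plugin_regressions() {
  before="$1"
  after="$2"
  tail -n +2 "$after" | while IFS=, read -r name new; do
    # Look up the same plugin in the pre-deploy snapshot
    old=$(awk -F, -v n="$name" 'NR > 1 && $1 == n { print $2 }' "$before")
    if [ -n "$old" ] && [ "$old" != "$new" ]; then
      # sort -V orders version strings; if the new one sorts first, it is lower
      lowest=$(printf '%s\n%s\n' "$old" "$new" | sort -V | head -n 1)
      if [ "$lowest" = "$new" ]; then
        echo "REGRESSED $name $old -> $new"
      fi
    fi
  done
}

# Usage: plugin_regressions before.csv after.csv
```

Plugins that were added or removed between snapshots are ignored here; only downgrades are reported.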

Workflow 3: monthly config audits

Once a month, sweep through the standard server configs looking for drift:

text
SSH to prod-web. Audit:

1. /etc/nginx/sites-enabled/*.conf for security header completeness
   (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, HSTS,
   Permissions-Policy). Note any X-XSS-Protection header too; browsers
   have deprecated it.
2. /etc/nginx/sites-enabled/*.conf for TLS configuration (TLS 1.2+
   only, no weak ciphers).
3. /etc/php/8.2/fpm/php.ini for security-relevant settings (expose_php,
   allow_url_fopen, allow_url_include, disable_functions).
4. /etc/ssh/sshd_config for PasswordAuthentication, PermitRootLogin,
   PubkeyAuthentication.
5. /etc/fail2ban/jail.local for active jails and ban times.
6. systemctl list-unit-files --state=enabled for any services that should
   not be enabled (e.g., legacy ftpd).

Output a structured audit report grouped by severity.

The "monthly audit" used to be a "we will get to it" item. Now it runs reliably because the cost is low.
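The security-header item can be spot-checked with plain grep. A minimal sketch, assuming headers are set with `add_header` directly in the file being checked; it does not follow `include` directives or account for nginx's add_header inheritance rules, and the header list is my assumption of a sensible baseline:

```bash
# Print the security headers that a given nginx config file does not set.
# Only looks for add_header lines in this one file.
missing_headers() {
  conf="$1"
  for h in X-Frame-Options X-Content-Type-Options Referrer-Policy \
           Strict-Transport-Security Permissions-Policy; do
    if ! grep -Eiq "^[[:space:]]*add_header[[:space:]]+\"?$h\"?[[:space:]]" "$conf"; then
      echo "$h"
    fi
  done
}

# Usage (read-only):
#   for f in /etc/nginx/sites-enabled/*; do echo "== $f"; missing_headers "$f"; done
```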

Workflow 4: backup verification

The backup question that haunts every sysadmin: "are the backups actually working and restorable?"

text
SSH to prod-web. Check our backup status:

1. List the backup files in /backups/ (or wherever they live), with
   sizes and timestamps.
2. Verify the most recent backup is less than 25 hours old.
3. Verify the backup file sizes have not shrunk unexpectedly (if today's
   backup is < 80% of yesterday's, flag it).
4. Verify there are at least 28 daily backups retained.
5. Confirm the offsite sync (rclone, restic, etc.) ran successfully
   today by checking its log.

If anything looks wrong, output a clear summary. If everything is fine,
post a short "backups OK" line to the ops channel.

Separately: produce a one-line command I can run to restore yesterday's
backup to a test server. Do not run the restore.

The "restore command" check is the one that matters most. Backups that are not restorable are not backups. Quarterly, I actually run that command on a test server and verify the restore works end-to-end.
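The freshness, shrinkage, and retention checks in that prompt are simple enough to script as a cross-check. A minimal sketch, assuming one backup file per day in a single directory and GNU find; the function names are hypothetical:

```bash
# Succeeds (exit 0) when today's backup is under 80% of yesterday's size in bytes.
backup_shrunk() {
  today_bytes="$1"
  yesterday_bytes="$2"
  [ $((today_bytes * 100)) -lt $((yesterday_bytes * 80)) ]
}

# Succeeds when at least one regular file in the directory is under 25 hours old
# (25 hours = 1500 minutes).
has_fresh_backup() {
  [ -n "$(find "$1" -maxdepth 1 -type f -mmin -1500 -print -quit)" ]
}

# Number of regular files retained in the backup directory.
count_backups() {
  find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Usage (read-only):
#   has_fresh_backup /backups || echo "no backup in the last 25 hours"
#   [ "$(count_backups /backups)" -ge 28 ] || echo "fewer than 28 backups retained"
```

None of this proves a backup is restorable; it only catches the cheap failures early.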

Workflow 5: security sweeps

Monthly security review on production hosts:

text
SSH to prod-web. Security sweep:

1. List users with valid shells (/etc/passwd, looking for /bin/bash etc.).
2. List sudoers (/etc/sudoers + /etc/sudoers.d/*).
3. Check authorized_keys for every shell user; flag any keys we do not
   recognize.
4. List listening sockets (ss -tlnp).
5. Check ufw or iptables rules.
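6. find / -type f -perm -o+w -not -path "/proc/*" -not -path "/sys/*"
   2>/dev/null for world-writable files (-type f skips symlinks and
   directories such as /tmp, which are world-writable by design).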
7. Check certbot certificate expiration (certbot certificates).
8. Check last logins (last -50).
9. Check sudo log (/var/log/auth.log) for sudo invocations in the last
   30 days.

Output a structured report. Flag anything unusual.

The sweep catches: contractor accounts that should have been removed, SSH keys added during a project and forgotten, expired certs about to bite, world-writable files from a botched chown command. Every one of those has caused a real incident on a site I have managed; the monthly sweep catches them before they bite.
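Step 1 of the sweep is a one-liner worth keeping around. A minimal sketch, assuming that any login shell whose path ends in "sh" (bash, sh, zsh, dash) counts as a valid shell; that heuristic and the function name are mine:

```bash
# List accounts whose login shell looks interactive.
# Heuristic: shell path ends in "sh"; nologin, false, and sync do not match.
shell_users() {
  awk -F: '$7 ~ /sh$/ { print $1 }' "${1:-/etc/passwd}"
}

# Usage (read-only): shell_users
```

Diffing this output against last month's is the quickest way to spot the forgotten contractor account.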

Workflow 6: fail2ban rule generation

When a new attack pattern shows up in the access logs, generating a fail2ban filter for it manually takes time. AI is fast at this:

text
I am seeing this pattern in /var/log/nginx/access.log:

[paste 10-20 representative log lines]

Generate a fail2ban filter (regex + jail config) that bans IPs after
5 matches in 10 minutes for 1 hour. Save the filter to
/tmp/jail-name.conf and the jail addition to /tmp/jail-name-jail.conf.
Do not deploy them; I will review and apply manually.

The output is a draft fail2ban filter that you review, test against the real log with fail2ban-regex, drop into /etc/fail2ban/filter.d/, and wire up with a jail entry in /etc/fail2ban/jail.local, followed by systemctl reload fail2ban. The whole cycle compresses from 30 minutes to 5.
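For a sense of what the two generated files look like, here is a sketch that writes them to a scratch directory. The jail name wp-login-flood and the failregex (repeated POSTs to wp-login.php in the combined access-log format) are hypothetical examples, not a pattern from a real incident; the thresholds mirror the prompt (5 matches, 10 minutes, 1 hour):

```bash
# Write a draft filter and jail stanza into a scratch directory for review.
# The jail name and failregex are hypothetical; derive the real regex from your log lines.
write_fail2ban_draft() {
  dir="$1"
  name="wp-login-flood"

  cat > "$dir/$name.conf" <<'EOF'
[Definition]
failregex = ^<HOST> .* "POST /wp-login\.php
ignoreregex =
EOF

  cat > "$dir/$name-jail.conf" <<EOF
[$name]
enabled  = true
port     = http,https
filter   = $name
logpath  = /var/log/nginx/access.log
maxretry = 5
findtime = 600
bantime  = 3600
EOF
}

# Review step, on a host with fail2ban installed (not run here):
#   write_fail2ban_draft /tmp
#   fail2ban-regex /var/log/nginx/access.log /tmp/wp-login-flood.conf
```

fail2ban-regex reports how many log lines the filter matches, which is the check I want to see before anything goes near /etc/fail2ban.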

Workflow 7: incident triage as a first responder

During an outage, the agent is the fast read-only investigator. Covered in detail in Using AI to Help Manage WordPress Infrastructure.

The pattern: you describe the symptom; the agent investigates across nginx, php-fpm, MySQL slow log, system load, disk space, network; the agent proposes a hypothesis; you decide what to act on.

In a real incident the agent saves the 5-10 minutes of "let me ssh in and start looking" cold-start time, which matters when the site is down and customers are watching.
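Disk space is one of the first things worth ruling out, and it needs no privileges at all. A small sketch, assuming POSIX `df -P` output; the function name and the 90% default threshold are my choices:

```bash
# Print mount points at or above a usage threshold (percent).
# Reads `df -P` output on stdin, so it also works on a saved capture.
full_filesystems() {
  threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= t + 0) print $6, $5
  }'
}

# Usage (read-only): df -P | full_filesystems 90
```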

What I will not delegate

  • Anything that requires sudo on production.
  • DNS record changes.
  • Firewall rule changes.
  • SSH config changes.
  • Systemd service stops on user-facing services.
  • Certbot renewals on the live SSL chain.
  • apt upgrade on production.
  • rm of anything in /etc, /var/www, or /home.
  • Backup restoration onto production.
  • Any action during an active incident that has not been explicitly approved.

The pattern is consistent: AI investigates, AI surfaces, AI proposes; humans approve, humans execute, humans bear the consequences. That asymmetry is the right architecture for infrastructure work.

For the developer-role equivalent, see AI for WordPress Developers: The Playbook. For SEO, see AI for WordPress SEO. For the broader role map, see How Small WordPress Agencies Can Use AI in 2026, by Role.


Tags: WordPress, AI, Sysadmin, Infrastructure, DevOps


Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.
