TechEarl

CVE-2026-31431 'Copy Fail': Detect and Patch the Linux Kernel LPE

CVE-2026-31431 (Copy Fail) is a Linux kernel local privilege escalation that turns any unprivileged shell into root on essentially every distribution shipped since 2017. Here is how I check for it, patch it, mitigate it before the patch lands, and how the exploit actually works.

By Ishan Karunaratne · 40 min read

TL;DR. CVE-2026-31431, named "Copy Fail" by the Theori team that disclosed it on 2026-04-29, is a 9-year-old Linux kernel local privilege escalation. Any unprivileged user with a shell on the box (or in a container) can become root with a 732-byte Python script that uses only the standard library. The window is wide: essentially every mainstream distribution shipped from 2017 through April 2026 is affected. The fix is a single kernel update, available now on most mainstream distros; Ubuntu is the exception and, as of 2026-05-18, ships only a module blocklist. If you cannot patch immediately, block the algif_aead module from loading and unload it if it is already resident. Then run a detection scan, because the exploit modifies setuid binaries in the page cache without touching disk, so a clean sha256sum on /usr/bin/su does not mean you are clean.

This is the bug you want to handle today, not next Patch Tuesday.

Why "Copy Fail" matters more than the average kernel CVE

Three things make this one stand out:

  1. No race, no offsets. The exploit is logic-based, not memory-corruption-based. There is no spray, no leak-and-overwrite. Taeyang Lee at Theori's Xint Code team reports 100% reliability across the distros they tested.
  2. 732 bytes of Python. The PoC is tiny and reads like a normal Python script using kernel crypto. It does not need ROP, it does not need a kernel info-leak, it does not need a specific kernel build.
  3. Default-on attack surface. AF_ALG is enabled in every stock kernel I have looked at. algif_aead autoloads the moment you ask the kernel to bind a crypto socket. Nothing about a hardened distro turns this off by default.

For a multi-tenant host (CI runners, container nodes, university shells, web hosts), every user account on the box is one unprivileged shell away from root on the host. For container nodes specifically, the same primitive escapes most containers because the kernel is shared with the host.

How I handled this on my own fleet

The CVE went public on 2026-04-29. I first heard about it on 2026-05-04, five days after disclosure. By the end of that day every Linux server in the fleet I run was patched.

I did not run the upgrades myself. I briefed my sysadmin and they did the work across the cloud fleet, tens of servers across the providers I use. My job that day was triage and prioritization: which boxes are exposed first, which need the modprobe blocklist as a holding measure, which need a reboot scheduled tonight rather than next week. The actual dnf update kernel and reboot loop ran on my sysadmin's terminal, not mine. To be clear, none of these boxes were compromised. The work was preventive: get the patch in before the gap between PoC publication and the fleet update became a real exposure window.

One additional server lives on a managed web host rather than on the cloud fleet. That vendor had already restricted SSH to a small allow-list of source IPs and disabled any default password-based root login. The local-attacker precondition was effectively pre-squeezed there. I confirmed the kernel was on the vendor's normal security-update cadence, sanity-checked their advisory, and moved on.

What kept the rest of the fleet's patch window survivable was a layered defensive posture baked in long before this CVE existed. There are three real paths an attacker uses to get the local shell that Copy Fail needs, and the fleet I run has each one closed:

  • Cloud-firewall IP allow-list (the network layer). Port 22 is open only to a small set of source IPs at the cloud-provider firewall level: security groups on AWS, firewall rules on GCP, NSGs on Azure, equivalent on other providers. An attacker not coming from one of those IPs cannot even reach sshd. The TCP handshake gets dropped at the platform's network edge, well before any auth code runs.
  • SSH keys only, no passwords (the SSH auth layer). For connections that do reach sshd, every server in the fleet has PasswordAuthentication no and ChallengeResponseAuthentication no in /etc/ssh/sshd_config. The only way in is a key I or my sysadmin issued.
  • Web app and dependency hygiene (the application layer). This is the one most teams underweight, and it is the most common way attackers actually get a shell on a Linux server. A remote-code-execution bug, command-injection bug, unauthenticated file upload, server-side template injection, insecure deserialization, or vulnerable dependency in any service you expose to the internet turns "remote attacker" into "unprivileged shell on the box" without ever touching SSH. From there, Copy Fail finishes the job. I spend real effort keeping this surface tight: dependencies updated continuously (not quarterly), no handler shells out with raw user-controlled strings, every app runs as a least-privilege user instead of root, file uploads land in a non-executable directory with explicit MIME-type checks, admin endpoints sit behind the same firewall allow-list as SSH, and application logs are actively monitored rather than just collected. If you ship code on the same boxes that run Linux, this layer is at least as important as the SSH posture.

All three are enforced consistently across the fleet via configuration management, so drift does not get a chance to accumulate. One host with PasswordAuthentication yes, one security group with 0.0.0.0/0 on port 22, or one outdated app dependency with a known RCE is the entry point an attacker walks through, and the fleet has none of them.

That layered posture is the thing. Copy Fail needs a local shell to start. The kernel patch closed the bug. The three-layer posture meant the door had not been left open for an attacker to use it in the first place.

If you have not already enforced all three, do it today. They are independent of the kernel patch and they stack with it. Work through the steps in order:

Step 1: patch the kernel. Cover the bug itself. The per-distro commands are in the next sections.

Step 2: lock SSH down to a source-IP allow-list at the cloud firewall. Concrete example on AWS (for the full SSH-into-EC2 workflow including key generation, see my aws-ssh-into-ec2-instance writeup):

bash
# Replace sg-XXXX with your security group, and 203.0.113.45/32 with the
# office / VPN / bastion IPs you actually use.
aws ec2 revoke-security-group-ingress \
  --group-id sg-XXXX --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress \
  --group-id sg-XXXX --protocol tcp --port 22 --cidr 203.0.113.45/32

The IAM permissions you need for that revoke/authorize pair are covered in my aws-iam-policy-examples reference. On GCP use gcloud compute firewall-rules update (my gcp-ssh-into-vm-without-gcloud post walks the GCP SSH path end to end), on Azure use az network nsg rule update. The equivalent panel exists in every cloud console. The principle is the same: never 0.0.0.0/0 on port 22 in a production VPC.
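Before revoking anything it helps to know which groups are open in the first place. A minimal sketch, assuming the AWS CLI is installed and configured for the account and region you care about; the function name list_world_open_ssh_groups is my own placeholder, not part of any tool. Treat the output as a shortlist to review by hand: EC2 evaluates these filters per group rather than per rule, so a group with one rule on port 22 and a different rule open to 0.0.0.0/0 will also show up, and a rule that covers port 22 through a wider port range will not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shortlist security groups that have a rule on port 22
# and a rule open to 0.0.0.0/0. Review each hit by hand before changing it.
list_world_open_ssh_groups() {
  aws ec2 describe-security-groups \
    --filters Name=ip-permission.from-port,Values=22 \
              Name=ip-permission.to-port,Values=22 \
              Name=ip-permission.cidr,Values=0.0.0.0/0 \
    --query 'SecurityGroups[].[GroupId,GroupName]' \
    --output text
}

# Usage:
#   list_world_open_ssh_groups
```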

Step 3: force key-only auth and drop password login on every box:

bash
# /etc/ssh/sshd_config
PasswordAuthentication no
ChallengeResponseAuthentication no
PermitRootLogin no
# Optional account-level whitelist on top of key auth:
AllowUsers your-username [other-explicit-usernames]

# After the edit:
sudo sshd -t                 # syntax check before reloading
sudo systemctl reload sshd

Pair that with fail2ban (or your distro's equivalent) to rate-limit anything that does slip past the firewall, and watch /var/log/auth.log for anything that does not match your known set of users and source IPs. My ssh-cheat-sheet covers the full set of hardening flags I lean on (jump hosts, agent forwarding, key types, ProxyCommand) if you want to push this further.
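Editing sshd_config is not the same as knowing what sshd will actually enforce, because drop-in files and distro defaults can override it. A minimal sketch of a check against the effective configuration, assuming the output format of sshd -T (one lowercase keyword and value per line); the function name check_sshd_effective is hypothetical. Newer OpenSSH releases report kbdinteractiveauthentication where older ones report challengeresponseauthentication, so the sketch looks for both. It only sees global settings, not overrides inside Match blocks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `sshd -T` output on stdin, prints every setting
# that is weaker than the baseline above, and exits non-zero if any is found.
check_sshd_effective() {
  awk '
    $1 == "passwordauthentication"          && $2 != "no" { print "FAIL " $0; bad = 1 }
    $1 == "kbdinteractiveauthentication"    && $2 != "no" { print "FAIL " $0; bad = 1 }
    $1 == "challengeresponseauthentication" && $2 != "no" { print "FAIL " $0; bad = 1 }
    $1 == "permitrootlogin"                 && $2 != "no" { print "FAIL " $0; bad = 1 }
    END { exit (bad + 0) }
  '
}

# Usage on a host (sshd -T needs root to read the host keys):
#   sudo sshd -T | check_sshd_effective && echo "sshd baseline OK"
```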

Step 4: harden the web applications and services running on the box. This is the one that gets missed. Even with SSH locked behind an IP allow-list and key-only auth, an RCE in your web app gives an attacker an unprivileged shell on the server. From there, on any kernel that is not yet patched, Copy Fail gives them root. Realistic baseline:

  • Keep dependencies current. Run npm audit / pnpm audit / pip-audit / bundler-audit / cargo audit / composer audit in CI and fail the build on high-severity findings. Wire up Dependabot, Renovate, or your platform equivalent so updates land continuously, not on a quarterly cleanup sprint. Most "web app gave them a shell" incidents trace back to a dependency that had a CVE for six months.
  • No raw user input near a shell or eval. Anything that takes user input gets validated at the boundary. Never construct a shell command, SQL query, template, or eval-ish call from user-controlled strings; use parameterized APIs.
  • Run apps as a least-privilege user. A web app process should never run as root. The exploit chain still works against an unprivileged user (that is the threat model), but everything an attacker does inside the unprivileged shell is bounded by what that user can read and write on the filesystem until Copy Fail (or some equivalent) gives them root.
  • File uploads are dangerous by default. Land them in a directory mounted noexec, validate MIME type and magic bytes, never serve user-uploaded files from a path the web server will execute scripts in.
  • Admin endpoints behind the same firewall allow-list as SSH. WordPress /wp-admin, phpMyAdmin, internal dashboards: same IP allow-list as port 22. The public app surface should be small enough to audit by hand.
  • Watch app logs the same way you watch auth.log. Unusual 500s, repeated 4xx hammering on the same handler, requests to paths that do not exist, OOB callbacks: those are the early signs.

The Copy Fail patch closes the vulnerability. SSH hygiene closes one path to using it. Application hygiene closes the other path, and in practice it is the path that matters more: the public internet is full of scanners probing exposed web applications, while SSH attempts from a non-allow-listed IP never reach sshd at all.

Am I vulnerable?

Two checks. Both run in seconds, and neither requires root.

bash
# 1. Kernel version
uname -r

# 2. Is the vulnerable module loadable?
modinfo algif_aead 2>/dev/null | head -3

If modinfo returns a filename and a description, the module is on this kernel. The module loads on demand the first time a process opens an AF_ALG socket and asks for an AEAD algorithm, so lsmod | grep algif_aead returning nothing is not reassuring on its own. The question is whether the module is present, not whether it is currently loaded.
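The present-versus-loaded distinction can be folded into one check, with a third case worth knowing about: a kernel that has the code compiled in rather than built as a module. A minimal sketch, assuming the usual /lib/modules/<release>/modules.builtin layout; the function name module_status is hypothetical. Stock distro kernels normally build algif_aead as a module, but on a custom kernel that reports builtin, a modprobe blocklist has nothing to block.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify how a module is available on the running kernel.
#   builtin -> compiled into the kernel image; modprobe rules cannot remove it
#   loaded  -> currently loaded as a module
#   present -> on disk and will autoload on first use
#   absent  -> not shipped with this kernel
module_status() {
  local mod="${1:-algif_aead}" rel
  rel="$(uname -r)"
  if grep -qs "/${mod}\.ko" "/lib/modules/${rel}/modules.builtin"; then
    echo builtin
  elif grep -qs "^${mod} " /proc/modules; then
    echo loaded
  elif modinfo "$mod" >/dev/null 2>&1; then
    echo present
  else
    echo absent
  fi
}

module_status algif_aead
```

Anything other than absent means the attack surface exists on that kernel.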

NVD scores this CVSS 7.8 (HIGH), vector AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H, CWE-669 (Incorrect Resource Transfer Between Spheres). Red Hat classifies the same flaw as CWE-1288 (Improper Validation of Consistency within Input) and rates it Important rather than Critical "because exploitation requires local access to the system", and explicitly tags it Known exploit in their CVE record. It is also listed in the CISA Known Exploited Vulnerabilities catalog with a federal-agency mitigation deadline of 2026-05-15. Red Hat published a dedicated security bulletin, RHSB-2026-002, with mitigation procedures.

Kernels Theori confirmed the exploit on

These are the kernels Theori successfully exploited end to end. If your uname -r matches one of these (or is older), you are vulnerable.

| Distribution | Vulnerable kernel (Theori PoC-confirmed) |
| --- | --- |
| Ubuntu 24.04 LTS | 6.17.0-1007-aws |
| Amazon Linux 2023 | 6.18.8-9.213.amzn2023 |
| RHEL 10.1 | 6.12.0-124.45.1.el10_1 |
| SUSE 16 | 6.12.0-160000.9-default |

NVD's affected-products entry covers a much wider range: vulnerable kernels include the 5.16–6.1.170 series, 6.2–6.6.137 series, and 6.13–6.18.22 series. The bug entered in 2017, so older kernel series from 2017 onward that still receive updates are also exposed.

Verified fixed versions per distribution

Pulled directly from each vendor's official security tracker on 2026-05-18. Where a distro is missing from this table, the patch is in-flight (see the per-distro sections below) or I could not extract data from the vendor's JavaScript-rendered tracker page.

| Distribution | Fixed kernel (x86_64) | Advisory | Released |
| --- | --- | --- | --- |
| Red Hat Enterprise Linux 10.1 (current) | kernel-6.12.0-124.55.1.el10_1 | RHSA-2026:13566 | 2026-05-04 |
| RHEL 10.0 EUS | kernel-6.12.0-55.71.1.el10_0 | RHSA-2026:13887 | 2026-05-05 |
| NVIDIA for RHEL 10 | kernel-6.12.0-211.6.el10nv | RHSA-2026:14926 | 2026-05-07 |
| RHEL 9.7 (current) | kernel-5.14.0-611.54.1.el9_7 | RHSA-2026:13565 | 2026-05-04 |
| RHEL 9.6 EUS | kernel-5.14.0-570.112.1.el9_6 | RHSA-2026:14339 | (per RHSA) |
| RHEL 9.4 EUS | kernel-5.14.0-427.124.1.el9_4 | RHSA-2026:13932 | (per RHSA) |
| RHEL 9.2 E4S | kernel-5.14.0-284.169.1.el9_2 | RHSA-2026:13734 | (per RHSA) |
| RHEL 9.0 E4S | kernel-5.14.0-70.178.1.el9_0 | RHSA-2026:13936 | (per RHSA) |
| RHEL 8.10 EUS (current 8.x) | kernel-4.18.0-553.123.1.el8_10 | RHSA-2026:13577 | 2026-05-05 |
| RHEL 8.10 kernel-rt | kernel-rt-4.18.0-553.123.1.rt7.464.el8_10 | RHSA-2026:13578 | 2026-05-05 |
| RHEL 8.8 E4S / TUS | kernel-4.18.0-477.139.1.el8_8 | RHSA-2026:13681 | (per RHSA) |
| RHEL 8.6 AUS / E4S / TUS | kernel-4.18.0-372.191.1.el8_6 | RHSA-2026:14230 | 2026-05-06 |
| RHEL 8.4 AUS / EUS Long-Life | kernel-4.18.0-305.190.1.el8_4 | RHSA-2026:14165 | 2026-05-06 |
| Debian bullseye (oldoldstable, LTS) | 5.10.251-5 | DLA-4560-1 | (per Debian tracker) |
| Debian bookworm (oldstable) | 6.1.172-1 | DSA-6243-1 | (per Debian tracker) |
| Debian trixie (stable) | 6.12.88-1 | DSA-6238-1 | (per Debian tracker) |
| Debian forky / sid (testing / unstable) | 7.0.7-1 | (no DSA) | (per Debian tracker) |
| SUSE Linux Enterprise 12 SP5 LTSS | 4.12.14-122.302.1 | SUSE-SU-2026:21421-1 | 2026-05-02 |
| SUSE Linux Enterprise 15 SP4 LTSS | 5.14.21-150400.24.205.1 | SUSE-SU-2026:1672-1 | 2026-05-02 |
| SUSE Linux Enterprise 15 SP5 LTSS | 5.14.21-150500.55.149.1 | SUSE-SU-2026:1670-1 | 2026-05-02 |
| SUSE Linux Enterprise 15 SP6 LTSS | 6.4.0-150600.23.100.1 | SUSE-SU-2026:1671-1 | 2026-05-02 |
| SUSE Linux Enterprise 15 SP7 | 6.4.0-150700.53.40.1 | SUSE-SU-2026:1673 | 2026-05-02 |
| SUSE Linux Enterprise 16.0 | 6.12.0-160000.29.1 | (SUSE-SLES-16.0 advisory) | 2026-05-08 |
| openSUSE Leap 15.6 | kernel-default >= 6.4.0-150600.23.100.1 | openSUSE-SLE-15.6-2026-1671 | 2026-05-02 |
| openSUSE Leap 16.0 | kernel-default >= 6.12.0-160000.29.1 | (Leap-16.0 advisory) | 2026-05-08 |
| Arch Linux | linux 6.19.12-1 | AVG-2908 | (per Arch tracker) |
| Amazon Linux 2 | (see ALAS2-2026-3289 + variant ALAS2KERNEL advisories) | ALAS2-2026-3289 | 2026-05-05 |
| Amazon Linux 2023 (standard) | (see ALAS) | ALAS2023-2026-1651 | 2026-05-05 |
| Amazon Linux 2023 (kernel6.12) | (see ALAS) | ALAS2023-2026-1650 | 2026-05-05 |
| Amazon Linux 2023 (kernel6.18) | (see ALAS) | ALAS2023-2026-1649 | 2026-05-05 |
| Fedora (all current branches) | kernel >= 6.19.12 | (rolling, per Bugzilla 2460538) | (per Bugzilla note) |
| Alpine Linux edge | linux-lts 6.18.32-r1 | (routine kernel update) | 2026-05-18 |
| Alpine Linux v3.21 | linux-lts 6.12.89-r0 | (routine kernel update) | 2026-05-15 |
| Alpine Linux v3.20 | linux-lts 6.6.139-r0 | (routine kernel update) | 2026-05-15 |
| Alpine Linux v3.19 | linux-lts 6.6.140-r0 | (routine kernel update) | 2026-05-17 |
| Alpine Linux v3.18 | linux-lts 6.1.173-r0 | (routine kernel update) | 2026-05-16 |

Red Hat's full list also includes 11 OpenShift kernel advisories (RHOSE 4.12 through 4.21), 10 follow-up kpatch-patch advisories for live-patching subscribers, and additional kernel-rt builds for RHEL 9.0 / 9.2 / 9.4 / 9.6 / 9.7. The complete machine-readable list is in Red Hat's CSAF VEX file. Rocky Linux and AlmaLinux rebuild from the corresponding RHEL sources and ship the same kernel NVRs (with .rocky/.alma substituted for .el) within 24-72 hours of the matching RHSA.

Ubuntu is conspicuously missing. As of 2026-05-18, all supported Ubuntu releases (24.04 LTS noble, 22.04 LTS jammy, 20.04 LTS focal, 18.04 ESM bionic, 16.04 ESM xenial) are listed as "vulnerable, work in progress" on ubuntu.com/security/CVE-2026-31431. Canonical has shipped USN-8226-1 and USN-8226-2 (2026-04-30) as a temporary mitigation: kmod package updates that blocklist the algif_aead module on boot. No kernel patch yet. If you run Ubuntu, you currently have the modprobe blocklist and nothing else from Canonical. See the mitigation section below.

Fedora is patched. Red Hat Bugzilla 2460538 carries a direct statement from Fedora: "this was fixed in kernel 6.19.12, and all current Fedora branches are already at or past that version." sudo dnf upgrade kernel && sudo reboot puts you on the patched build.

Alpine Linux ships the fix in routine kernel updates across every supported branch (verified against pkgs.alpinelinux.org on 2026-05-18): edge linux-lts 6.18.32-r1, v3.21 6.12.89-r0, v3.20 6.6.139-r0, v3.19 6.6.140-r0, v3.18 6.1.173-r0. All built between 2026-05-15 and 2026-05-18, all on upstream stable branches that contain mainline commit a664bf3d603d. Alpine's secdb does not tag the kernel package with CVE-2026-31431 directly (Alpine treats stable-kernel bumps as a rolling fix-stream rather than per-CVE advisories), but it does explicitly tag docker 29.4.2-r0 and 29.4.3-r0 on edge as fixed for this CVE.

Patch the kernel

The fix reverts a 2017 optimization in algif_aead that allowed in-place operations to land on page-cache pages (mainline commit a664bf3d603d). Every distro that has a fix ships it as a normal kernel security update, which means you upgrade your kernel package and reboot. Ubuntu is the exception, so it comes first.

Ubuntu (special case: no kernel fix yet)

Important: as of 2026-05-18, Canonical has NOT shipped a patched kernel for any Ubuntu release. The official tracker at ubuntu.com/security/CVE-2026-31431 lists noble (24.04), jammy (22.04), focal (20.04), bionic (18.04 ESM), and xenial (16.04 ESM) as "vulnerable, work in progress".

What Canonical did ship is USN-8226-1 / USN-8226-2 on 2026-04-30: a kmod package update that blocklists algif_aead from loading. That is a mitigation, not a fix. Apply it now and re-run normal updates later when the patched kernel lands.

bash
sudo apt update
sudo apt install --only-upgrade kmod
# USN-8226 ships a /lib/modprobe.d entry that blocks algif_aead.
# Reboot to make sure no live process is holding the module.
sudo reboot

Cross-check after reboot:

bash
modprobe algif_aead 2>&1   # should fail / be blocked
lsmod | grep algif         # should print nothing

Watch ubuntu.com/security/CVE-2026-31431 and your release's -security pocket. When the kernel USN lands, sudo apt update && sudo apt install --only-upgrade linux-image-generic && sudo reboot finishes the job.

Debian

The tracker confirms fixes in every supported release. Pin or upgrade to at least the fixed source-package version:

bash
sudo apt update
sudo apt install --only-upgrade linux-image-$(dpkg --print-architecture)
sudo reboot

Fixed versions (from security-tracker.debian.org):

| Release | Source-package version | Advisory |
| --- | --- | --- |
| bullseye (oldoldstable, LTS) | 5.10.251-5 | DLA-4560-1 |
| bookworm (oldstable) | 6.1.172-1 | DSA-6243-1 |
| trixie (stable) | 6.12.88-1 | DSA-6238-1 |
| forky / sid (testing / unstable) | 7.0.7-1 | (no DSA) |

If you are on buster or stretch, those releases are out of even ELTS at this point and have no Debian advisory. See the EOL distros section.

RHEL, Rocky Linux, AlmaLinux

bash
sudo dnf update kernel
sudo reboot

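Red Hat shipped 33 distinct advisories for CVE-2026-31431 between 2026-05-04 and 2026-05-11, covering every supported RHEL 8, 9, and 10 channel (current, EUS, AUS, E4S, TUS, Long-Life Add-On), kernel-rt, NVIDIA-for-RHEL, OpenShift node images, and kpatch-patch for live-patching subscribers. The Red Hat security bulletin RHSB-2026-002 is the consolidated reference. Fixed kernel NVRs for the current release of each major version:

| Release | Fixed kernel | Advisory |
| --- | --- | --- |
| RHEL 10.1 | kernel-6.12.0-124.55.1.el10_1 | RHSA-2026:13566 |
| RHEL 9.7 | kernel-5.14.0-611.54.1.el9_7 | RHSA-2026:13565 |
| RHEL 8.10 | kernel-4.18.0-553.123.1.el8_10 | RHSA-2026:13577 |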

EUS, AUS, E4S, and Long-Life subscribers: see the full table in the "Verified fixed versions" section above for the exact RHSA covering your minor release. If you cannot reboot, RHEL 8 customers with live-patching can take RHSA-2026:15976 (kpatch-patch, 2026-05-11) for a no-reboot mitigation.

Rocky Linux and AlmaLinux rebuild from RHEL sources. Their advisories track the matching RHSA within 24-72 hours. Same dnf update kernel command applies. Cross-check at errata.rockylinux.org and errata.almalinux.org.

If you are on kernel-ml from ELRepo, wait for the corresponding kernel-ml-6.x.y build or switch back to the vendor kernel until ELRepo catches up.

Amazon Linux 2 and 2023

bash
sudo dnf update kernel    # Amazon Linux 2023
sudo yum update kernel    # Amazon Linux 2 (ships yum, not dnf)
sudo reboot

For AL2023 the security advisories shipped 2026-05-05: ALAS2023-2026-1651 for the default kernel, plus ALAS2023-2026-1650 (kernel6.12 variant) and ALAS2023-2026-1649 (kernel6.18 variant).

For AL2, AWS shipped ALAS2-2026-3289 (core kernel) on the same date, plus per-extra advisories ALAS2KERNEL-5.4 / 5.10 / 5.15 for the kernel-major extras.

If you cannot reboot immediately, AWS also released livepatches on 2026-05-04 covering nearly every recent kernel build in both AL2 and AL2023 (search ALAS2LIVEPATCH and ALAS2023LIVEPATCH at explore.alas.aws.amazon.com). Enable the livepatch service and the patch is hot-applied; no reboot needed until the next maintenance window. Livepatches are the right answer for fleets where you cannot drain and reboot every EC2 instance today.

SUSE Linux Enterprise and openSUSE

bash
sudo zypper refresh
sudo zypper patch
sudo reboot

SUSE shipped patched kernels on 2026-05-02 (most SLE channels) and 2026-05-08 (SLE 16.0). LTSS customers on SLE 12 SP5 also got coverage in 4.12.14-122.302.1 (SUSE-SU-2026:21421-1). See the fixed-versions table above.

Arch Linux

bash
sudo pacman -Syu linux
sudo reboot

Fixed in linux 6.19.12-1 (AVG-2908). If you run linux-lts, linux-hardened, or linux-zen, check security.archlinux.org/CVE-2026-31431 for the matching package version before updating; Arch publishes the package-specific fix once each variant ships.

Alpine Linux

bash
sudo apk update
sudo apk upgrade linux-lts
sudo reboot

Verified fixed package versions per branch (cross-referenced against pkgs.alpinelinux.org on 2026-05-18):

| Branch | linux-lts | Build date |
| --- | --- | --- |
| edge | 6.18.32-r1 | 2026-05-18 |
| v3.21 | 6.12.89-r0 | 2026-05-15 |
| v3.20 | 6.6.139-r0 | 2026-05-15 |
| v3.19 | 6.6.140-r0 | 2026-05-17 |
| v3.18 | 6.1.173-r0 | 2026-05-16 |

If you are running linux-virt or linux-edge, swap the package name and check the corresponding column on the Alpine packages site. Alpine ships kernel fixes by bumping to the upstream stable that already contains the patch rather than per-CVE backports, so the secdb does not tag CVE-2026-31431 against the kernel package directly. The versions in the table above are on upstream stable branches that include mainline a664bf3d603d.

If you run Docker on Alpine, also upgrade docker to at least 29.4.2-r0 on edge (Alpine's secdb explicitly tags the docker package for this CVE, likely shipping the same module-blocklist mitigation Canonical pushed via USN-8226).

Container hosts (Docker, Kubernetes nodes, k3s, k0s)

The kernel lives on the host, not in the container. Drain the node, patch the host kernel, reboot, then uncordon the node so workloads can schedule onto it again. Containers do not need to be rebuilt to inherit the fix, because they share the host kernel. If you cannot reboot nodes (lol), see the section below on the modprobe blocklist.
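For Kubernetes nodes the per-node routine can be sketched roughly like this, assuming kubectl is pointed at the right cluster; the function names are placeholders of mine. The node object reports the running kernel in .status.nodeInfo.kernelVersion, which makes it easy to see which nodes are still on an old build.

```shell
#!/usr/bin/env bash
# List the kernel each node reports, to spot nodes still on an old build.
list_node_kernels() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion'
}

# Hypothetical wrapper: evict workloads from one node before patching it.
# `kubectl drain` cordons the node first, so no separate cordon call is needed.
drain_for_patch() {
  local node="$1"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m
}

# Usage:
#   list_node_kernels
#   drain_for_patch worker-1        # worker-1 is a placeholder node name
#   ...patch and reboot the host...
#   kubectl uncordon worker-1
```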

Cloud-managed Kubernetes

| Provider | Action |
| --- | --- |
| AWS EKS | Upgrade node groups to the latest AMI (the bundled Amazon Linux 2023 kernel is patched in current images). |
| GKE | Upgrade node pools; pick the latest Container-Optimized OS release from the release-notes channel you track. |
| AKS | Upgrade node images to the latest Azure-published version that includes the post-2026-04-22 kernel security update. |

A rolling node upgrade is enough. The pods get the patched kernel the next time they land on an upgraded node. Check the provider's release notes for the exact image / AMI ID that includes the fix.

What if I'm on an EOL distro (CentOS 7, RHEL 6, end-of-life Ubuntu, etc.)?

EOL means no vendor security updates. For Copy Fail specifically, that means no kernel patch is coming from your distro vendor. The bug is 9 years old, so anything that has ever shipped a 2017-or-later kernel and is now EOL is affected with no vendor fix path. To be honest about scope, here is what is real:

EOL status of common candidates (verifiable):

| Distribution | Status |
| --- | --- |
| CentOS Linux 7 | EOL since 2024-06-30. No upstream patches. |
| CentOS Linux 8 | EOL since 2021-12-31. Long dead. |
| CentOS Stream 8 | EOL since 2024-05-31. No upstream patches. |
| CentOS Stream 9 / 10 | Active. Patches come via Red Hat upstream; track access.redhat.com/security/cve/CVE-2026-31431. |
| RHEL 6 / 7 | EOL from Red Hat normal support. Available only via paid Red Hat ELS (Extended Life-cycle Support). |
| Ubuntu releases past their ESM window | No Canonical updates. |
| Debian buster / stretch | Out of LTS / ELTS. |

Real options (no invented commands here):

  1. Migrate to a supported RHEL-compatible distro. Rocky Linux, AlmaLinux, and Oracle Linux are free, RHEL-compatible, and actively patched. Each ships an in-place migration tool: migrate2rocky, almalinux-deploy, and the centos2ol.sh script respectively. For CentOS 7 → Rocky 9 or AlmaLinux 9 you're going through a major version bump, which is more of a project than a script run, but it is the long-term right answer.
  2. Buy commercial extended support. Red Hat sells ELS for older RHEL releases, and Canonical sells Ubuntu Pro / ESM. These are real vendor channels that backport CVE patches into otherwise-EOL kernels.
  3. Buy third-party extended support. TuxCare (formerly CloudLinux's enterprise arm) sells Endless Lifecycle Support for CentOS 6 / 7 / 8, Oracle Linux 6, Ubuntu LTS post-ESM, and other dead distros. They ship CVE patches as KernelCare livepatches, applied without reboot. This is the realistic path if you cannot migrate quickly and cannot buy from the original vendor.
  4. Mitigate-only, permanently. Apply the algif_aead modprobe blocklist below and accept that you will never get the kernel patch. This is the right answer for appliances and air-gapped systems where migration is not on the table and where the userspace crypto interface is provably unused. Pair it with strong process-level controls (seccomp profiles that block AF_ALG socket creation in any sandboxed workload).
  5. Rebuild the kernel. For the brave: pull the RHEL or upstream-stable patched source, rebuild with your own kernel signing keys, and ship internally. This is what large orgs running custom kernels do anyway. It is not a small-team option.

If you are on CentOS 7 specifically: the realistic short path is the mitigation (option 4) today, the realistic medium path is TuxCare ELS (option 3) or migration to Rocky / Alma / OL (option 1) within weeks. Do not assume you are safe just because the kernel is "old and unchanged". The bug shipped in 2017; if your kernel is from 2017 or later (which any post-3.10 CentOS 7 kernel is), you are vulnerable.
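For the process-level controls in option 4, one option I would reach for on systemd hosts is RestrictAddressFamilies=, which systemd enforces with a seccomp filter. This is a swapped-in technique and not a hand-written seccomp profile. A minimal sketch, assuming a systemd recent enough to accept the deny-list form; the function name af_alg_dropin, the drop-in file name, and myapp.service are placeholders. It only covers processes started by that unit, not interactive shells, and the systemd documentation notes the setting is not effective on some architectures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a systemd drop-in that denies AF_ALG sockets
# to a single service.
af_alg_dropin() {
  cat <<'EOF'
[Service]
# The "~" prefix makes this a deny-list: every family except AF_ALG stays allowed.
RestrictAddressFamilies=~AF_ALG
EOF
}

# Usage (myapp.service is a placeholder):
#   sudo mkdir -p /etc/systemd/system/myapp.service.d
#   af_alg_dropin | sudo tee /etc/systemd/system/myapp.service.d/no-af-alg.conf
#   sudo systemctl daemon-reload && sudo systemctl restart myapp.service
```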

Cannot patch yet? Disable the module

If reboot scheduling is the bottleneck, you can take algif_aead out of the picture without rebooting:

bash
# Block the module so it cannot be loaded or autoloaded (applies to every
# later modprobe call and persists across reboots)
echo "install algif_aead /bin/false" | sudo tee /etc/modprobe.d/disable-algif-aead.conf

# If the module is already loaded, unload it (rmmod refuses while something is using it)
sudo rmmod algif_aead 2>/dev/null || true

# Verify
lsmod | grep -w algif_aead || echo "algif_aead is not loaded"

I have not seen anything in the standard server stack that depends on algif_aead at runtime. The userspace crypto interface is used by a handful of niche tools (some hardware-acceleration paths, a few VPN implementations doing kernel offload). If you run one of those, test in staging before you blocklist on prod. Otherwise, this is a safe mitigation that buys you time until you can do a controlled kernel reboot.
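lsmod only tells you the module is not loaded right now. To confirm the blocklist actually holds, I would try the same first step any unprivileged process would take: bind an AF_ALG socket of type aead. A minimal sketch, assuming python3 is on the box; the function name probe_af_alg_aead is hypothetical and gcm(aes) is just a common AEAD name. Two caveats: on a host without the blocklist this probe itself triggers the module autoload, and "reachable" says nothing about whether the kernel is patched, only that the interface is exposed.

```shell
#!/usr/bin/env bash
# Hypothetical probe: try to bind an AF_ALG "aead" socket as the current user.
# Prints "reachable", "blocked (<error>)", or "unknown (...)" without python3.
probe_af_alg_aead() {
  command -v python3 >/dev/null 2>&1 || {
    echo "unknown (python3 not installed)"
    return 2
  }
  python3 - <<'PY'
import socket

try:
    s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
    s.bind(("aead", "gcm(aes)"))   # (type, name); fails if algif_aead cannot load
    s.close()
    print("reachable")
except (OSError, AttributeError) as exc:
    print(f"blocked ({exc})")
PY
}

probe_af_alg_aead
```

Run it as an ordinary user, not root, since that is the attacker's starting position.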

The same approach works for other algif siblings (algif_skcipher, algif_hash). The disclosed exploit chain uses algif_aead specifically, but adding all of algif_* to the blocklist removes the entire kernel-crypto userspace interface and is the move I would make on any box that does not need it.
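As far as I know modprobe.d install rules take literal module names and not wildcards, so blocking "all of algif_*" means naming each sibling. A minimal sketch, assuming the four userspace-crypto modules found in current kernels (algif_aead, algif_skcipher, algif_hash, algif_rng); the function name and the default file name are placeholders. The siblings have more real users than algif_aead does (cryptsetup and iwd are two that can use algif_skcipher and algif_hash), so test before rolling this out widely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one "install <module> /bin/false" rule per
# algif_* module into a modprobe.d file (default path needs root).
write_algif_blocklist() {
  local conf="${1:-/etc/modprobe.d/disable-algif.conf}" mod
  for mod in algif_aead algif_skcipher algif_hash algif_rng; do
    printf 'install %s /bin/false\n' "$mod"
  done > "$conf"
}

# Usage (as root):
#   write_algif_blocklist
#   rmmod algif_aead algif_skcipher algif_hash algif_rng 2>/dev/null || true
```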

Verify the patch worked

After the kernel update and reboot, three verifications:

bash
# 1. Running kernel matches the patched version
uname -r

# 2. Record the module's srcversion (Ubuntu/Debian). It is a source hash, not a
#    version number, so compare it against a known-patched build of the same package
modinfo algif_aead | grep ^srcversion

# 3. Run the detection toolkit on the host
git clone https://github.com/ishankaru/CVE-2026-31431-Copy-Fail-Detection-Toolkit
cd CVE-2026-31431-Copy-Fail-Detection-Toolkit
sudo python3 copy_fail_detect.py

The third one is the one that matters. The detection script (originally written by makitos666, maintained in my fork with a few additions) walks setuid binaries, compares disk hashes against page-cache hashes, enumerates active AF_ALG sockets, and pattern-matches recent auth and kernel logs for exploitation indicators. A clean run after the reboot is your signal that nothing landed in the brief window before the patch.
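The first check is easy to get wrong by eye when the version strings are this long. A minimal sketch of a comparison, assuming GNU sort with -V; the function name version_at_least is hypothetical and the fixed value is the RHEL 9.7 build from the fixed-versions table, so swap in the one for your release. It is only meaningful within a single distro's kernel stream. On Debian and Ubuntu, uname -r is an ABI name rather than the package version, so compare the output of dpkg-query -W -f='${Version}' "linux-image-$(uname -r)" instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 sorts at or above version $2
# under GNU `sort -V` ordering.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

fixed="5.14.0-611.54.1.el9_7"     # example: RHEL 9.7 fixed build
running="$(uname -r)"
running="${running%.x86_64}"      # RHEL-style releases carry an arch suffix

if version_at_least "$running" "$fixed"; then
  echo "running kernel $running is at or above $fixed"
else
  echo "running kernel $running is BELOW $fixed"
fi
```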

Were you already exploited? Forensic detection

This is the part most CVE writeups skim. The Copy Fail exploit modifies /usr/bin/su (or another setuid target) in the page cache, not on disk. Reboot the box and the modification disappears. Compare on-disk SHA256 of /usr/bin/su and it matches the package, because the package is fine. The compromise is invisible to anything that hashes from disk.

What you actually want to check:

bash
# 1. Compare what is on disk vs. what is in the page cache for setuid binaries
sudo python3 copy_fail_detect.py --setuid-only

# 2. Look for recent AF_ALG socket activity in the audit log
#    (only returns events if an audit rule keyed "socket_alg" was already in place)
sudo ausearch -k socket_alg -i 2>/dev/null | head

# 3. Grep auth log for unexplained su / sudo successes
#    (Debian/Ubuntu path; RHEL-family systems log to /var/log/secure)
sudo grep -iE "session opened for user root|COMMAND=.*\bsu\b" /var/log/auth.log | tail -50

# 4. Search the journal for audit or kernel lines that mention algif_aead
sudo journalctl --since "2 weeks ago" | grep -E "audit.*python.*algif|algif_aead" | head

If grep and ausearch are not at your fingertips, my grep-cheat-sheet covers the patterns I reach for in incident response.
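The ausearch query above only finds something if auditd was already recording socket() calls under that key. A minimal sketch of such a rule, assuming auditd on a 64-bit host; the function name af_alg_audit_rules and the rules file name are placeholders. AF_ALG is address family 38, which is what the a0=38 filter matches on the first syscall argument. The rule records activity from the moment it is loaded and cannot recover anything from before, and 32-bit processes would need a matching arch=b32 rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an audit rule that logs every socket() call whose
# first argument (the address family) is 38, i.e. AF_ALG.
af_alg_audit_rules() {
  local key="${1:-socket_alg}"
  printf -- '-a always,exit -F arch=b64 -S socket -F a0=38 -k %s\n' "$key"
}

# Usage (as root):
#   af_alg_audit_rules | sudo tee /etc/audit/rules.d/af_alg.rules
#   sudo augenrules --load
```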

The toolkit's YARA rules cover the disclosed PoC and a few common variants. They will not catch a recompiled-in-C version a determined attacker brings, but the disk-vs-page-cache hash check is technique-agnostic. If the cache hash diverges from the disk hash on a setuid binary, that box is compromised regardless of which exploit was used.
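The idea behind the disk-versus-page-cache check can be shown in a few lines. This is not the toolkit's code, only a minimal sketch of the same idea under stated assumptions: GNU dd and sha256sum are available, an ordinary read is served from the page cache, an O_DIRECT read (dd iflag=direct) goes to the backing storage, and the tampered pages are never written back, which is what the exploit description implies. The function name compare_cache_to_disk is hypothetical. Some filesystems reject O_DIRECT or quietly fall back to buffered reads, so treat OK as weaker evidence than MISMATCH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hash a file once through the page cache and once with
# O_DIRECT, then report whether the two views agree.
compare_cache_to_disk() {
  local file="$1" cached direct
  # Ordinary read: served from the page cache when the pages are resident.
  cached="$(sha256sum < "$file" | awk '{print $1}')" || return 2
  # O_DIRECT read: asks the filesystem to bypass the page cache.
  direct="$(set -o pipefail
            dd if="$file" iflag=direct bs=1M status=none 2>/dev/null |
              sha256sum | awk '{print $1}')" || {
    echo "UNKNOWN $file (direct read failed or unsupported)"
    return 2
  }
  if [ "$cached" = "$direct" ]; then
    echo "OK $file"
  else
    echo "MISMATCH $file"
    return 1
  fi
}

# Usage (as root), over every setuid binary on the root filesystem:
#   find / -xdev -type f -perm -4000 -print0 |
#     while IFS= read -r -d '' f; do compare_cache_to_disk "$f"; done
```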

If you find a positive: assume host root, rotate every credential the host has ever seen, snapshot disks for forensics, and rebuild the host. Page cache lives in RAM, so a clean reboot from known-good media wipes the modification, but anything the attacker did with their root window (added SSH keys, scheduled jobs, modified systemd units, dropped a persistence binary in /usr/local/bin) survives. The detection toolkit's auth-log correlator helps narrow the time window for a deeper investigation. The same playbook applies to other root compromises: I wrote up the response shape for a different incident in wordpress-malware-removal, and the cleanup steps overlap a lot.
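For the persistence locations named above, a first sweep can be as simple as listing what changed since the disclosure date. A minimal sketch, assuming GNU find (for -newermt); the function name changed_since and the default path list are my own choices and not exhaustive. An attacker with root can reset modification times, so an empty result narrows nothing down for certain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files modified after a given date under
# the given paths (defaults: common persistence locations).
changed_since() {
  local since="$1"
  shift
  if [ "$#" -eq 0 ]; then
    set -- /root/.ssh /home/*/.ssh /etc/systemd/system /etc/cron.d \
           /etc/crontab /var/spool/cron /usr/local/bin
  fi
  find "$@" -type f -newermt "$since" -print 2>/dev/null
}

# Usage (as root), with the public disclosure date as the lower bound:
#   changed_since 2026-04-29
```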

How Copy Fail actually works

Skip this section if you only care about the fix. From here down is for people who want to understand the bug.

AF_ALG: the kernel crypto userspace interface

AF_ALG is a socket family that exposes the kernel's crypto subsystem to userspace. You open an AF_ALG socket, bind it to an algorithm type and name (e.g. type "aead", name "gcm(aes)"), call accept() on it, and you get a file descriptor you can send plaintext to and read ciphertext from. It was added in 2010 and is used by software that wants kernel-accelerated crypto without dragging in a library. It is enabled by default in every distro kernel I have looked at.

The relevant subtype is algif_aead, which provides Authenticated Encryption with Associated Data. AEAD operations want a single buffer containing the plaintext (which becomes ciphertext on encrypt) plus space for the authentication tag.

The 2017 in-place optimization

In 2017, kernel commit 72548b093ee3 added an optimization to algif_aead: if the source and destination buffers point to the same memory, do the operation in place instead of allocating a fresh destination buffer. This is the bug. The optimization assumed the source buffer was always writable user memory. It did not check.

When the source is built via splice() from a pipe, the resulting scatterlist entries can point at page-cache pages, which are kernel memory backing files on disk. The page cache is normally read-only from this code path. The in-place optimization makes it the destination too, and the AEAD operation then writes the authentication tag into the page cache.

Mainline commit a664bf3d603d reverts the optimization. After the patch, the destination is always a fresh allocation, and page-cache pages never appear in the writable destination scatterlist.
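
The aliasing problem is easier to see in a toy model. The sketch below is not kernel code and performs no real encryption. `toy_seal` is a hypothetical stand-in that copies data and appends a truncated HMAC as the "tag", and a bytearray stands in for a page-cache page. It only shows why writing the tag is harmless with a separate destination and destructive when source and destination are the same buffer.

```python
import hashlib
import hmac


def toy_seal(src, dst, data_len, key, authsize=4):
    """Toy stand-in for an AEAD op: copy data_len bytes, then write a tag after them."""
    tag = hmac.new(key, bytes(src[:data_len]), hashlib.sha256).digest()[:authsize]
    if dst is not src:
        dst[:data_len] = src[:data_len]  # out-of-place: source is only read
    # The tag always lands in dst. If dst is src, it lands in the source buffer.
    dst[data_len:data_len + authsize] = tag


def demo():
    page = bytearray(b"ORIGINAL-FILE-BYTES!")  # pretend page-cache page
    scratch = bytearray(len(page))

    # Patched behaviour: fresh destination, the "page" is left alone.
    toy_seal(page, scratch, 8, b"attacker-key")
    after_out_of_place = bytes(page)

    # Buggy behaviour: source doubles as destination, 4 bytes get overwritten.
    toy_seal(page, page, 8, b"attacker-key")
    after_in_place = bytes(page)
    return after_out_of_place, after_in_place
```

In `demo()`, the first return value is byte-for-byte the original, and the second differs only in the four bytes at offsets 8 through 11, which now hold the tag.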

splice, scatterlists, and the 4-byte primitive

The exploit chain:

  1. Open /usr/bin/su read-only. The kernel populates the page cache.
  2. Open an AF_ALG socket bound to authencesn(hmac(sha256),cbc(aes)) (the specific AEAD construction Theori's PoC uses) and set the AEAD authentication-tag size to 4 bytes via ALG_SET_AEAD_AUTHSIZE. That 4-byte tag is the write primitive.
  3. Open a pipe. splice() the file descriptor for /usr/bin/su into the pipe at the byte offset you want to overwrite, which moves the page-cache page reference into the pipe without copying.
  4. splice() the pipe into the AF_ALG socket. The kernel builds a scatterlist for the AEAD operation that includes the page-cache page as a source.
  5. Trigger the AEAD operation. The buggy in-place path promotes the source scatterlist entry to also serve as the destination. The 4-byte authentication tag (computed from attacker-controlled key, IV, and associated data) gets written into the page-cache page at the chosen offset.
  6. The page-cache page is what every future read() of /usr/bin/su returns. The on-disk file is untouched, but every execution of /usr/bin/su from now on reads the modified bytes.
  7. Repeat the 4-byte write to apply a precomputed binary patch (the PoC carries it zlib-compressed in the script and writes it 4 bytes at a time). The patch retargets a near-jump in su so the privilege-drop path becomes a no-op.
  8. system("su") from the attacker's shell. The execve loads the modified page-cache bytes, the privilege drop is skipped, and you have a root shell.

The reliability comes from the fact that none of the steps depend on memory layout or kernel offsets. It is plain syscalls with file descriptors that the unprivileged user already has, and a deterministic 4-byte primitive applied N times.

A conceptual walkthrough

This is the technique class, not the exploit. The disclosed PoC has the specific argument values, the byte-precise overwrite target, and the binary patch that turns the 4-byte write into root. I am keeping that out and pointing you at the Theori writeup if you need that level of detail.

python
# CONCEPTUAL ONLY. This sketch does not trigger the bug.
# It illustrates the API surface the real exploit walks.
import socket
import os

# 1. The setuid-root binary the attacker wants to modify via the page cache.
victim_path = "/usr/bin/su"

# 2. Populate the page cache. A normal user can read /usr/bin/su.
with open(victim_path, "rb") as f:
    f.read(4096)

# 3. Open an AF_ALG socket for AEAD. This is the userspace crypto interface;
#    no exploit yet, just the standard API.
alg = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
alg.bind(("aead", "gcm(aes)"))
# A real exploit would call setsockopt for the key, accept() the op socket,
# and pass an IV via sendmsg ancillary data. Skipped.

# 4. The dangerous part: splice the victim file into a pipe, then into the
#    crypto operation. The scatterlist the kernel builds from the pipe can
#    include page-cache pages as sources, and the buggy in-place path
#    promotes those source pages to destinations.
r, w = os.pipe()
# os.splice(...)  # exact argument shape omitted intentionally
# alg_op.sendmsg(...)  # the trigger call with the precise op flags omitted
os.close(r)  # release the pipe and the socket so the sketch leaks nothing
os.close(w)
alg.close()

# 5. After the AEAD op runs, a few bytes of the page-cache page backing
#    /usr/bin/su are overwritten by the auth tag. Every future read() of
#    /usr/bin/su returns the modified bytes until the page is evicted.
#    The on-disk file is untouched. Reboot clears the modification.

That is the shape. The published Theori PoC (theori-io/copy-fail-CVE-2026-31431) picks the exact 4 bytes to overwrite, carries a zlib-compressed binary patch inline, applies it 4 bytes at a time in a loop, and finishes with system("su") to load the modified page-cache bytes. Taeyang Lee's technical writeup at xint.io walks that part in detail.

I am not reproducing the trigger byte sequence here, because the gap between patched and unpatched in the wild is still measured in weeks, and a copy-paste exploit in a blog post is exactly the kind of friction removal that lights up unpatched hosts. The Theori writeup is the canonical reference for those details, which is what its authors intended.

Containers, CI/CD, and multi-tenant hosts

The blast radius is bigger than "one Linux box":

  • Containers. The kernel is shared. An unprivileged process in a Docker, containerd, or Kubernetes container has the same AF_ALG access as a process on the host, unless you have explicitly blocked the family with a seccomp profile (most users have not). Compromising any container compromises the host kernel, which means every other container on that node.
  • CI/CD runners. GitHub Actions self-hosted runners, GitLab runners, Jenkins agents, Buildkite agents: any of these that execute third-party PR code is now a privilege-escalation vector unless the host is patched. The PR runs as an unprivileged user, the exploit promotes that to root on the runner host, the runner host has secrets and the ability to push to internal registries.
  • Multi-tenant hosts. University shells, web hosts, shared bastions, anything where multiple humans have SSH access to the same box. Any of them can escalate to root and read every other tenant's data.
  • Cloud sandboxes. Anywhere that takes user-supplied code and runs it in a "trusted" Linux process (data-science notebook hosts, code-execution APIs, model-sandbox endpoints). Same kernel, same primitive.
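
A seccomp rule that refuses AF_ALG sockets can be sketched as a generated JSON profile in the Docker/OCI seccomp format. Treat this as an illustration under assumptions: `deny_af_alg_profile` is a hypothetical helper, and the default-allow policy here is far weaker than a runtime's stock profile, so in practice the single `socket` rule would be merged into the profile you already use. On architectures that multiplex socket calls through `socketcall`, this rule alone may not be sufficient.

```python
import json
import socket

# AF_ALG is address family 38 on Linux; fall back to the literal elsewhere.
AF_ALG = getattr(socket, "AF_ALG", 38)


def deny_af_alg_profile():
    """Build a seccomp profile dict that fails socket(AF_ALG, ...) with an errno."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",  # illustration only; too permissive alone
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                # Match when argument 0 (the address family) equals AF_ALG.
                "args": [{"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}],
            }
        ],
    }


def write_profile(path):
    """Write the profile as JSON to path."""
    with open(path, "w") as f:
        json.dump(deny_af_alg_profile(), f, indent=2)
```

With Docker, a profile file like this is passed via `--security-opt seccomp=<file>`. Test it against a non-production container first.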

If you run any of these, the patch is urgent enough to take a maintenance window for. If you cannot, blocklist algif_aead on every node in the fleet now and reboot on your normal cadence after.
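
To audit a fleet, it helps to check two things per node: whether algif_aead is currently loaded, and whether a modprobe rule blocks it. A minimal sketch follows, with hypothetical helper names, operating on text you have already collected from /proc/modules and from the files under /etc/modprobe.d. The exact directive your mitigation uses is an assumption here, so the sketch recognises both common forms.

```python
def module_loaded(proc_modules_text, name="algif_aead"):
    """True if name appears as a loaded module in /proc/modules content."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def blocking_directive(modprobe_conf_text, name="algif_aead"):
    """Return 'install', 'blacklist', or None for the strongest block found.

    An 'install <name> /bin/false' (or /bin/true) override stops every load
    attempt through modprobe. 'blacklist <name>' only suppresses alias-based
    loading, so it is reported as the weaker form.
    """
    found = None
    for raw in modprobe_conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 3 and fields[:2] == ["install", name]:
            if fields[2] in ("/bin/false", "/bin/true"):
                return "install"
        if fields[:2] == ["blacklist", name]:
            found = "blacklist"
    return found
```

A node where `module_loaded` is true still has the module in memory even if a block rule exists. The rule only affects future load attempts.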

Patch timeline

Dates are taken from copy.fail's disclosure timeline and the individual vendor security trackers. Anything else came from my fetches on 2026-05-18.

Date        Event
2017        Kernel commit 72548b093ee3 introduces the algif_aead in-place optimization. The vulnerability is born.
2026-03-23  Theori reports the bug to the Linux kernel security team.
2026-03-24  Initial acknowledgment from the kernel security team.
2026-03-25  Patches proposed and reviewed.
2026-04-01  Mainline commit a664bf3d603d lands in Linus's tree.
2026-04-22  CVE-2026-31431 assigned. NVD publishes (CVSS 7.8 HIGH, CWE-669).
2026-04-29  Theori publishes the writeup at copy.fail and the PoC.
2026-04-30  Canonical ships USN-8226-1 / USN-8226-2 for Ubuntu: kmod-based modprobe blocklist as a temporary mitigation. Kernel fix still in progress.
2026-05-02  SUSE ships patched kernels for SLE 12 SP5 LTSS, SLE 15 SP4/SP5/SP6/SP7 LTSS, and openSUSE Leap 15.6.
2026-05-04  AWS ships livepatches for Amazon Linux 2 and 2023 (no-reboot mitigation).
2026-05-05  AWS ships standard kernel updates for AL2 (ALAS2-2026-3289) and AL2023 (ALAS2023-2026-1651).
2026-05-08  SUSE ships the kernel fix for SLE 16.0.
2026-05-15  CISA Known Exploited Vulnerabilities catalog mitigation deadline.
2026-05-18  Article last fact-checked against vendor trackers. Ubuntu kernel fix still listed as "work in progress" on every supported release.

If your fleet has hosts that were running between 2026-04-29 and your patch date, assume those hosts were exposed and run the detection toolkit on them. The CVE was assigned a week before the PoC was published, but the PoC is the moment exploitation became turnkey.
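
For fleet triage, the exposure window per host is simple date arithmetic from the PoC publication date in the timeline. The sketch below assumes you can supply a patch date per host (None for still-unpatched). `exposure_days` is a hypothetical helper.

```python
from datetime import date

POC_PUBLISHED = date(2026, 4, 29)  # from the timeline above


def exposure_days(patched_on, today):
    """Days a host was exposed to the public PoC; unpatched hosts count up to today."""
    end = patched_on if patched_on is not None else today
    return max(0, (end - POC_PUBLISHED).days)
```

Any host with a non-zero result is a candidate for the detection toolkit. Sort by the largest window first.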

References

Primary sources (Theori's coordinated disclosure):

Kernel patches:

  • Mainline fix: commit a664bf3d603d, which reverts the in-place optimization in algif_aead.
  • Vulnerable commit (2017): 72548b093ee3.

CVE record:

Vendor security trackers (verified live on 2026-05-18):

Detection toolkit:

EOL distro extended support:

If you find a false positive in the detector or have a distro the patch table does not cover, open an issue on the GitHub repo.
