TechEarl

Argument Injection: Why escapeshellarg Is Not Enough

Ishan Karunaratne · 14 min read
Argument injection past escapeshellarg via dig -f

Argument injection is the variant of command injection that survives the fix most people reach for. The application correctly wraps the user value with escapeshellarg, the shell sees one well-quoted argument, and no semicolons or backticks get a chance to break out. The bug is one layer further in: the called binary parses its own arguments, sees a leading dash, and treats the attacker's value as a flag. The shell never noticed anything was wrong, because nothing was wrong at the shell layer.

This is the deep dive on one variant; it sits under the remote code execution practitioner guide and alongside the OS command injection article. I cover what argument injection actually is, why escapeshellarg and escapeshellcmd do not stop it, the catalogue of common Unix tools where this is the realistic attack, a walkthrough against the rce-basic lab's /lookup.php, the three-layer fix, and the CVE pattern across Git, PHPMailer, ImageMagick, and friends.

TL;DR

Argument injection happens when a user value, correctly shell-quoted, still reaches a binary that parses leading dashes as flags. Classic command injection inserts a whole new command through shell metacharacters (;, |, $()). Argument injection does not break out of the command at all: it abuses the same command's own option parser. escapeshellarg('-f /etc/passwd') produces '-f /etc/passwd', which is one valid shell-quoted argument, but dig looks at the leading - and reads the rest as the -f batch-file flag, opening any file www-data can read. escapeshellcmd is worse: it does not even touch dashes because dashes are not shell metacharacters. The catalogue of vulnerable tools is wide: dig -f, curl -K/-o, find -exec, tar -T, git --upload-pack, ssh -oProxyCommand, wget --use-askpass. The fix is three layers deep: stop calling shells (use argv arrays), pass -- before the user argument so the binary stops parsing flags, and validate every value against a strict allowlist before any of that.

Command injection vs argument injection

These two get conflated in writeups because they share an outcome (the attacker runs something they should not). The mechanism is different and the fix is different, so it is worth pinning the distinction.

OS command injection inserts a new command into the shell's parse tree. The application builds a single string and hands it to /bin/sh -c. The shell tokenises that string, sees a metacharacter (;, |, &&, backticks, $(), newline), and runs the attacker's command alongside the intended one. The break-out happens at the shell layer, before the target binary is invoked. The full deep dive on this is in OS command injection.

Argument injection does not insert a new command at all. The original command is the only one that runs. The attacker's value is one argument to that command, correctly delimited, correctly shell-quoted. What the attacker controls is what that argument means to the binary's own option parser. A leading dash flips the value from "the hostname to look up" to "the flag that tells dig to read queries from a file".

The two are siblings in the CWE catalogue: CWE-88 (argument injection) and CWE-78 (OS command injection). They look almost identical at the call site. They feel completely different once you start exploiting them.
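One way to see the difference is to tokenise both kinds of payload roughly the way a POSIX shell would. The sketch below uses Python's shlex module as a stand-in for /bin/sh (it approximates shell tokenising, it is not a full shell) and shlex.quote as the approximate Python counterpart of escapeshellarg. The helper name shell_tokens is mine, not something from the lab.

```python
import shlex


def shell_tokens(command: str) -> list:
    """Approximate how a POSIX shell splits a command line into tokens.

    punctuation_chars=True makes shlex return operators such as ';' and '|'
    as separate tokens, which is where a real shell would start a new command.
    """
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    return list(lexer)


# Classic command injection: unquoted concatenation lets ';' split the line.
print(shell_tokens("dig " + "localhost;id"))
# ['dig', 'localhost', ';', 'id']

# Quoting closes that hole: the whole value is one argument.
print(shell_tokens("dig " + shlex.quote("localhost;id")))
# ['dig', 'localhost;id']

# Argument injection: still exactly one argument, but it begins with a dash.
print(shell_tokens("dig " + shlex.quote("-f /etc/passwd")))
# ['dig', '-f /etc/passwd']
```

In the first case a separate `;` token appears, so a second command exists. In the last case there is still only one command and one argument; the only thing the attacker changed is the argument's first character.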

Why escapeshellarg does not stop this

escapeshellarg is built to do exactly one job: take a string, wrap it so the shell parses it as a single argument no matter what is inside it. It single-quotes the value and escapes any embedded single quotes. That is correct, and that is all it does.

```php
escapeshellarg('-f /etc/passwd');
// produces: '-f /etc/passwd'
```

The result is one well-formed shell-quoted argument. /bin/sh -c "dig '-f /etc/passwd'" tokenises into two argv elements that get passed to execve: dig and -f /etc/passwd. The shell did its job perfectly. escapeshellarg did its job perfectly. The bug is that dig is now sitting with argv[1] = "-f /etc/passwd" and its own option parser is about to look at it.

escapeshellcmd is even less help, and I have seen developers reach for it thinking it is the stronger version. The PHP manual describes it as escaping any characters in a string that might be used to trick a shell command into executing arbitrary commands. That list is shell metacharacters: &, ;, `, backslash, pipe, unpaired quotes, and a few others. A leading dash is not a shell metacharacter, because the shell does not parse dashes specially. escapeshellcmd('-f /etc/passwd') returns -f /etc/passwd unchanged. The string then gets concatenated into the command and handed to the shell, the shell tokenises on whitespace, and dig ends up with -f and /etc/passwd as two separate argv elements: the batch-file flag followed by its value.

Neither function is broken. They are both doing what they are documented to do. The mistake is expecting either of them to know what the binary on the other end will do with its arguments.
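To make the escapeshellcmd case concrete, here is a rough Python approximation. It is a simplified stand-in, not PHP's implementation: the character set is my reading of the manual's list for Unix, the quote-pairing rule is left out, and the name escape_shell_metachars is hypothetical.

```python
import shlex

# Roughly the characters PHP's escapeshellcmd backslash-escapes on Unix.
# Assumption: simplified stand-in; the unpaired-quote handling is omitted.
SHELL_METACHARS = set("&#;`|*?~<>^()[]{}$\\\n")


def escape_shell_metachars(value: str) -> str:
    """Backslash-escape shell metacharacters, leaving everything else alone."""
    return "".join("\\" + ch if ch in SHELL_METACHARS else ch for ch in value)


payload = "-f /etc/passwd"
escaped = escape_shell_metachars(payload)
print(escaped == payload)  # True: no metacharacters, so nothing is escaped

# The shell then splits on whitespace, so the binary sees a flag and a value.
print(shlex.split("dig " + escaped))
# ['dig', '-f', '/etc/passwd']

# The metacharacter defence itself works: ';' is neutralised.
print(escape_shell_metachars("localhost;id"))
# localhost\;id
```

The dash payload passes through untouched because nothing in it is special to the shell.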

The catalogue of tools where this is the realistic attack

Argument injection is only useful when the target binary has a flag that does something interesting. Almost every classic Unix tool has at least one. The ones I have actually exploited in practice or seen in published advisories:

  • dig -f <file> reads DNS queries from a file. Pointed at /etc/passwd or any readable file, it tries to parse each line as a hostname, fails, and prints the offending line as part of the error message. Net effect: file read. This is the lab's example below.
  • curl -K <configfile> reads a curl config file. The config can specify url = ... (request anywhere, hello SSRF), output = ... (write the response anywhere www-data can write, including over PHP files in the web root), header = ..., user-agent = .... One flag, an entire scripting language.
  • curl -o <path> writes the response body to an attacker-named file. Combined with a controlled URL, that is arbitrary file write.
  • find <path> -exec <cmd> {} \; runs an arbitrary command for each match. If the application calls find with a user-supplied "search path" that the attacker starts with -exec id ;, find treats that as the action expression and runs id. Full RCE.
  • tar -T <file> reads filenames to archive from a file. Combined with --checkpoint-action=exec=... in GNU tar, that is full command execution. The standalone --checkpoint-action is itself a famous primitive.
  • git fetch --upload-pack=<cmd> and git clone --upload-pack=<cmd>. Older Git let an attacker-controlled upload-pack value run a command on the client. The submodule URL variant (CVE-2018-17456) abused the same class through a .gitmodules URL starting with -. The class is generic even though the specific bugs are patched.
  • ssh -o ProxyCommand=<cmd> is the canonical OpenSSH argument-injection primitive. Any application that builds an ssh command line with a user-controlled hostname is one -oProxyCommand= away from RCE.
  • wget --use-askpass=<cmd> runs the named program to prompt for a password and uses its output. Attacker-controlled, full execution.
  • wget --execute='set ...' lets the attacker drop arbitrary wgetrc directives, including output_document for file write.
  • rsync -e <cmd> sets the remote-shell command; attacker-controlled, full execution.
  • zip --unzip-command=<cmd> and a handful of other archive tools have analogous escape hatches.
  • mysql --defaults-file=<path> loads options from a file the attacker can name, and the loaded options include init-command, which runs attacker-chosen SQL as soon as the client connects.

The common shape is "a flag that points at an external resource or a sub-program". Any binary with one of those is an argument-injection sink when a shell-quoted string starting with a dash reaches it.
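For code review or log triage, the catalogue can be turned into a small lookup. This is a minimal sketch under loud assumptions: the table only contains the options named above, the prefix match is a crude heuristic, and RISKY_OPTIONS and risky_arguments are hypothetical names. It is an audit aid, not a defence.

```python
# Option prefixes taken from the catalogue above; deliberately not exhaustive.
RISKY_OPTIONS = {
    "dig": ("-f",),
    "curl": ("-K", "-o"),
    "find": ("-exec",),
    "tar": ("-T", "--checkpoint-action"),
    "git": ("--upload-pack",),
    "ssh": ("-o",),
    "wget": ("--use-askpass", "--execute"),
    "rsync": ("-e",),
    "zip": ("--unzip-command",),
    "mysql": ("--defaults-file",),
}


def risky_arguments(argv: list) -> list:
    """Return the arguments in argv that start with a catalogued option."""
    prefixes = RISKY_OPTIONS.get(argv[0], ())
    # str.startswith accepts a tuple; an empty tuple never matches.
    return [arg for arg in argv[1:] if arg.startswith(prefixes)]


print(risky_arguments(["dig", "+short", "example.com"]))
# []
print(risky_arguments(["dig", "-f /etc/passwd"]))
# ['-f /etc/passwd']
print(risky_arguments(["ssh", "-oProxyCommand=id", "host"]))
# ['-oProxyCommand=id']
```

Run over recorded argv values, a check like this points at call sites worth reading by hand; it will produce false positives and miss options that are not in the table.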

Walking the lab

The rce-basic lab in the techearl-labs repo ships the exact sink: shell_exec('dig ' . escapeshellarg($domain)) at /lookup.php. Boot it:

```bash
docker compose up rce-basic
```

It listens on http://localhost:8085. The exploit is one request with a URL-encoded space:

```http
GET /lookup.php?domain=-f%20/etc/passwd
```

escapeshellarg wraps the value, so what the shell tokenises is one argument: -f /etc/passwd. The shell hands dig that single argv element. dig looks at the leading -, parses -f as the batch-file flag, opens /etc/passwd, and tries to read each line as a DNS query. Every line fails to parse as a hostname, so dig emits a parse error for each one, and the error output contains the offending line: root:x:0:0:root:/root:/bin/bash, daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin, and so on. The response body contains the contents of /etc/passwd.

Any file readable by uid 33 (www-data) is exfiltrable through the same path. /etc/hosts, /proc/self/environ, /var/log/apache2/access.log, application source files, .env files that someone forgot to put outside the docroot. The bug is the file read, the impact is whatever happens to be on disk.

The ;id payloads that work against /ping.php do not work here. escapeshellarg('localhost;id') produces 'localhost;id', which dig receives as one argument and treats as a malformed hostname. The metacharacter defence works. The flag defence is what failed.
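The request-to-argv path can be traced without running dig at all. This sketch assumes the sink behaves as described (one query parameter named domain, quoted and appended to dig); shlex.quote and shlex.split stand in for escapeshellarg and the shell, and argv_for_request is a hypothetical helper.

```python
import shlex
from urllib.parse import parse_qs, urlsplit


def argv_for_request(request_target: str) -> list:
    """Return the argv a shell would hand to dig for a /lookup.php request."""
    query = urlsplit(request_target).query
    domain = parse_qs(query)["domain"][0]    # %20 decodes to a space
    command = "dig " + shlex.quote(domain)   # analogue of escapeshellarg
    return shlex.split(command)              # roughly what execve receives


argv = argv_for_request("/lookup.php?domain=-f%20/etc/passwd")
print(argv)
# ['dig', '-f /etc/passwd']
print(argv[1].startswith("-"))
# True
```

The quoting held, since there are exactly two argv elements, and the second one still begins with a dash.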

The fix in three layers

There is no single line of code that closes the class. The reliable pattern composes three defences, each one a real layer rather than a paranoid duplicate of the one before it.

1. Use argv arrays, never shells

The deep version of this is in the OS command injection article. The summary is that every modern language exposes a "run a binary directly via execve" API that takes the command and its arguments as separate values. No shell is invoked, no string concatenation happens at any layer, and the binary receives exactly the arguments you passed.

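```php
// The array form of proc_open (PHP 7.4+) runs dig directly, with no shell.
$process = proc_open(
    ['dig', '+short', $domain],
    [1 => ['pipe', 'w'], 2 => ['pipe', 'w']],
    $pipes
);
$output = stream_get_contents($pipes[1]);
array_map('fclose', $pipes);
proc_close($process);
```

```python
import subprocess

# A list argument with the default shell=False runs dig without a shell.
result = subprocess.run(
    ['dig', '+short', host],
    capture_output=True, text=True, timeout=5,
)
```

```javascript
const { execFile } = require('child_process');
// execFile spawns the binary directly; exec() would go through a shell.
execFile('dig', ['+short', host], (err, stdout) => { /* ... */ });
```

```go
// exec.Command (package os/exec) never invokes a shell.
out, err := exec.Command("dig", "+short", host).Output()
```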

Argv arrays alone do not close argument injection. The shell is out of the picture, but the binary still parses -f as a flag whether the argument arrived through a shell or through execve. The argv form is a prerequisite, not the whole fix.
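That claim is easy to check with a stand-in binary. In this sketch the "binary" is a child Python process that parses its own options with getopt, with -f taking a file name as in the dig example; it only reports what it parsed and never opens the file. CHILD and run_child are hypothetical names, and the payload uses the attached form -f/etc/passwd because that is how getopt-style parsers take a value from a single argument.

```python
import json
import subprocess
import sys

# Stand-in for a real binary: parses its own argv with getopt and reports it.
CHILD = (
    "import getopt, json, sys\n"
    "opts, rest = getopt.getopt(sys.argv[1:], 'f:')\n"
    "print(json.dumps({'options': opts, 'positionals': rest}))\n"
)


def run_child(user_value: str) -> dict:
    """Run the stand-in binary with an argv array; no shell is involved."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, user_value],
        capture_output=True, text=True, timeout=10, check=True,
    )
    return json.loads(result.stdout)


print(run_child("example.com"))
# {'options': [], 'positionals': ['example.com']}
print(run_child("-f/etc/passwd"))
# {'options': [['-f', '/etc/passwd']], 'positionals': []}
```

No shell ran in either call, and the second value was still consumed as an option by the child's parser.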

2. Pass -- before user arguments

Most well-behaved Unix tools accept -- as an end-of-options marker. Anything after -- is treated as a positional argument even if it starts with a dash. The pattern is:

```php
$process = proc_open(
    ['dig', '+short', '--', $domain],
    [1 => ['pipe', 'w'], 2 => ['pipe', 'w']],
    $pipes
);
```

```python
import subprocess

result = subprocess.run(
    ['dig', '+short', '--', host],
    capture_output=True, text=True, timeout=5,
)
```

With -- in place, and assuming the installed dig honours the marker, dig reads -f /etc/passwd as a literal positional argument, fails to parse it as a hostname, and returns an error. No file read.

Two caveats. First, not every tool honours --. Some legacy tools, some tools with non-getopt parsers, and (notably) older versions of certain GNU tools either ignore the marker or treat it inconsistently. Check the manual for the specific tool, and test the installed version, before relying on this. Second, -- only protects what comes after it. If the user value is interpolated into an option position before the marker, or into an option's value as in --upload-pack=foo, the marker does nothing for it. The marker protects positionals, not flag values.
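The behaviour of the marker can be seen with Python's getopt module, which follows the conventional Unix rules. This is a toy parser standing in for a getopt-style tool, not a model of dig; the option names -f and --upload-pack are borrowed from the examples above and parse is a hypothetical helper.

```python
import getopt


def parse(argv: list) -> tuple:
    """Parse argv like a getopt-style tool with -f FILE and --upload-pack=CMD."""
    return getopt.getopt(argv, "f:", ["upload-pack="])


# Without the marker, the user value is consumed as the -f option.
print(parse(["-f/etc/passwd"]))
# ([('-f', '/etc/passwd')], [])

# With the marker, the same value is an ordinary positional argument.
print(parse(["--", "-f/etc/passwd"]))
# ([], ['-f/etc/passwd'])

# A value that lands before the marker is still parsed as an option.
print(parse(["--upload-pack=id", "--", "example.com"]))
# ([('--upload-pack', 'id')], ['example.com'])
```

The last call is the second caveat in miniature: the marker is present, and the option in front of it is parsed anyway.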

3. Validate against a strict allowlist before shelling out

The first two layers prevent the shell and the binary from misinterpreting the value. The third layer prevents the value from getting that far if it has no business being there. Every user-supplied field has a legal character set; refuse anything outside it.

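```php
// \A and \z anchor the whole string; $ would also match before a trailing newline.
if (!preg_match('/\A[a-zA-Z0-9][a-zA-Z0-9.-]*\z/', $domain)) {
    http_response_code(400);
    exit('invalid hostname');
}
```

```python
import re

# First character must be alphanumeric, so the value can never start with a dash.
if not re.fullmatch(r"[a-zA-Z0-9][a-zA-Z0-9.-]*", host):
    raise ValueError("invalid hostname")
```

A hostname allowlist is roughly [a-zA-Z0-9][a-zA-Z0-9.-]*: an alphanumeric first character, then letters, digits, dots, and dashes. The first-character rule matters. A bare [a-zA-Z0-9.-]+ class includes the dash, so it would reject -f /etc/passwd only because of the space and slashes, and would let a value such as -fconfig.php straight through. With the stricter pattern, anything starting with a dash is rejected at the validator before any subprocess call happens. A filename allowlist would be different (no .., no leading /, no ~). A UUID allowlist would be tighter still. The point is that the validator knows the field's vocabulary and refuses everything else.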

Allowlists beat blocklists every time. A blocklist of "dashes, semicolons, backticks" misses tabs, newlines, and the next character the attacker tries. An allowlist of "the characters this field is allowed to contain" does not need to anticipate the attacker, because it only says yes to known-good values.
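A side-by-side check shows the gap. This sketch assumes the blocklist is exactly the three characters named above. Its allowlist is a hostname pattern that also requires an alphanumeric first character, which is an assumption of this sketch, so a leading dash is refused. Both function names are hypothetical.

```python
import re

BLOCKED_CHARS = set("-;`")  # the blocklist from the text
# Allowlist: alphanumeric first, then letters, digits, dots, and dashes.
HOSTNAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]*")


def blocklist_accepts(value: str) -> bool:
    """Accept anything that contains none of the blocked characters."""
    return not any(ch in BLOCKED_CHARS for ch in value)


def allowlist_accepts(value: str) -> bool:
    """Accept only values made entirely of known-good characters."""
    return HOSTNAME.fullmatch(value) is not None


candidates = ["example.com", "my-host.example.com", "example.com\nid", "-f /etc/passwd"]
for value in candidates:
    print(repr(value), blocklist_accepts(value), allowlist_accepts(value))
# 'example.com' True True
# 'my-host.example.com' False True
# 'example.com\nid' True False
# '-f /etc/passwd' False False
```

The blocklist gets two rows wrong: it refuses a legitimate hostname that contains a dash, and it accepts the value with an embedded newline.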

Layered together

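```python
import re
import subprocess

# Alphanumeric first character, so the value can never start with a dash.
HOSTNAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9.-]*")

def lookup(host: str) -> str:
    if not HOSTNAME.fullmatch(host):
        raise ValueError("invalid hostname")
    # "--" is the end-of-options marker; confirm the installed dig accepts it.
    result = subprocess.run(
        ["dig", "+short", "--", host],
        capture_output=True, text=True, timeout=5,
    )
    return result.stdout
```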

Three lines of defence: validator refuses anything that is not a hostname, argv array means no shell, -- means dig will not parse a positional as a flag even if the validator is somehow bypassed. Each layer is independent. None of them is the "real" fix; together they close the class.

The strongest version of the fix is the one I keep landing on in code review: do not shell out at all. PHP has dns_get_record. Python has socket.getaddrinfo in the standard library and dns.resolver in the third-party dnspython package. Node has dns.resolve. Go has net.LookupHost. The library version has no shell, no argv parser, no flag handling, and no argument-injection surface. If the work can be done with a library call, the library call is the right answer.
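As a minimal sketch of the library route in Python, the standard library's socket.getaddrinfo can stand in for a plain address lookup. It goes through the system resolver and only returns addresses, so it is not a drop-in replacement for everything dig can do; resolve is a hypothetical helper name.

```python
import socket


def resolve(host: str) -> list:
    """Resolve host to IP addresses with the standard library, no subprocess."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0]
    # is the address string for both IPv4 and IPv6 results.
    return sorted({info[4][0] for info in infos})


print(resolve("127.0.0.1"))
# ['127.0.0.1']
```

There is no option parser anywhere on this path, so a value starting with a dash is only ever treated as a name to look up.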

Real-world incidents

A short tour of argument-injection-shaped CVEs. The version-specific details live in the linked NVD entries; the lessons below are what I want to remember.

  • PHPMailer, CVE-2016-10033 (December 2016). PHPMailer before 5.2.18 passed the sender address as the -f argument to sendmail without sanitising it. An attacker who controlled the From address on a contact form could inject additional sendmail flags, notably -X (write a debug log to an attacker-named path), turning the form into an arbitrary file write into the web root. With a .php extension on the write path, that escalates to RCE. The class was the same as the lab's dig -f issue: a value that reached an external binary's option parser through a path that had not anticipated dashes.
  • Git submodule URL, CVE-2018-17456 (October 2018). Git before 2.14.5, 2.15.3, 2.16.5, 2.17.2, 2.18.1, and 2.19.1 allowed a .gitmodules file to specify a submodule URL that started with a dash. When git clone --recurse-submodules ran, the URL was passed to git fetch (and ultimately to a child process), where the leading dash was parsed as a flag. The attacker-controlled .gitmodules could include --upload-pack= or -c protocol.ext.allow=always ext::... to achieve command execution on the client cloning the repository. Patched by refusing submodule URLs that start with a dash. The lesson: even non-shell call sites (git invoking git) need flag-parsing discipline.
  • ImageMagick "ImageTragick", CVE-2016-3714 (May 2016). Not pure argument injection (the underlying issue was the delegate mechanism running shell-out helpers with user-controlled strings), but the practical exploit shape was "field that reaches a binary that interprets it as flags or commands". Any web app that ran ImageMagick on uploaded images inherited the risk. The mitigation pattern (the policy.xml "remove dangerous coders" config) is essentially "lock down which external binaries the library is willing to invoke at all", which is the library-level version of the allowlist defence.

GitLab and a handful of other large applications have shipped argument-injection CVEs in recent years as well; before quoting a specific CVE number or CVSS score, pull the current NVD entry rather than rely on memory.

Where to go next

This article is the deep dive on the variant that survives shell escaping.

The recurring lesson is the one I keep writing about across this whole family. Every place untrusted input crosses into something that parses bytes as code is a sink. For argument injection that something is not the shell, it is the binary's own option parser, and the only reliable defence is to make the crossing not happen: validate the value before it travels, pass it as a positional with -- in front, and prefer a library call over a subprocess whenever one exists.
