TechEarl

Application-Layer DoS: The Complete 2026 Practitioner Guide

Application-layer (L7) denial of service: Slowloris-class slow reads, HTTP/2 Rapid Reset and CONTINUATION flood, ReDoS, decompression bombs, hash flooding, GraphQL abuse, and the defences that actually hold.

Ishan Karunaratne · 21 min read

Application-layer DoS is the third spoke on the web application security vulnerabilities map. Unlike the bandwidth-flood DDoS that fills the pipe, L7 DoS makes the application do expensive work on cheap input: a single TCP connection that ties up a worker for an hour, a regex that backtracks for thirty seconds on a 50-byte payload, a GraphQL query that joins five tables for every node in a thousand-node tree. The cost asymmetry is the entire point.

This article is the deep dive that sits next to the SQL injection guide and cross-site scripting deep dive under the same security hub. I walk the major L7 classes (slow-read, HTTP/2 frame abuse, ReDoS, decompression bombs, hash flooding, expensive endpoints, GraphQL), call out the real incidents that defined each class, and finish with the defences that actually move the needle.

In short: what is application-layer DoS?

Application-layer denial of service (L7 DoS) is what happens when a small, valid-looking request forces the application to spend disproportionate CPU, memory, file descriptors, threads, or database time. The attacker does not need a botnet or a fat pipe; one laptop can take down an unprepared origin if the cost asymmetry is high enough. The class covers slow-read attacks like Slowloris that hold connections open with partial requests, protocol abuses like HTTP/2 Rapid Reset and CONTINUATION flood, algorithmic complexity bugs like ReDoS and hash flooding, decompression bombs like the billion-laughs XML entity expansion, and resource amplification through any endpoint that does expensive work without rate limits. The defining feature is that the traffic looks normal in volume; the damage is in what each request costs the server to handle.

Where L7 sits on the DoS map

DoS is usually split into three tiers, and the defences differ at each tier:

  • Volumetric (L3/L4 flood). UDP floods, SYN floods, ICMP floods. The goal is to saturate bandwidth or the kernel's connection table. Mitigated upstream by scrubbing centres, anycast, and SYN cookies. Cloudflare, AWS Shield, and the like live mostly at this layer.
  • Amplification (also L3/L4). DNS, NTP, memcached, and similar reflection attacks where a small spoofed query produces a much larger response from a third-party server. Same upstream mitigations.
  • Application-layer (L7). The subject of this article. Traffic that is valid HTTP, valid TLS, valid HTTP/2, valid GraphQL, often coming from real browsers or distributed proxies, but designed to force expensive work per request. Upstream scrubbing does not see anything wrong with it. The defence has to live inside or in front of the application.

The line is blurry in practice. The Internet Archive outage of October 2024 mixed L3/L4 volumetric pressure with L7 components against the search and Wayback Machine endpoints. The HTTP/2 Rapid Reset campaign of August to October 2023 was technically L7 (valid HTTP/2 frames) but delivered at volumetric scale through botnets. The threat models overlap; the defences do not always.

Slowloris and the slow-read family

Slowloris was published by Robert Hansen in June 2009 and remains the canonical L7 DoS. The mechanism: open a TCP connection to the target, send the first line of an HTTP request (GET / HTTP/1.1\r\n), send a single header (X-a: a\r\n), then send another header every few seconds, indefinitely. The server keeps the connection alive waiting for the request to complete. Repeat across hundreds or thousands of sockets from one host and you exhaust the connection pool. Legitimate users get refused.

The variants over the years cover every direction of slowness:

  • Slow headers (the original Slowloris). Drip-feed headers; never finish.
  • Slow POST (also called R-U-Dead-Yet, RUDY). Send Content-Length: 100000, then write the body one byte every ten seconds. The server holds the worker waiting for the full body.
  • Slow read. Complete the request normally, then read the response one byte at a time with a tiny TCP receive window. The server cannot free the worker until the response is fully flushed.

The defining trait of the family is that resource exhaustion comes from many idle-but-not-quite-idle connections, not from request volume. A single host can sustain tens of thousands of them against a worker-per-connection server (classical Apache prefork, older PHP-FPM pools, anything without a non-blocking I/O layer in front).

Defence is event-driven I/O (nginx, Caddy, Envoy, Go's net/http) plus hard timeouts on the receive side: a header-read deadline (a few seconds), a body-read deadline scaled to expected upload size, and a per-connection idle timeout. On a non-blocking server an idle connection costs a file descriptor and a small buffer rather than a worker thread, which raises the bar from "a few thousand idle sockets kill the box" to "you need millions of sockets to matter, and the kernel will tell you long before that".

HTTP/2 Rapid Reset (CVE-2023-44487)

HTTP/2 Rapid Reset was disclosed jointly by Cloudflare, Google, and AWS on October 10 2023, after an attack campaign starting in August 2023 produced the largest L7 DDoS recorded to that point. Google logged a peak of 398 million requests per second against its infrastructure; Cloudflare logged 201 million; AWS reported similar magnitudes. The CVE is CVE-2023-44487 and it sits in the HTTP/2 specification itself rather than any one implementation.

The mechanism: HTTP/2 multiplexes many streams over one connection, and the client can cancel a stream at any time by sending a RST_STREAM frame. The server has to do real work to set up each stream (parse HEADERS, allocate state, dispatch to a handler), but cancellation is free for the client. An attacker opens a connection, opens as many streams as MAX_CONCURRENT_STREAMS allows, immediately sends RST_STREAM for each one, and repeats. Because a cancelled stream stops counting against the concurrency limit as soon as it is reset, the limit never throttles the attacker. The server burns CPU initialising and tearing down streams; the client sends a few bytes per cycle. Over many connections from a botnet, the asymmetry is enormous.

The fix has two parts. Implementation-side, every HTTP/2 server and proxy patched to track the ratio of opened-then-cancelled streams per connection and either rate-limit cancellations or drop the connection when the ratio gets pathological. Spec-side, the HTTP/2 working group tightened guidance around stream cancellation accounting. If you run anything HTTP/2 (nginx, Apache, Envoy, HAProxy, Caddy, Node, Go), make sure you are on a version released after October 2023; the patch landed within days of the disclosure across every major implementation.
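The per-connection accounting described above can be sketched in a few lines of Python. This is an illustration of the idea only, not how any specific server implements its fix; the class name StreamResetTracker and both thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StreamResetTracker:
    """Per-connection count of opened streams vs. streams cancelled early."""
    min_streams: int = 100         # assumed: do not judge tiny samples
    max_reset_ratio: float = 0.5   # assumed: over half cancelled looks abusive
    opened: int = 0
    reset_early: int = 0

    def on_headers(self) -> None:
        # Called when a HEADERS frame opens a new stream.
        self.opened += 1

    def on_rst_stream(self, response_started: bool) -> None:
        # Only cancellations that arrive before any response bytes were sent
        # are counted; those made the server do setup work for nothing.
        if not response_started:
            self.reset_early += 1

    def should_close_connection(self) -> bool:
        if self.opened < self.min_streams:
            return False
        return self.reset_early / self.opened > self.max_reset_ratio


def simulate(opened: int, early_resets: int) -> bool:
    """Replay a connection with the given counts and report the verdict."""
    tracker = StreamResetTracker()
    for i in range(opened):
        tracker.on_headers()
        if i < early_resets:
            tracker.on_rst_stream(response_started=False)
    return tracker.should_close_connection()
```

With these assumed thresholds, simulate(1000, 1000) flags the connection while simulate(1000, 20) does not. A production version would likely use a sliding time window rather than lifetime counters, so that a long-lived connection cannot bank goodwill before turning hostile.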

HTTP/2 CONTINUATION flood

In April 2024, Bartek Nowotarski published the HTTP/2 CONTINUATION flood, affecting roughly a dozen HTTP/2 implementations. The mechanism is structurally similar to Rapid Reset but uses a different frame.

HTTP/2 splits a header block across one HEADERS frame plus zero or more CONTINUATION frames. The spec does not cap the number of CONTINUATION frames or the total header-block size, leaving that to the implementation. Several implementations failed to enforce a cap, or enforced it only after fully decompressing HPACK. An attacker sends a HEADERS frame with END_HEADERS unset, followed by an unbounded stream of CONTINUATION frames, each carrying compressed header data. The server accumulates the buffer in memory, decompresses it on the fly, and runs out of either RAM or CPU before it ever dispatches the request.

The fixes were per-implementation: Node.js, Go, Apache httpd, Envoy, and several others all shipped patches in April and May 2024. The defence pattern is the same as Rapid Reset: track frame counts and header-block sizes per connection, reject pathological clients, never trust a header to be small just because the spec does not say it cannot be.
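A sketch of that per-header-block budget in Python, under stated assumptions: the names HeaderBlockBudget and HeaderBlockTooLarge and the limits of 16 fragments and 64 KiB are made up for illustration. The important property is that the check runs on the raw fragment size as each frame arrives, before any HPACK decoding.

```python
class HeaderBlockTooLarge(Exception):
    """Raised when one header block exceeds its frame or byte budget."""


class HeaderBlockBudget:
    def __init__(self, max_fragments: int = 16, max_bytes: int = 64 * 1024):
        # Both limits are illustrative assumptions, not spec values.
        self.max_fragments = max_fragments
        self.max_bytes = max_bytes
        self.fragments = 0
        self.bytes_seen = 0

    def on_fragment(self, payload_len: int, end_headers: bool) -> bool:
        """Account for one HEADERS or CONTINUATION fragment.

        Returns True when the block is complete. Raises HeaderBlockTooLarge
        as soon as either budget is exceeded, before anything is decoded.
        """
        self.fragments += 1
        self.bytes_seen += payload_len
        if self.fragments > self.max_fragments or self.bytes_seen > self.max_bytes:
            raise HeaderBlockTooLarge(
                f"{self.fragments} fragments, {self.bytes_seen} bytes")
        if end_headers:
            # Block finished: reset the counters for the next header block.
            self.fragments = 0
            self.bytes_seen = 0
            return True
        return False


def survives_flood(fragment_count: int, fragment_len: int) -> bool:
    """Feed fragments that never set END_HEADERS; True if none was rejected."""
    budget = HeaderBlockBudget()
    try:
        for _ in range(fragment_count):
            budget.on_fragment(fragment_len, end_headers=False)
    except HeaderBlockTooLarge:
        return False
    return True
```

Counting fragments as well as bytes means a stream of empty CONTINUATION frames is bounded too. On HeaderBlockTooLarge the caller would treat the connection as hostile and close it.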

Both Rapid Reset and CONTINUATION flood are reminders that protocol specs encode a trust model, and an HTTP/2 server has to defend the budget of every per-connection resource the spec lets a client influence: streams, header bytes, frame counts, window updates, settings frames.

ReDoS: catastrophic regex backtracking

Regular expressions look declarative, but most engines (PCRE, Perl, .NET, Java, JavaScript, Python re, Ruby) execute with a backtracking NFA whose worst-case runtime can be exponential in the input length. A small set of regex shapes hit that worst case on crafted input. The shorthand is ReDoS.

The classic pathological pattern is nested quantifiers: ^(a+)+$. Against the input aaaaaaaaaaaaaaaaaaaaX, the engine tries every partition of the leading a's across the inner and outer groups before failing. Adding one more a roughly doubles the runtime, so the curve goes vertical fast: if twenty a's cost milliseconds, forty cost about a million times more (a factor of 2^20), and sixty will not finish in any useful time. The CPU is real; the request is a few dozen bytes.

The incidents that shaped how the industry thinks about ReDoS:

  • Stack Overflow, July 20 2016. A unicode-aware trim regex in the home-page renderer hit pathological backtracking on a specific user-supplied post body (technically polynomial rather than exponential, as the postmortem notes, but enough to take the site down). The site went down for thirty-four minutes. The postmortem is the canonical worked example.
  • Cloudflare, July 2 2019. A regex in a WAF rule update, (?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*))), hit catastrophic backtracking on a fraction of incoming requests, spiking CPU to 100 percent across Cloudflare's edge globally and taking the network offline for about twenty-seven minutes. The public postmortem is required reading.
  • Express.js (path-to-regexp CVE-2024-45296 and related). Several regex patterns in commonly-used Express middleware (path-to-regexp, body-parser, cookie) have been flagged for ReDoS over the years. The fixes are usually a rewrite to a non-backtracking shape; the takeaway is that even widely-deployed libraries are not audited for regex complexity by default.

The defences are layered:

  1. Use a non-backtracking engine where possible. Google RE2 executes regexes in time linear in the input length and, to keep that guarantee, does not support backreferences or lookaround (features that engines normally implement by backtracking). The Rust regex crate follows the same model. Both cover the vast majority of pattern needs. Node has re2 bindings; Go's regexp package is RE2-based.
  2. Static analysis on regex patterns. Tools like Recheck statically detect catastrophic-backtracking shapes in JavaScript, Python, and other ecosystems. Run them in CI.
  3. Per-request CPU budgets and timeouts. Cap the time a single request can spend in any regex evaluation. After the 2019 outage, Cloudflare put a CPU watchdog around every WAF rule for exactly this reason.

If you do nothing else, audit every regex that runs against user input and rewrite the nested-quantifier patterns. The shapes to look for are well documented: (a+)+, (a|a)+, (a|aa)+, and any variation where two parts of the pattern can match the same substring.
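To make the rewrite concrete, here is a small Python sketch (Python's re is one of the backtracking engines listed earlier). It pairs the nested-quantifier pattern with a flat pattern that accepts the same strings, and includes a helper for timing the rejection of a near-miss input. The helper names and the input sizes are assumptions for illustration; actual timings depend on the machine and interpreter, so none are quoted here.

```python
import re
import time

NESTED = re.compile(r"^(a+)+$")   # nested quantifier: blows up on a near-miss
FLAT = re.compile(r"^a+$")        # same accepted strings, single quantifier


def seconds_to_reject(pattern: "re.Pattern", n: int) -> float:
    """Time how long `pattern` takes to reject n a's followed by 'X'."""
    subject = "a" * n + "X"
    start = time.perf_counter()
    matched = pattern.match(subject)
    elapsed = time.perf_counter() - start
    assert matched is None  # the trailing X guarantees a non-match
    return elapsed


def same_verdict(subject: str) -> bool:
    """True when both patterns agree on whether `subject` matches."""
    return bool(NESTED.match(subject)) == bool(FLAT.match(subject))


def timing_table(sizes=(12, 14, 16, 18)):
    """Return (n, nested_seconds, flat_seconds) rows; sizes kept small on purpose."""
    return [(n, seconds_to_reject(NESTED, n), seconds_to_reject(FLAT, n))
            for n in sizes]
```

Running timing_table() should show the nested column growing by roughly a factor of four for every two extra characters while the flat column stays effectively constant. Keep the sizes small when trying this; each extra character doubles the wait.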

Decompression bombs: zip bombs, billion laughs, gzip bombs

A decompression bomb is a tiny compressed payload that expands to a huge uncompressed size. The attacker sends a few kilobytes; the server allocates gigabytes; the server crashes.

The variants:

  • Zip bomb. The classical 42.zip: a 42 KB ZIP file that expands to 4.5 petabytes through nested archives. Any service that auto-extracts uploaded archives without checking the manifest is vulnerable.
  • Gzip bomb. A 10 KB gzip-encoded HTTP response body that decompresses to 10 GB. Hits clients and proxies that decompress responses, and servers that decompress request bodies under Content-Encoding: gzip.
  • Billion laughs (XML entity expansion). The billion laughs attack is the same idea encoded in XML entities: nine levels of &lol; definitions, each referring to ten of the previous level, producing a billion lol strings on parse from a ~1 KB payload. Affects any XML parser with entity expansion enabled (the default in many older libraries).
  • PDF, image, and font bombs. Hostile inputs to media parsers (libpng, libtiff, image-resizer pipelines, font shapers, PDF rendering) can also amplify CPU or memory on parse. The defence is the same: bound the work per input.
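The billion-laughs amplification is plain arithmetic, which a short Python helper can make explicit. The function name expanded_bytes is an assumption for this example; the model is the one described above, where a literal "lol" sits at the bottom and each of the nine levels above it references ten copies of the level below.

```python
def expanded_bytes(levels: int, fanout: int, leaf: str = "lol") -> int:
    """Size in bytes of the fully expanded top-level entity.

    Level 0 is the literal `leaf`; every higher level references `fanout`
    copies of the level beneath it, so the copies multiply per level.
    """
    return len(leaf.encode("utf-8")) * fanout ** levels


# Nine levels of ten references each: 10**9 copies of a 3-byte string.
BILLION_LAUGHS_BYTES = expanded_bytes(levels=9, fanout=10)
```

BILLION_LAUGHS_BYTES comes out at 3,000,000,000, so a payload of about a kilobyte asks the parser for roughly 3 GB of text before any per-object overhead is counted.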

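Defences are uniform across the variants: cap the decompressed size, cap the decompression CPU, stream rather than buffer where possible, and disable DTD processing and entity expansion (<!DOCTYPE> and <!ENTITY> declarations) in XML parsers unless the application genuinely needs them. Billion laughs uses internal entities only, so switching off external entities alone does not stop it. The OWASP XXE Prevention Cheat Sheet covers the XML side; for gzip-encoded request bodies, many frameworks expose a max-decompressed-size setting. Note that nginx client_max_body_size limits the body as received on the wire, which is the compressed size, so the decompressed cap has to be enforced wherever the body is actually decoded, typically in the upstream.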
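For the gzip case, a bounded decoder is only a few lines with Python's standard zlib module. This is a minimal sketch, assuming the whole compressed body is already in memory; the names bounded_gunzip and DecompressionLimitExceeded and the limits in the example are illustrative, not taken from any framework.

```python
import gzip
import zlib


class DecompressionLimitExceeded(Exception):
    """Raised when a payload would expand past the allowed size."""


def bounded_gunzip(data: bytes, max_output: int) -> bytes:
    """Decompress a gzip payload, refusing to produce more than max_output bytes."""
    decoder = zlib.decompressobj(zlib.MAX_WBITS | 16)  # +16 selects the gzip wrapper
    out = bytearray()
    pending = data
    while pending and not decoder.eof:
        # Ask for at most one byte beyond the remaining budget, so an
        # oversized stream is detected without inflating all of it.
        remaining = max_output - len(out)
        out += decoder.decompress(pending, remaining + 1)
        if len(out) > max_output:
            raise DecompressionLimitExceeded(
                f"output exceeds {max_output} bytes")
        pending = decoder.unconsumed_tail
    if not decoder.eof:
        raise ValueError("truncated gzip stream")
    return bytes(out)


# Example payloads: a benign body and a body of ten million zero bytes,
# which compresses to about 10 KB.
small = gzip.compress(b"hello")
bomb = gzip.compress(b"\0" * 10_000_000)
```

bounded_gunzip(small, 1_000_000) returns b"hello", while bounded_gunzip(bomb, 1_000_000) raises after inflating just over one megabyte instead of all ten. The same pattern, a running output counter checked inside the decode loop, carries over to archive extraction and streaming request bodies.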

Hash flooding (HashDoS)

Most hash maps are O(1) on average and O(n) worst case when every key collides into the same bucket. If an attacker can predict the hash function (because it is fixed and unkeyed) and craft a set of N keys that all collide, the map becomes a linked list and every insert/lookup becomes O(N), so inserting all N keys costs O(N²). For an HTTP server that hashes form-field names, header names, or query-parameter names into a map, a request with thousands of colliding keys turns a microsecond of parse time into seconds of CPU.

The class was publicly described in 2003 and weaponised at scale around 2011 to 2012. Affected runtimes and the year they patched:

  • PHP and Ruby. Patched in late 2011 to early 2012 by limiting the maximum number of POST parameters (max_input_vars) and tightening string-keyed hash behaviour.
  • Python. Enabled per-process hash randomisation for str and bytes keys by default in Python 3.3 (2012), then switched the string hash to SipHash in Python 3.4 (PEP 456, 2014).
  • Java. OpenJDK and HotSpot moved HashMap to use a balanced tree for buckets above a threshold size (Java 8, 2014), bounding worst-case behaviour at O(log n).
  • Node.js and V8. Use a randomised hash seed per process.
  • Go. Uses a randomised seed per map instance.

Modern runtimes have largely closed the obvious holes through randomised, keyed hashing (SipHash) and per-map seeds. The cases that remain are application-level: any code that uses an unkeyed deterministic hash function for an attacker-influenced key set (custom JSON parsers, custom routers, custom cache layers). The audit pattern is "where does the code put attacker-supplied strings into a map, and what does the hash function look like?".
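The audit question can be made concrete with a toy example in Python. The unkeyed function below is a 31-multiplier polynomial hash (the same shape as Java's String.hashCode), for which "Aa" and "BB" are a well-known colliding pair. The keyed alternative uses BLAKE2b from hashlib as a stand-in for SipHash, since the standard library does not expose SipHash directly. The function names and the bucket count of 64 are assumptions for illustration.

```python
import hashlib
import itertools
import secrets


def unkeyed_hash(s: str) -> int:
    """Deterministic 31-multiplier polynomial hash, truncated to 32 bits."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def colliding_keys(blocks: int) -> list:
    """'Aa' and 'BB' hash identically, so every concatenation of them collides."""
    return ["".join(p) for p in itertools.product(("Aa", "BB"), repeat=blocks)]


def keyed_bucket(s: str, key: bytes, n_buckets: int) -> int:
    """Bucket index from a keyed hash; unpredictable without the secret key."""
    digest = hashlib.blake2b(s.encode("utf-8"), key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_buckets


KEYS = colliding_keys(8)                  # 2**8 = 256 attacker-chosen strings
SECRET = secrets.token_bytes(16)          # per-process secret, never sent to clients
UNKEYED_BUCKETS = {unkeyed_hash(k) % 64 for k in KEYS}
KEYED_BUCKETS = {keyed_bucket(k, SECRET, 64) for k in KEYS}
```

All 256 distinct keys land in a single bucket under the unkeyed hash, while the keyed hash spreads them across most of the 64 buckets. An attacker who cannot learn SECRET cannot precompute a colliding set.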

Resource amplification through expensive endpoints

The least-glamorous and most-common L7 DoS is simply hitting an endpoint that does too much work per request, without per-IP or per-user rate limits. No exploit, no CVE, just engineering laziness.

The pattern repeats across products:

  • Unrate-limited search. A search endpoint that runs an Elasticsearch query against a billion-document index, with no cap on from/size, no cap on aggregation depth, no per-IP throttle. One curl loop saturates the cluster.
  • Login endpoints with bcrypt at cost factor 12. Bcrypt at cost 12 takes around 250 ms per hash on a typical x86 core. That is the entire point on the credential-stuffing side. It is also a compute amplifier on the attacker side: an attacker posting 100 logins per second to your login endpoint forces 25 seconds of CPU per second from the server. Rate-limit login attempts per username and per IP, and put a captcha or PoW gate before the bcrypt call after a few failures.
  • PDF generation, image resize, video transcode, OG image rendering. Anything that spawns a subprocess or allocates a large buffer per request is an amplifier. Queue these jobs, do not run them inline in the request path, and cap concurrency and rate per source.
  • Report generation endpoints. Anything that runs a long-running SQL query, especially one with a join across multiple large tables, against attacker-controllable filters.

The fix is universal: every public endpoint needs a rate limit (per IP, per token, per user, whichever applies), and any endpoint that does expensive synchronous work needs an additional concurrency cap and ideally a queue. None of this is novel. It is, however, the single highest-yield investment in L7 DoS resilience for most products.
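A per-key rate limit is usually some form of token bucket. The Python sketch below is a minimal in-process version, assuming a single worker; the class names TokenBucket and PerKeyLimiter and the parameters are illustrative, and a multi-instance deployment would need shared state (or the edge-level limits a CDN provides) instead.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `burst`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_s
        self.burst = burst
        self.clock = clock
        self.tokens = burst
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerKeyLimiter:
    """One bucket per key (IP, user id, API token). No eviction in this sketch."""

    def __init__(self, rate_per_s: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_s
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, key: str, cost: float = 1.0) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.rate, self.burst, self.clock)
        return bucket.allow(cost)
```

The cost parameter lets an expensive endpoint (a bcrypt login, a report query) charge more tokens per call than a cheap one. Note that the bucket dictionary grows with the number of distinct keys, so a real deployment needs eviction or a bounded store, otherwise the limiter itself becomes a memory-exhaustion target.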

GraphQL-specific abuse

GraphQL has L7 DoS shapes that REST does not, because the client gets to choose the shape of the query and the server commits to fulfilling it.

The three main classes:

  • Deeply-nested queries. A schema with cyclic relationships (a User has posts, a Post has an author, an author is a User) lets a client write a query that descends fifty levels: user { posts { author { posts { author { posts { ... } } } } } }. Each level is a database round trip. The query is short; the work is enormous.
  • Alias multiplication. GraphQL lets a client request the same field multiple times under different aliases: { a: expensiveField b: expensiveField c: expensiveField ... } repeated a thousand times in one query. The server runs the resolver a thousand times per request.
  • Batching abuse. Most GraphQL servers accept arrays of queries in one POST. A thousand-query batch where each query is itself an alias-multiplied expensive request stacks the amplifications.

Defences:

  1. Query depth limits. Reject queries deeper than (say) 10 levels of nested selection at parse time. Libraries: graphql-depth-limit for Node, similar plugins for every major language binding.
  2. Query cost analysis. Assign a cost to each field (resolver complexity, expected fan-out, database hit cost) and reject queries above a per-request budget. Apollo, Hasura, and most modern GraphQL servers ship cost-analysis middleware.
  3. Persisted queries (also called allow-listed operations). Ship a fixed catalogue of approved queries from the client build; the server only accepts query hashes from that catalogue at runtime. Eliminates arbitrary queries entirely, at the cost of slightly more build coupling.
  4. Per-operation rate limits. Some operations are expensive; cap them per token regardless of the global rate limit.

If you run a public GraphQL endpoint without query-depth limits and without cost analysis, assume you are one curiosity-driven afternoon away from an L7 incident.
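To show what a depth limit checks, here is a deliberately simplified Python sketch that measures brace nesting in the raw query text. It is an assumption-laden stand-in: real depth limiters walk the parsed GraphQL AST, whereas this scanner also counts braces in input-object arguments (so it over-estimates, which is the safe direction) and does not handle comments or block strings. The names selection_depth, enforce_depth, and QueryTooDeep are invented for the example.

```python
class QueryTooDeep(Exception):
    """Raised when a query nests deeper than the configured limit."""


def selection_depth(query: str) -> int:
    """Maximum nesting of '{' in `query`, ignoring braces inside "..." strings."""
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


def enforce_depth(query: str, max_depth: int = 10) -> None:
    """Reject the query before execution if it nests too deeply."""
    depth = selection_depth(query)
    if depth > max_depth:
        raise QueryTooDeep(f"depth {depth} exceeds limit {max_depth}")


CYCLIC_QUERY = "{ user { posts { author { posts { author { name } } } } } }"
```

selection_depth(CYCLIC_QUERY) is 6, so enforce_depth(CYCLIC_QUERY, max_depth=5) raises while the default limit of 10 lets it through. The check runs on the query text alone, before any resolver is called, which is what keeps it cheap.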

Real-world incidents

A short tour of L7 DoS in production. Each linked source covers the per-incident details I am not repeating here.

  • Cloudflare, July 2 2019 (ReDoS). A WAF rule update introduced a catastrophic-backtracking regex; CPU saturated globally for about twenty-seven minutes. The fix shipped a CPU watchdog around every rule evaluation. The canonical industrial-scale ReDoS incident.
  • Stack Overflow, July 20 2016 (polynomial-backtracking ReDoS). A unicode-aware trim regex on the home-page renderer hit a pathological input from a user post; the site was down for thirty-four minutes. Fix was a pattern rewrite.
  • HTTP/2 Rapid Reset, August to October 2023 (CVE-2023-44487). Botnet-driven L7 attack against Cloudflare, Google, AWS, and many origin servers behind them. Joint disclosure on October 10 2023; every major HTTP/2 implementation patched within days. Peak traffic 398 million RPS at Google, 201 million at Cloudflare.
  • HTTP/2 CONTINUATION flood, April 2024 (Bartek Nowotarski). Memory exhaustion via unbounded CONTINUATION frames. Node.js, Go, Apache httpd, Envoy, and others shipped patches in April and May 2024.
  • Internet Archive, October 2024. A multi-week wave of attacks combined volumetric L3/L4 pressure with L7 components against search, Wayback Machine, and Open Library endpoints, on top of a parallel credential leak. The L7 component is well-described in the IA's public communications; the precise vector breakdown is not public in full.
  • Memcached amplification, late February to early March 2018. GitHub took a 1.35 Tbps DDoS on February 28; a few days later a separate memcached attack against another target was measured at 1.7 Tbps. Technically L4 amplification rather than L7, but the incident is what pushed the industry to clean up open memcached endpoints.

For per-incident CVSS and version details, pull the entry from nvd.nist.gov or the vendor postmortem at the time of writing; version-specific numbers age quickly and I would rather link out than risk a stale figure.

Defence stack at a glance

A defence-in-depth recipe for L7 DoS, ordered roughly from edge to application:

  1. Upstream scrubbing / CDN. Cloudflare, Fastly, Akamai, AWS Shield Advanced, GCP Cloud Armor. Catches obvious volumetric pressure and a growing share of L7 patterns through bot scoring. Necessary, not sufficient.
  2. Connection-level limits at the reverse proxy. Per-IP connection caps, header-read deadlines, body-read deadlines, idle timeouts, HTTP/2 frame budgets (max streams per connection, max header-block size, max CONTINUATION frames, RST cancellation rate). In nginx the directives are limit_conn, client_header_timeout, client_body_timeout, keepalive_timeout, http2_max_concurrent_streams. In Caddy and Envoy the equivalents are first-class config.
  3. Rate limits per IP, per token, per user. The single highest-yield investment. Apply them in front of every expensive endpoint and globally as a safety net. Library-level: limiter for Node, tollbooth for Go, rack-attack for Rails. Edge-level: every CDN provider ships rate-limit rules.
  4. Request size caps. Body size, header size, query-string length, number of multipart parts, max upload size. Set them once globally and tighten per-endpoint where it matters.
  5. Decompression caps. Maximum decompressed size for any Content-Encoding: gzip body, maximum entity expansion for XML, maximum archive size for any upload that gets auto-extracted.
  6. Regex complexity audit and runtime budget. Replace backtracking engines with RE2 or the Rust regex crate wherever feasible. Run Recheck in CI. Cap CPU time per regex evaluation in any code path that matches against user input.
  7. GraphQL: query-depth limits, cost analysis, persisted queries. Cover all three layers; any one alone is bypassable.
  8. Per-endpoint concurrency caps. Anything that runs an expensive synchronous operation (PDF render, image resize, report SQL) gets a concurrency cap and a queue, not just a rate limit.
  9. Observability. Per-endpoint latency histograms, p99 CPU per request, regex CPU watchdog metrics, GraphQL query cost histograms. You cannot defend what you cannot measure.

The pattern across all of these is the same: assume the request is hostile, bound every per-request resource (time, CPU, memory, file descriptors, downstream calls), and fail fast on the requests that exceed the budget. The interesting question for any application is not whether to do this; it is which resource budget to start with.

A note on the line between DoS and abuse

Bug-bounty programmes routinely exclude "denial of service" from scope, which is technically reasonable (you do not want researchers stress-testing prod) and practically frustrating (a working ReDoS, a working query of death, a working GraphQL cost-explosion is real). The pragmatic interpretation:

  • In scope (most programmes): demonstrable algorithmic complexity bugs (ReDoS, hash flooding, decompression bombs) reported against a non-production instance or with a non-disruptive proof such as a CPU-time measurement; protocol-level bugs with a clean reproducer such as a packet capture or a small client; logic bombs (expensive endpoints with no rate limit) reported against a non-production tenant.
  • Out of scope (most programmes): any test against production that actually impacts other users, volumetric flooding of any kind, anything that requires running a botnet.

If you find an L7 DoS in someone else's product, the right move is almost always to report it without demonstrating it at scale, including enough technical detail (the regex pattern, the GraphQL query shape, the frame sequence) that the vendor can reproduce in their own environment. Triagers reward clean reports; they punish "I took your service down for forty minutes to prove the bug exists".

Where to go next

The L7 DoS cluster fans out from this hub the same way the XSS cluster does. The tools listicle lives at /best-application-layer-dos-tools-2026 covering slowhttptest, h2load with cancellation patterns, GoldenEye, regex fuzzers, GraphQL cost analysers, and the observability tooling that catches L7 incidents before they tip a service over.

The closest sibling deep dive is the billion laughs attack, which is one specific decompression-bomb variant inside the broader XML parser surface. For the wider map, back up to the web application security vulnerabilities taxonomy. For the injection cousins that share the "untrusted input as code in some interpreter" mental model, the SQL injection guide and the cross-site scripting deep dive cover the database and browser ends of that family.


Tags: DoS, DDoS, Application Security, HTTP/2, ReDoS, GraphQL, Rate Limiting, OWASP


Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.
