TechEarl

TEXT vs BLOB in MySQL: Storing Large Text and Binary Data

TEXT vs BLOB in MySQL: TEXT is for character data and has a charset, BLOB is for raw bytes and does not. The size families, the row-limit and indexing gotchas, and why files belong in object storage, not a BLOB.

Ishan Karunaratne⏱️ 13 min readUpdated
Share thisCopied
TEXT vs BLOB in MySQL for storing large text and binary data: the TINY/MEDIUM/LONG size families, character set versus raw bytes, the InnoDB row-size limit, prefix indexing, and why files belong in object storage.

The difference between TEXT and BLOB in MySQL is one thing: TEXT is for character data and carries a character set and collation, while BLOB is for raw bytes and has no character set, so comparisons and sorting are byte-wise. Everything else about them is identical: same four size families, same byte limits, same storage behavior, same indexing rules. Pick TEXT (and a TEXT subtype) for anything that is human-readable text in a known encoding, pick BLOB for opaque binary you never sort or collate by language rules, and pick the smallest family that fits your real maximum.

Short answer: for a long article body, comment, JSON payload, or any text in a charset, use the TEXT family. For genuinely binary data (encrypted blobs, serialized structures, small thumbnails) use the BLOB family. For files (images, PDFs, user uploads), use neither: store the file in object storage (S3, Cloudflare R2) and keep only the key in a VARCHAR. Below is the size comparison, a TEXT vs VARCHAR note, a worked schema, and the row-size and indexing gotchas that trip people up.

TEXT vs BLOB: the only real difference

TEXT values are nonbinary strings. They have a character set (whatever the column or table default is, often utf8mb4) and a collation, and MySQL sorts and compares them using that collation's language rules. 'A' can equal 'a' under a case-insensitive collation; 'ä' sorts where the collation says it sorts.

BLOB values are binary strings. They have the binary character set and binary collation, which is to say none of the language machinery applies. Comparison and sorting use the raw numeric byte values, so it is always case-sensitive, encoding-agnostic, and locale-free. If you store UTF-8 text in a BLOB it works, but you lose collation-aware search and you have to handle the encoding yourself.

That is the whole distinction. Structurally the two families are the same: TINYTEXT matches TINYBLOB, TEXT matches BLOB, MEDIUMTEXT matches MEDIUMBLOB, LONGTEXT matches LONGBLOB, byte for byte. So the choice is just: is this data text in a known encoding (use TEXT), or opaque bytes (use BLOB)?

This applies identically to MariaDB, which inherits the same TEXT/BLOB families, byte limits, charset behavior, and InnoDB off-page storage from its MySQL lineage. Nothing in this article changes between the two engines.

The four size families

Both families come in four sizes, distinguished only by the maximum value length and the size of the internal length prefix MySQL stores with each value. The limits below are in bytes, which matters for TEXT: in a multibyte charset like utf8mb4 (up to 4 bytes per character), the character capacity is a quarter of the byte limit.

TypeMax lengthIn human termsLength prefix
TINYTEXT / TINYBLOB255 bytesa short string1 byte
TEXT / BLOB65,535 bytes~64 KB2 bytes
MEDIUMTEXT / MEDIUMBLOB16,777,215 bytes~16 MB3 bytes
LONGTEXT / LONGBLOB4,294,967,295 bytes~4 GB4 bytes

So a MEDIUMTEXT column holding utf8mb4 content tops out near 16 MB of bytes but only around 4 million characters if every character is a 4-byte emoji. For ordinary mostly-ASCII prose the byte and character counts are close, but never assume the limit is in characters.

Pick the smallest family that fits your real maximum. It is tempting to declare everything LONGTEXT and stop thinking, but the size you choose is a documented constraint that catches bad data early, and it affects how MySQL handles the column in temporary tables and sorts. A blog body that is realistically under a megabyte is MEDIUMTEXT, not LONGTEXT. A 200-character bio is VARCHAR(200), not TEXT at all (see below). Reach for LONGTEXT/LONGBLOB only when you genuinely need values past 16 MB.

TEXT vs VARCHAR: when to switch

Before reaching for TEXT, ask whether VARCHAR fits. The dividing line is bounded versus unbounded, and whether you need to index the whole value.

VARCHAR(n) is stored in-row (subject to the row-size limit below) and can take a full B-tree index, a DEFAULT value, and ordinary equality and range lookups without ceremony. Use it for values with a known, reasonable upper bound that you filter or sort on: names, slugs, titles, email addresses, status strings. A VARCHAR can hold up to 65,535 bytes total, so it covers a lot of ground.

TEXT is for genuinely unbounded content where you do not know or do not want to cap the length, and where you will not put a whole-column index on it: article bodies, comments, log lines, serialized JSON, free-form notes. (If you actually query into that JSON, the native JSON column type beats TEXT and is the better call.) The trade-offs that come with TEXT (no DEFAULT, prefix-only indexing, off-page storage) are the price of that flexibility.

A useful rule: if you would index the entire value for lookups, it wants to be VARCHAR. If it is large free text you will only search with FULLTEXT or filter by other columns, it wants to be TEXT. For the exact byte sizes of VARCHAR and every other column type, see MySQL data types and sizes.

The gotchas: row size, DEFAULT, and indexing

Three behaviors of TEXT/BLOB columns surprise people, and all three come straight from the docs.

They are stored off-page in InnoDB. For the common DYNAMIC and COMPRESSED row formats, InnoDB stores a large TEXT/BLOB value on separate overflow pages and keeps only a 20-byte pointer (plus a small inline prefix in older formats) in the row itself. That is why you can store a 4 GB LONGBLOB in a table whose rows otherwise look tiny. But the catch is the 65,535-byte row-size limit: that limit counts the pointers, not the off-page payload, so a table with many TEXT/BLOB columns can still blow the row limit through the accumulated pointers and prefixes alone. The fix when you hit it is usually fewer big columns per table, or splitting into a child table.

They cannot have a DEFAULT value (other than NULL, where the column is nullable). MySQL rejects body TEXT NOT NULL DEFAULT ''. If you need an empty-string default, either make the column nullable, or set the value explicitly in your INSERT, or use a VARCHAR if the data actually fits one.

You can only prefix-index them. A plain INDEX(body) on a TEXT column is an error; you must give a prefix length, like INDEX(body(255)), which indexes only the first 255 bytes. That helps prefix lookups and uniqueness on the leading bytes, but it cannot give you a full unique constraint or an index that distinguishes values differing only near the end (URLs, long paths, document bodies). For real text search inside a TEXT column, add a FULLTEXT index instead, which is purpose-built for it. And if you need to look up a long value by exact match cheaply, hash it: store a SHA-256 of the column in an indexed generated column and match on the hash.

They can push an internal temporary table to disk, but less than they used to. A query that builds an internal temporary table (some GROUP BY, DISTINCT, UNION, or sort plans) and carries a TEXT/BLOB column through it used to spill straight to disk: the old MEMORY engine could not hold those types at all. Since MySQL 8.0 the default TempTable engine handles BLOB/TEXT in memory up to its configured limit before spilling, so the blanket "any BLOB in the result forces a disk temp table" rule no longer holds. The practical advice is unchanged though: do not drag a large TEXT/BLOB column through a query that does not need it (select it only when you actually return it), because once a temporary table does overflow to disk, the big column is what fills it.

Don't store files in a BLOB

The most common BLOB question is "should I store uploaded images/PDFs in a LONGBLOB?" The practical answer is almost always no. Keep the file in object storage (S3, Cloudflare R2, GCS) and store only the object key or URL in MySQL, in a VARCHAR.

The reasons are operational, not theoretical. Large binaries in the database bloat every mysqldump and physical backup, so your backups grow and slow down in proportion to your file volume. They inflate replication traffic, since every byte of every blob crosses the wire to every replica. They evict useful index and row pages from the InnoDB buffer pool, because the bytes you read to serve one download are competing for the same RAM as the queries that actually need to be fast. And serving a file means routing it through your application and database tier instead of letting a CDN or the object store hand it out directly.

There are narrow cases where an in-database blob is defensible: the data is small, it must stay transactionally consistent with the row that references it (so an external-store write that can fail independently is a liability), and you genuinely want it to appear and disappear atomically with that row. A small encrypted token, a tiny generated thumbnail, a serialized state blob a few kilobytes long. Even then, lean toward the smallest BLOB family that fits and measure the buffer-pool impact. But for user uploads at any real scale, the file goes to object storage and the database holds a VARCHAR key.

A worked schema

Two tables that show the split: one storing large text in the database, one referencing a file by key instead of embedding it.

sql
-- Large text lives in the DB. MEDIUMTEXT covers bodies up to ~16 MB.
CREATE TABLE articles (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title        VARCHAR(200) NOT NULL,
  slug         VARCHAR(200) NOT NULL,
  body         MEDIUMTEXT NOT NULL,
  created_at   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  UNIQUE KEY uq_articles_slug (slug),
  FULLTEXT KEY ft_articles_body (body)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Full-text search inside the body (the right tool for a TEXT column).
SELECT id, title
FROM articles
WHERE MATCH(body) AGAINST ('innodb row format' IN NATURAL LANGUAGE MODE);

-- Files do NOT live in the DB. Store the object-storage key, not the bytes.
CREATE TABLE documents (
  id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  article_id    BIGINT UNSIGNED NOT NULL,
  filename      VARCHAR(255) NOT NULL,
  content_type  VARCHAR(100) NOT NULL,
  byte_size     BIGINT UNSIGNED NOT NULL,
  storage_key   VARCHAR(512) NOT NULL,  -- e.g. "uploads/2018/12/report.pdf" in R2
  created_at    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  UNIQUE KEY uq_documents_key (storage_key),
  KEY idx_documents_article (article_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

articles.body is MEDIUMTEXT because a long article can run past the 64 KB TEXT limit but is nowhere near 16 MB, and it carries a FULLTEXT index so search happens in the database rather than a LIKE '%...%' scan. Notice body has no DEFAULT (it cannot) and no plain B-tree index (only FULLTEXT). documents never holds a byte of file content: storage_key is the R2/S3 path, indexed and unique, and your application fetches or signs a URL to the object store when it needs the file.

For the conversion functions, FULLTEXT syntax, and DDL patterns on one page, the MySQL cheat sheet is the quick reference.

FAQ

See also

Sources

Authoritative references this article was fact-checked against.

TagsMySQLTEXTBLOBDatabase StorageSchema DesignInnoDBData Types

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.

Keep reading

Related posts

How to Store JSON in MySQL: The JSON Type vs TEXT

MySQL has a native JSON column type that validates on insert, stores a parsed binary format, and gives you path operators like ->>. When to reach for it over TEXT, how to index it with a generated column, and the MariaDB difference.