An MD5 hash is 128 bits, exactly 16 bytes raw, or 32 hexadecimal characters as a string. MySQL gives you four reasonable ways to store one, each with a different storage cost and indexing profile: CHAR(32) (the obvious one, 32 bytes), VARCHAR(32) (rarely the right call), BINARY(16) (the best general default, half the size of CHAR), and a split into two BIGINT UNSIGNED columns (also 16 bytes, tied with BINARY(16), but awkward to read). Below is the comparison table plus working DDL and insert/select for each, the indexing implications, and a note on when MD5 is the wrong tool entirely.
Short answer: for a new table, store the raw 16 bytes in BINARY(16) and convert with UNHEX() on insert and HEX() on read. Use CHAR(32) only when you debug at the SQL prompt enough that a readable hex string is worth doubling the storage. If your actual goal is to make a long TEXT/URL column searchable, you don't store the hash by hand at all, you let MySQL compute it in an indexed generated column.
What is an MD5 hash and how is it stored in MySQL?
An MD5 (Message-Digest 5) hash is a fixed-size 128-bit value produced by hashing arbitrary input. It's always 16 bytes in raw binary form, or 32 hexadecimal characters when stringified (0800fc577294c34e0b28ad2839435945 for MD5("hash")). MySQL has no native MD5 column type, so you choose how to represent those 16 bytes: as ASCII hex in a CHAR(32) or VARCHAR(32) column (32 bytes per row, human-readable), as raw bytes in a BINARY(16) column (16 bytes per row, half the storage and index size), or split across two BIGINT UNSIGNED columns (also 16 bytes per row, the same cost as BINARY(16), just harder to query). The right choice depends on whether you need readability for debugging, how indexable the column has to be at scale, and whether you ever need joins against the hash column.
MySQL's built-in MD5() function does the hashing itself: MD5('hash') returns the lowercase 32-character hex string 0800fc577294c34e0b28ad2839435945 (a VARCHAR in the connection's character set), which you then store in one of the columns below, or wrap in UNHEX() to get the raw 16 bytes. You can also compute the hash in your application and hand MySQL the finished value; the storage choices are identical either way.
Everything here applies to MariaDB as well: the column types, UNHEX()/HEX(), the MD5() function, and the indexing all behave the same, so the way you store an MD5 hash in MariaDB is the way you store it in MySQL (the indexed generated-column pattern below needs MariaDB 10.2+).
One MySQL-specific caveat if you generate the hash inside the database with the MD5() function rather than computing it in your application: when the MySQL server runs in FIPS mode, MD5() returns NULL, because MD5 is not a FIPS-approved algorithm. The storage methods below are unaffected (they store whatever 16 bytes you hand them), but a NULL coming back from MD5() will surface as an insert or constraint error. The examples in this article use externally supplied hex strings, which sidestep the issue entirely.
Jump to:
- Storage comparison table
- Method 1: CHAR(32)
- Method 2: VARCHAR(32)
- Method 3: BINARY(16)
- Method 4: Two BIGINT UNSIGNED columns
- Index a large column by its MD5 (generated columns)
- Indexing implications
- When MD5 is the wrong choice
- FAQ
Storage comparison table
| Method | Bytes per row | Raw hash bytes, 1B rows | Readability | Best for |
|---|---|---|---|---|
CHAR(32) | 32 (latin1 / ASCII) | ~32 GB | Direct hex string | General use, debugging-heavy environments |
VARCHAR(32) | 33 (1-byte length prefix) | ~33 GB | Direct hex string | Rarely the right pick |
BINARY(16) | 16 | ~16 GB | Need HEX() to read | Recommended default for high-volume tables |
Two BIGINT UNSIGNED | 16 | ~16 GB | Need CONV() to read | Niche; useful as an all-integer key pair |
The "Raw hash bytes" column is the stored hash value alone, multiplied by a billion rows. It is not the full on-disk index size: an InnoDB secondary index entry also carries the primary-key value and per-record and per-page overhead, so the real index is larger (see the realistic per-entry figures in the index size section below). What the raw column does show cleanly is the relative cost: the 2× difference between BINARY(16) and CHAR(32) carries straight through to the real index. On a billion-row table that is roughly a 16 GB gap in the secondary index, plus less RAM spent keeping the index hot in the buffer pool.
Method 1: CHAR(32)
Most common, most readable. Store the hex string as ASCII.
CREATE TABLE md5_char (
md5 CHAR(32) NOT NULL,
INDEX idx_md5 (md5)
) ENGINE=InnoDB;
INSERT INTO md5_char (md5) VALUES ('0800fc577294c34e0b28ad2839435945');
SELECT * FROM md5_char;
-- 0800fc577294c34e0b28ad2839435945CHAR(32) reserves exactly 32 bytes per row in latin1 or ascii. Don't use a multibyte charset for hash columns, utf8mb4 would reserve 128 bytes per row for the same hex string. Add CHARACTER SET ascii COLLATE ascii_bin to the column if your table default is utf8mb4:
md5 CHAR(32) CHARACTER SET ascii COLLATE ascii_bin NOT NULLThe binary collation makes comparison and indexing slightly faster and case-sensitive (lowercase hex won't match uppercase, which you want for hashes).
Method 2: VARCHAR(32)
Same content as CHAR(32) but with a 1-byte length prefix per row. Storage is 33 bytes per row.
CREATE TABLE md5_varchar (
md5 VARCHAR(32) NOT NULL,
INDEX idx_md5 (md5)
) ENGINE=InnoDB;There's almost never a reason to pick VARCHAR(32) over CHAR(32) for an MD5, the length is always exactly 32. The extra byte adds storage and gains nothing. The only legitimate use is if the column might later hold different fixed-length hashes (e.g., transitioning from MD5 to SHA-256 which is 64 hex chars), and even then a schema change is the cleaner answer.
Method 3: BINARY(16): recommended default
Half the storage of CHAR(32), half the index size, and no charset issues.
CREATE TABLE md5_binary (
md5 BINARY(16) NOT NULL,
INDEX idx_md5 (md5)
) ENGINE=InnoDB;
INSERT INTO md5_binary (md5) VALUES (UNHEX('0800fc577294c34e0b28ad2839435945'));
SELECT HEX(md5) FROM md5_binary;
-- 0800FC577294C34E0B28AD2839435945
Two helpers do all the conversion:
UNHEX('hex_string')→ 16-byte binary value on insert.HEX(binary_col)→ uppercase hex string on read.
For lookups by hash, always wrap the parameter in UNHEX():
SELECT * FROM md5_binary WHERE md5 = UNHEX(?);In application code, store the hex string and let MySQL convert, or work with bytes directly (bytes.fromhex(hex_str) in Python, Buffer.from(hex, 'hex') in Node.js, pack('H*', $hex) in PHP).
For 99% of new tables, BINARY(16) is the right answer.
Method 4: Two BIGINT UNSIGNED columns
Split the 128 bits across two 64-bit unsigned integers. Storage is exactly 16 bytes, same as BINARY(16).
CREATE TABLE md5_bigint (
md5_high BIGINT UNSIGNED NOT NULL,
md5_low BIGINT UNSIGNED NOT NULL,
INDEX idx_md5 (md5_high, md5_low)
) ENGINE=InnoDB;
INSERT INTO md5_bigint (md5_high, md5_low) VALUES (
CONV(LEFT('0800fc577294c34e0b28ad2839435945', 16), 16, 10),
CONV(RIGHT('0800fc577294c34e0b28ad2839435945', 16), 16, 10)
);
SELECT CONCAT(
LPAD(CONV(md5_high, 10, 16), 16, '0'),
LPAD(CONV(md5_low, 10, 16), 16, '0')
) AS md5 FROM md5_bigint;
-- 0800FC577294C34E0B28AD2839435945The two-BIGINT method costs the same bytes as BINARY(16) but is more awkward to query and join. The one historical advantage, that some older MySQL clients had trouble with binary columns, no longer applies. Use BINARY(16) instead unless you have a specific reason (e.g., a partition key that requires integer columns).
Index a large column by its MD5 (generated columns)
The most common real reason to put an MD5 in MySQL is not to store the hash for its own sake, it is to make a long column searchable. You can't put a plain B-tree index on a TEXT column, and a prefix index on a long VARCHAR is useless when the values only diverge near the end (URLs with long query strings, file paths). The fix is a generated column holding MD5() of the long value, indexed for equality lookups. This is where you let the database compute the hash with its built-in MD5() function instead of supplying the hex by hand.
CREATE TABLE urls (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
url TEXT NOT NULL,
url_md5 BINARY(16) AS (UNHEX(MD5(url))) VIRTUAL,
INDEX idx_url_md5 (url_md5)
) ENGINE=InnoDB;
INSERT INTO urls (url) VALUES ('https://example.com/very/long/path?with=a&long=query&string=here');
-- exact-match lookup by the original value
SELECT * FROM urls
WHERE url_md5 = UNHEX(MD5('https://example.com/very/long/path?with=a&long=query&string=here'));You never write to url_md5; MySQL recomputes it from url. MD5(url) produces the 32-char hex string and UNHEX() packs it down to the 16-byte BINARY(16) form, so the index is the compact one from Method 3, not a 32-byte CHAR index.
Use VIRTUAL, not STORED. InnoDB indexes a virtual generated column fine: the hash is materialized in the secondary-index records only, never written into the row, so the table itself stays small. A STORED column writes the hash into the clustered index on every row for no extra benefit when all you need is the lookup. Adding the virtual column is a metadata-only change and adding or dropping its index is an in-place operation; adding a STORED generated column requires a full table rebuild.
You don't even have to name the generated column in the query. The optimizer will use the index whenever a WHERE, ORDER BY, or GROUP BY clause contains an expression that matches the column's definition exactly:
-- also uses idx_url_md5, even though url_md5 is never named
SELECT * FROM urls WHERE UNHEX(MD5(url)) = UNHEX(MD5('https://example.com/...'));
The match has to be exact (identical expression, same result type), so in practice referencing url_md5 directly is the clearer habit. One caveat carries over from earlier: because the value comes from MD5(), the column evaluates to NULL on a server running in FIPS mode (MD5 is not FIPS-approved), which then breaks the index lookups. If you need FIPS compliance, compute the hash in the application and store it in a plain BINARY(16) column instead.
Indexing implications
All four methods support B-tree indexes on the hash column. Lookup performance is essentially identical for point queries (WHERE md5 = ?). Where the methods diverge is index size and therefore buffer-pool efficiency:
| Method | Bytes per index entry (incl. PK pointer) | Pages per million rows (InnoDB 16KB pages) |
|---|---|---|
CHAR(32) | ~40 | ~2,500 |
BINARY(16) | ~24 | ~1,500 |
Two BIGINT | ~24 | ~1,500 |
On a server with limited RAM, the smaller index stays hot in the buffer pool longer, which translates directly to fewer disk reads for index lookups. On modern servers with plentiful RAM the difference is negligible; on RDS micro-instances or cheap VPS it's real.
Hashes are random by design, so the B-tree insert pattern is worst-case for InnoDB, every insert lands in a random page, causing page splits and write amplification. This is the same problem random UUIDs have. If you're inserting millions of rows ordered by created_at, consider whether the hash needs to be the primary key (usually no): make it a secondary index instead.
When MD5 is the wrong choice
MD5 has been cryptographically broken since 2004. Collisions can be generated in seconds on a laptop. That means MD5 is unsafe for:
- Password storage. Use
bcrypt,argon2id, orscryptvia your application's password library, stored in aVARCHAR(255)password column. Never store password MD5 (or SHA-256) directly. - Authentication tokens. Use cryptographically random tokens, not derived hashes.
- File integrity against tampering. Use SHA-256 or BLAKE3.
MD5 is still fine for:
- Content deduplication. "Are these two files probably the same?", MD5 collisions can be deliberately constructed but won't happen by accident.
- ETags and cache keys. Where collision would just cause a cache miss, not a security failure.
- Legacy compatibility. Where an existing system stores MD5 and you can't change it without a migration.
For new tables that store hashes for any security-adjacent purpose, store the full digest: BINARY(32) for SHA-256 (see how to store a SHA-256 hash in MySQL for the full breakdown), or BINARY(32) for BLAKE3's default 32-byte output. Do not truncate a security hash to 128 bits. A 128-bit digest only gives roughly 64-bit collision resistance because of the birthday bound, which is too weak to rely on against an adversary. Truncating to BINARY(16) is reasonable only for non-adversarial work: deduplication and cache keys where a collision is a harmless miss, not a security failure. For those, the same indexing approach applies with a more modern hash.
What to do next
- For the broader reference on every MySQL data type with exact byte sizes, see MySQL Data Types and Sizes: Complete Reference.
- For UUID-style identifiers (which have the same
BINARY(16) vs CHAR(36)trade-off as MD5 hashes), the comparison applies directly,BIN_TO_UUID()andUUID_TO_BIN()are the UUID equivalents ofHEX()andUNHEX(). - For the SQL syntax cheat sheet, MySQL Cheat Sheet covers conversion functions and DDL on one page.
- On the other engine, storing an MD5 hash in PostgreSQL is the same 16-byte idea with
byteaanddecode()/encode()instead ofBINARY(16)andUNHEX()/HEX().
FAQ
Sources
Authoritative references this article was fact-checked against.
- MySQL 8.0 Reference Manual, Encryption and Compression Functions (MD5, UNHEX, HEX)dev.mysql.com
- MySQL 8.0 Reference Manual, Secondary Indexes and Generated Columnsdev.mysql.com
- MySQL 8.0 Reference Manual, Optimizer Use of Generated Column Indexesdev.mysql.com
- MySQL 8.0 Reference Manual, String Data Types (CHAR, VARCHAR, BINARY)dev.mysql.com





