An MD5 hash is 128 bits — exactly 16 bytes raw, or 32 hexadecimal characters as a string. MySQL gives you four reasonable ways to store one, each with a different storage cost and indexing profile: CHAR(32) (the obvious one, 32 bytes), VARCHAR(32) (rarely the right call), BINARY(16) (the best general default, half the size of CHAR), and a split into two BIGINT UNSIGNED columns (also 16 bytes, tied with BINARY(16), but awkward to read). Below is the comparison table plus working DDL and insert/select for each, the indexing implications, and a note on when MD5 is the wrong tool entirely.
What is an MD5 hash and how is it stored in MySQL?
An MD5 (Message-Digest 5) hash is a fixed-size 128-bit value produced by hashing arbitrary input. It's always 16 bytes in raw binary form, or 32 hexadecimal characters when stringified (0800fc577294c34e0b28ad2839435945 for MD5("hash")). MySQL has no native MD5 column type, so you choose how to represent those 16 bytes: as ASCII hex in a CHAR(32) or VARCHAR(32) column (32 bytes per row, human-readable), as raw bytes in a BINARY(16) column (16 bytes per row, half the storage and index size), or split across two BIGINT UNSIGNED columns (also 16 bytes per row, the same cost as BINARY(16), just harder to query). The right choice depends on whether you need readability for debugging, how indexable the column has to be at scale, and whether you ever need joins against the hash column.
One MySQL-specific caveat if you generate the hash inside the database with the MD5() function rather than computing it in your application: when the MySQL server runs in FIPS mode, MD5() returns NULL, because MD5 is not a FIPS-approved algorithm. The storage methods below are unaffected (they store whatever 16 bytes you hand them), but a NULL coming back from MD5() will surface as an insert or constraint error. The examples in this article use externally supplied hex strings, which sidestep the issue entirely.
Jump to:
- Storage comparison table
- Method 1: CHAR(32)
- Method 2: VARCHAR(32)
- Method 3: BINARY(16)
- Method 4: Two BIGINT UNSIGNED columns
- Indexing implications
- When MD5 is the wrong choice
- FAQ
Storage comparison table
| Method | Bytes per row | Raw hash bytes, 1B rows | Readability | Best for |
|---|---|---|---|---|
CHAR(32) | 32 (latin1 / ASCII) | ~32 GB | Direct hex string | General use, debugging-heavy environments |
VARCHAR(32) | 33 (1-byte length prefix) | ~33 GB | Direct hex string | Rarely the right pick |
BINARY(16) | 16 | ~16 GB | Need HEX() to read | Recommended default for high-volume tables |
Two BIGINT UNSIGNED | 16 | ~16 GB | Need CONV() to read | Niche; useful as an all-integer key pair |
The "Raw hash bytes" column is the stored hash value alone, multiplied by a billion rows. It is not the full on-disk index size: an InnoDB secondary index entry also carries the primary-key value and per-record and per-page overhead, so the real index is larger (see the realistic per-entry figures in the index size section below). What the raw column does show cleanly is the relative cost: the 2× difference between BINARY(16) and CHAR(32) carries straight through to the real index. On a billion-row table that is roughly a 16 GB gap in the secondary index, plus less RAM spent keeping the index hot in the buffer pool.
Method 1: CHAR(32)
Most common, most readable. Store the hex string as ASCII.
CREATE TABLE md5_char (
md5 CHAR(32) NOT NULL,
INDEX idx_md5 (md5)
) ENGINE=InnoDB;
INSERT INTO md5_char (md5) VALUES ('0800fc577294c34e0b28ad2839435945');
SELECT * FROM md5_char;
-- 0800fc577294c34e0b28ad2839435945CHAR(32) reserves exactly 32 bytes per row in latin1 or ascii. Don't use a multibyte charset for hash columns — utf8mb4 would reserve 128 bytes per row for the same hex string. Add CHARACTER SET ascii COLLATE ascii_bin to the column if your table default is utf8mb4:
md5 CHAR(32) CHARACTER SET ascii COLLATE ascii_bin NOT NULLThe binary collation makes comparison and indexing slightly faster and case-sensitive (lowercase hex won't match uppercase, which you want for hashes).
Method 2: VARCHAR(32)
Same content as CHAR(32) but with a 1-byte length prefix per row. Storage is 33 bytes per row.
CREATE TABLE md5_varchar (
md5 VARCHAR(32) NOT NULL,
INDEX idx_md5 (md5)
) ENGINE=InnoDB;There's almost never a reason to pick VARCHAR(32) over CHAR(32) for an MD5 — the length is always exactly 32. The extra byte adds storage and gains nothing. The only legitimate use is if the column might later hold different fixed-length hashes (e.g., transitioning from MD5 to SHA-256 which is 64 hex chars), and even then a schema change is the cleaner answer.
Method 3: BINARY(16) — recommended default
Half the storage of CHAR(32), half the index size, and no charset issues.
CREATE TABLE md5_binary (
md5 BINARY(16) NOT NULL,
INDEX idx_md5 (md5)
) ENGINE=InnoDB;
INSERT INTO md5_binary (md5) VALUES (UNHEX('0800fc577294c34e0b28ad2839435945'));
SELECT HEX(md5) FROM md5_binary;
-- 0800FC577294C34E0B28AD2839435945Two helpers do all the conversion:
UNHEX('hex_string')→ 16-byte binary value on insert.HEX(binary_col)→ uppercase hex string on read.
For lookups by hash, always wrap the parameter in UNHEX():
SELECT * FROM md5_binary WHERE md5 = UNHEX(?);In application code, store the hex string and let MySQL convert, or work with bytes directly (bytes.fromhex(hex_str) in Python, Buffer.from(hex, 'hex') in Node.js, pack('H*', $hex) in PHP).
For 99% of new tables, BINARY(16) is the right answer.
Method 4: Two BIGINT UNSIGNED columns
Split the 128 bits across two 64-bit unsigned integers. Storage is exactly 16 bytes, same as BINARY(16).
CREATE TABLE md5_bigint (
md5_high BIGINT UNSIGNED NOT NULL,
md5_low BIGINT UNSIGNED NOT NULL,
INDEX idx_md5 (md5_high, md5_low)
) ENGINE=InnoDB;
INSERT INTO md5_bigint (md5_high, md5_low) VALUES (
CONV(LEFT('0800fc577294c34e0b28ad2839435945', 16), 16, 10),
CONV(RIGHT('0800fc577294c34e0b28ad2839435945', 16), 16, 10)
);
SELECT CONCAT(
LPAD(CONV(md5_high, 10, 16), 16, '0'),
LPAD(CONV(md5_low, 10, 16), 16, '0')
) AS md5 FROM md5_bigint;
-- 0800FC577294C34E0B28AD2839435945The two-BIGINT method costs the same bytes as BINARY(16) but is more awkward to query and join. The one historical advantage — that some older MySQL clients had trouble with binary columns — no longer applies. Use BINARY(16) instead unless you have a specific reason (e.g., a partition key that requires integer columns).
Indexing implications
All four methods support B-tree indexes on the hash column. Lookup performance is essentially identical for point queries (WHERE md5 = ?). Where the methods diverge is index size and therefore buffer-pool efficiency:
| Method | Bytes per index entry (incl. PK pointer) | Pages per million rows (InnoDB 16KB pages) |
|---|---|---|
CHAR(32) | ~40 | ~2,500 |
BINARY(16) | ~24 | ~1,500 |
Two BIGINT | ~24 | ~1,500 |
On a server with limited RAM, the smaller index stays hot in the buffer pool longer, which translates directly to fewer disk reads for index lookups. On modern servers with plentiful RAM the difference is negligible; on RDS micro-instances or cheap VPS it's real.
Hashes are random by design, so the B-tree insert pattern is worst-case for InnoDB — every insert lands in a random page, causing page splits and write amplification. This is the same problem random UUIDs have. If you're inserting millions of rows ordered by created_at, consider whether the hash needs to be the primary key (usually no) — make it a secondary index instead.
When MD5 is the wrong choice
MD5 has been cryptographically broken since 2004. Collisions can be generated in seconds on a laptop. That means MD5 is unsafe for:
- Password storage. Use
bcrypt,argon2id, orscryptvia your application's password library. Never store password MD5 (or SHA-256) directly. - Authentication tokens. Use cryptographically random tokens, not derived hashes.
- File integrity against tampering. Use SHA-256 or BLAKE3.
MD5 is still fine for:
- Content deduplication. "Are these two files probably the same?" — MD5 collisions can be deliberately constructed but won't happen by accident.
- ETags and cache keys. Where collision would just cause a cache miss, not a security failure.
- Legacy compatibility. Where an existing system stores MD5 and you can't change it without a migration.
For new tables that store hashes for any security-adjacent purpose, store the full digest: BINARY(32) for SHA-256, or BINARY(32) for BLAKE3's default 32-byte output. Do not truncate a security hash to 128 bits. A 128-bit digest only gives roughly 64-bit collision resistance because of the birthday bound, which is too weak to rely on against an adversary. Truncating to BINARY(16) is reasonable only for non-adversarial work: deduplication and cache keys where a collision is a harmless miss, not a security failure. For those, the same indexing approach applies with a more modern hash.
What to do next
- For the broader reference on every MySQL data type with exact byte sizes, see MySQL Data Types and Sizes: Complete Reference.
- For UUID-style identifiers (which have the same
BINARY(16) vs CHAR(36)trade-off as MD5 hashes), the comparison applies directly —BIN_TO_UUID()andUUID_TO_BIN()are the UUID equivalents ofHEX()andUNHEX(). - For the SQL syntax cheat sheet, MySQL Cheat Sheet covers conversion functions and DDL on one page.





