Store MD5 Hashes in MySQL: CHAR(32), BINARY(16), or BIGINT (2026)

An MD5 hash is 128 bits, exactly 16 bytes raw, or 32 hexadecimal characters as a string. MySQL gives you four reasonable ways to store one, each with a different storage cost and indexing profile: CHAR(32) (the obvious one, 32 bytes), VARCHAR(32) (rarely the right call), BINARY(16) (the best general default, half the size of CHAR), and a split into two BIGINT UNSIGNED columns (also 16 bytes, tied with BINARY(16), but awkward to read). Below is the comparison table plus working DDL and insert/select for each, the indexing implications, and a note on when MD5 is the wrong tool entirely.

Short answer: for a new table, store the raw 16 bytes in BINARY(16) and convert with UNHEX() on insert and HEX() on read. Use CHAR(32) only when you debug at the SQL prompt enough that a readable hex string is worth doubling the storage. If your actual goal is to make a long TEXT/URL column searchable, you don't store the hash by hand at all, you let MySQL compute it in an indexed generated column.

What is an MD5 hash and how is it stored in MySQL?

An MD5 (Message-Digest 5) hash is a fixed-size 128-bit value produced by hashing arbitrary input. It's always 16 bytes in raw binary form, or 32 hexadecimal characters when stringified (0800fc577294c34e0b28ad2839435945 for MD5("hash")). MySQL has no native MD5 column type, so you choose how to represent those 16 bytes: as ASCII hex in a CHAR(32) or VARCHAR(32) column (32 bytes per row, human-readable), as raw bytes in a BINARY(16) column (16 bytes per row, half the storage and index size), or split across two BIGINT UNSIGNED columns (also 16 bytes per row, the same cost as BINARY(16), just harder to query). The right choice depends on whether you need readability for debugging, how indexable the column has to be at scale, and whether you ever need joins against the hash column.

MySQL's built-in MD5() function does the hashing itself: MD5('hash') returns the lowercase 32-character hex string 0800fc577294c34e0b28ad2839435945 (a VARCHAR in the connection's character set), which you then store in one of the columns below, or wrap in UNHEX() to get the raw 16 bytes. You can also compute the hash in your application and hand MySQL the finished value; the storage choices are identical either way.

Everything here applies to MariaDB as well: the column types, UNHEX()/HEX(), the MD5() function, and the indexing all behave the same, so the way you store an MD5 hash in MariaDB is the way you store it in MySQL (the indexed generated-column pattern below needs MariaDB 10.2+).

One MySQL-specific caveat if you generate the hash inside the database with the MD5() function rather than computing it in your application: when the MySQL server runs in FIPS mode, MD5() returns NULL, because MD5 is not a FIPS-approved algorithm. The storage methods below are unaffected (they store whatever 16 bytes you hand them), but a NULL coming back from MD5() will surface as an insert or constraint error. The examples in this article use externally supplied hex strings, which sidestep the issue entirely.

Jump to:

Storage comparison table
Method 1: CHAR(32)
Method 2: VARCHAR(32)
Method 3: BINARY(16)
Method 4: Two BIGINT UNSIGNED columns
Index a large column by its MD5 (generated columns)
Indexing implications
When MD5 is the wrong choice
FAQ

Storage comparison table

Method	Bytes per row	Raw hash bytes, 1B rows	Readability	Best for
`CHAR(32)`	32 (latin1 / ASCII)	~32 GB	Direct hex string	General use, debugging-heavy environments
`VARCHAR(32)`	33 (1-byte length prefix)	~33 GB	Direct hex string	Rarely the right pick
`BINARY(16)`	16	~16 GB	Need `HEX()` to read	Recommended default for high-volume tables
Two `BIGINT UNSIGNED`	16	~16 GB	Need `CONV()` to read	Niche; useful as an all-integer key pair

The "Raw hash bytes" column is the stored hash value alone, multiplied by a billion rows. It is not the full on-disk index size: an InnoDB secondary index entry also carries the primary-key value and per-record and per-page overhead, so the real index is larger (see the realistic per-entry figures in the index size section below). What the raw column does show cleanly is the relative cost: the 2× difference between BINARY(16) and CHAR(32) carries straight through to the real index. On a billion-row table that is roughly a 16 GB gap in the secondary index, plus less RAM spent keeping the index hot in the buffer pool.

Method 1: CHAR(32)

Most common, most readable. Store the hex string as ASCII.

sql

CREATE TABLE md5_char (
  md5 CHAR(32) NOT NULL,
  INDEX idx_md5 (md5)
) ENGINE=InnoDB;

INSERT INTO md5_char (md5) VALUES ('0800fc577294c34e0b28ad2839435945');

SELECT * FROM md5_char;
-- 0800fc577294c34e0b28ad2839435945

CHAR(32) reserves exactly 32 bytes per row in latin1 or ascii. Don't use a multibyte charset for hash columns, utf8mb4 would reserve 128 bytes per row for the same hex string. Add CHARACTER SET ascii COLLATE ascii_bin to the column if your table default is utf8mb4:

sql

md5 CHAR(32) CHARACTER SET ascii COLLATE ascii_bin NOT NULL

The binary collation makes comparison and indexing slightly faster and case-sensitive (lowercase hex won't match uppercase, which you want for hashes).

Method 2: VARCHAR(32)

Same content as CHAR(32) but with a 1-byte length prefix per row. Storage is 33 bytes per row.

sql

CREATE TABLE md5_varchar (
  md5 VARCHAR(32) NOT NULL,
  INDEX idx_md5 (md5)
) ENGINE=InnoDB;

There's almost never a reason to pick VARCHAR(32) over CHAR(32) for an MD5, the length is always exactly 32. The extra byte adds storage and gains nothing. The only legitimate use is if the column might later hold different fixed-length hashes (e.g., transitioning from MD5 to SHA-256 which is 64 hex chars), and even then a schema change is the cleaner answer.

Method 3: BINARY(16): recommended default

Half the storage of CHAR(32), half the index size, and no charset issues.

sql

CREATE TABLE md5_binary (
  md5 BINARY(16) NOT NULL,
  INDEX idx_md5 (md5)
) ENGINE=InnoDB;

INSERT INTO md5_binary (md5) VALUES (UNHEX('0800fc577294c34e0b28ad2839435945'));

SELECT HEX(md5) FROM md5_binary;
-- 0800FC577294C34E0B28AD2839435945

MySQL session storing an MD5 hash in a BINARY(16) column with UNHEX() on insert and HEX() on read, then a query proving BINARY(16) uses 16 bytes per row versus 32 for CHAR(32). — The BINARY(16) round-trip: UNHEX() on insert, HEX() on read, and LENGTH() confirming 16 bytes per row against CHAR(32)'s 32. Real output from MySQL 8.0.

Two helpers do all the conversion:

UNHEX('hex_string') → 16-byte binary value on insert.
HEX(binary_col) → uppercase hex string on read.

For lookups by hash, always wrap the parameter in UNHEX():

sql

SELECT * FROM md5_binary WHERE md5 = UNHEX(?);

In application code, store the hex string and let MySQL convert, or work with bytes directly (bytes.fromhex(hex_str) in Python, Buffer.from(hex, 'hex') in Node.js, pack('H*', $hex) in PHP).

For 99% of new tables, BINARY(16) is the right answer.

Method 4: Two BIGINT UNSIGNED columns

Split the 128 bits across two 64-bit unsigned integers. Storage is exactly 16 bytes, same as BINARY(16).

sql

CREATE TABLE md5_bigint (
  md5_high BIGINT UNSIGNED NOT NULL,
  md5_low  BIGINT UNSIGNED NOT NULL,
  INDEX idx_md5 (md5_high, md5_low)
) ENGINE=InnoDB;

INSERT INTO md5_bigint (md5_high, md5_low) VALUES (
  CONV(LEFT('0800fc577294c34e0b28ad2839435945', 16), 16, 10),
  CONV(RIGHT('0800fc577294c34e0b28ad2839435945', 16), 16, 10)
);

SELECT CONCAT(
  LPAD(CONV(md5_high, 10, 16), 16, '0'),
  LPAD(CONV(md5_low,  10, 16), 16, '0')
) AS md5 FROM md5_bigint;
-- 0800FC577294C34E0B28AD2839435945

The two-BIGINT method costs the same bytes as BINARY(16) but is more awkward to query and join. The one historical advantage, that some older MySQL clients had trouble with binary columns, no longer applies. Use BINARY(16) instead unless you have a specific reason (e.g., a partition key that requires integer columns).

Index a large column by its MD5 (generated columns)

The most common real reason to put an MD5 in MySQL is not to store the hash for its own sake, it is to make a long column searchable. You can't put a plain B-tree index on a TEXT column, and a prefix index on a long VARCHAR is useless when the values only diverge near the end (URLs with long query strings, file paths). The fix is a generated column holding MD5() of the long value, indexed for equality lookups. This is where you let the database compute the hash with its built-in MD5() function instead of supplying the hex by hand.

sql

CREATE TABLE urls (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  url      TEXT NOT NULL,
  url_md5  BINARY(16) AS (UNHEX(MD5(url))) VIRTUAL,
  INDEX idx_url_md5 (url_md5)
) ENGINE=InnoDB;

INSERT INTO urls (url) VALUES ('https://example.com/very/long/path?with=a&long=query&string=here');

-- exact-match lookup by the original value
SELECT * FROM urls
WHERE url_md5 = UNHEX(MD5('https://example.com/very/long/path?with=a&long=query&string=here'));

You never write to url_md5; MySQL recomputes it from url. MD5(url) produces the 32-char hex string and UNHEX() packs it down to the 16-byte BINARY(16) form, so the index is the compact one from Method 3, not a 32-byte CHAR index.

Use VIRTUAL, not STORED. InnoDB indexes a virtual generated column fine: the hash is materialized in the secondary-index records only, never written into the row, so the table itself stays small. A STORED column writes the hash into the clustered index on every row for no extra benefit when all you need is the lookup. Adding the virtual column is a metadata-only change and adding or dropping its index is an in-place operation; adding a STORED generated column requires a full table rebuild.

You don't even have to name the generated column in the query. The optimizer will use the index whenever a WHERE, ORDER BY, or GROUP BY clause contains an expression that matches the column's definition exactly:

sql

-- also uses idx_url_md5, even though url_md5 is never named
SELECT * FROM urls WHERE UNHEX(MD5(url)) = UNHEX(MD5('https://example.com/...'));

MySQL session creating a VIRTUAL generated column url_md5 BINARY(16) AS (UNHEX(MD5(url))), the column auto-filling on insert, and an EXPLAIN showing the query uses key idx_url_md5 with type ref. — A VIRTUAL generated column hashes the URL automatically, and EXPLAIN confirms the equality lookup uses idx_url_md5 (type: ref). Real output from MySQL 8.0.

The match has to be exact (identical expression, same result type), so in practice referencing url_md5 directly is the clearer habit. One caveat carries over from earlier: because the value comes from MD5(), the column evaluates to NULL on a server running in FIPS mode (MD5 is not FIPS-approved), which then breaks the index lookups. If you need FIPS compliance, compute the hash in the application and store it in a plain BINARY(16) column instead.

Indexing implications

All four methods support B-tree indexes on the hash column. Lookup performance is essentially identical for point queries (WHERE md5 = ?). Where the methods diverge is index size and therefore buffer-pool efficiency:

Method	Bytes per index entry (incl. PK pointer)	Pages per million rows (InnoDB 16KB pages)
`CHAR(32)`	~40	~2,500
`BINARY(16)`	~24	~1,500
Two `BIGINT`	~24	~1,500

On a server with limited RAM, the smaller index stays hot in the buffer pool longer, which translates directly to fewer disk reads for index lookups. On modern servers with plentiful RAM the difference is negligible; on RDS micro-instances or cheap VPS it's real.

Hashes are random by design, so the B-tree insert pattern is worst-case for InnoDB, every insert lands in a random page, causing page splits and write amplification. This is the same problem random UUIDs have. If you're inserting millions of rows ordered by created_at, consider whether the hash needs to be the primary key (usually no): make it a secondary index instead.

When MD5 is the wrong choice

MD5 has been cryptographically broken since 2004. Collisions can be generated in seconds on a laptop. That means MD5 is unsafe for:

Password storage. Use bcrypt, argon2id, or scrypt via your application's password library, stored in a VARCHAR(255) password column. Never store password MD5 (or SHA-256) directly.
Authentication tokens. Use cryptographically random tokens, not derived hashes.
File integrity against tampering. Use SHA-256 or BLAKE3.

MD5 is still fine for:

Content deduplication. "Are these two files probably the same?", MD5 collisions can be deliberately constructed but won't happen by accident.
ETags and cache keys. Where collision would just cause a cache miss, not a security failure.
Legacy compatibility. Where an existing system stores MD5 and you can't change it without a migration.

For new tables that store hashes for any security-adjacent purpose, store the full digest: BINARY(32) for SHA-256 (see how to store a SHA-256 hash in MySQL for the full breakdown), or BINARY(32) for BLAKE3's default 32-byte output. Do not truncate a security hash to 128 bits. A 128-bit digest only gives roughly 64-bit collision resistance because of the birthday bound, which is too weak to rely on against an adversary. Truncating to BINARY(16) is reasonable only for non-adversarial work: deduplication and cache keys where a collision is a harmless miss, not a security failure. For those, the same indexing approach applies with a more modern hash.

What to do next

For the broader reference on every MySQL data type with exact byte sizes, see MySQL Data Types and Sizes: Complete Reference.
For UUID-style identifiers (which have the same BINARY(16) vs CHAR(36) trade-off as MD5 hashes), the comparison applies directly, BIN_TO_UUID() and UUID_TO_BIN() are the UUID equivalents of HEX() and UNHEX().
For the SQL syntax cheat sheet, MySQL Cheat Sheet covers conversion functions and DDL on one page.
On the other engine, storing an MD5 hash in PostgreSQL is the same 16-byte idea with bytea and decode()/encode() instead of BINARY(16) and UNHEX()/HEX().

FAQ

16 bytes per row, using either BINARY(16) or two BIGINT UNSIGNED columns. Both occupy the same disk and index space. BINARY(16) is the simpler, more idiomatic choice, query with UNHEX('hex_string') and HEX(column).

The popular CHAR(32) approach uses 32 bytes, exactly double, because it stores the hash as ASCII hex.

BINARY(16) for new tables. It's half the storage, half the index size, and otherwise functionally identical. Use UNHEX() on insert and HEX() on read for the hex-string form.

CHAR(32) is reasonable when you debug a lot at the SQL prompt and the readability of the hex string is worth the doubled storage. On large tables (10M+ rows) the storage and buffer-pool impact wins.

No. MD5 is cryptographically broken and far too fast on modern hardware to resist offline brute-force attacks. Use a slow purpose-built password hash: bcrypt, argon2id, or scrypt, applied at the application layer (PHP password_hash(), Python passlib, Node bcrypt).

Store the full output of those functions in a VARCHAR(255) column, the format already encodes the algorithm, cost, and salt.

A regular B-tree index works fine for any of the storage methods: INDEX idx_md5 (md5). Lookup performance is essentially identical for point queries.

If MD5 is the primary key, be aware that random hashes cause B-tree page splits on insert, hurting bulk-insert performance. For high-write tables, make MD5 a secondary index and use an auto-increment BIGINT as the primary key.

You can't B-tree index a TEXT column directly. Add an indexed generated column holding the MD5 of the long value: url_md5 BINARY(16) AS (UNHEX(MD5(url))) VIRTUAL with INDEX (url_md5), then look up with WHERE url_md5 = UNHEX(MD5(?)).

Use VIRTUAL, not STORED. InnoDB materializes the value in the secondary index, so the table stays small, and adding the column plus its index is an in-place change rather than a full table rebuild.

UNHEX('hex_string') converts a hex string to its raw binary form. HEX(binary_value) converts back. So:

INSERT INTO t (md5) VALUES (UNHEX('0800fc577294c34e0b28ad2839435945'));

SELECT HEX(md5) FROM t;

The same pattern works for SHA-1 (BINARY(20)) and SHA-256 (BINARY(32)).

For content-deduplication or cache-key use cases, no, the migration cost outweighs the benefit. For anything security-adjacent (file-integrity verification, audit hashes, content signing), yes, add a new BINARY(32) column for SHA-256, populate it for new rows, and backfill old rows lazily on next access.

For password storage that's currently MD5, force a password reset on next login (and store the new password as bcrypt or argon2id, not raw SHA-256).

How to Store MD5 Hashes in MySQL: CHAR, BINARY, and BIGINT Methods

What is an MD5 hash and how is it stored in MySQL?

Storage comparison table

Method 1: CHAR(32)

Method 2: VARCHAR(32)

Method 3: BINARY(16): recommended default

Method 4: Two BIGINT UNSIGNED columns

Index a large column by its MD5 (generated columns)

Indexing implications

When MD5 is the wrong choice

What to do next

FAQ

Sources

Ishan Karunaratne

Related posts

How to Store a SHA-256 Hash in MySQL: CHAR(64) vs BINARY(32)

How to Store Money in MySQL: DECIMAL vs Integer Cents

How to Store an MD5 Hash in PostgreSQL: bytea vs text

What is the smallest way to store an MD5 hash in MySQL?

Should I use CHAR(32) or BINARY(16) for MD5 in MySQL?

Is it safe to store passwords as MD5 hashes in MySQL?

How do I index an MD5 column in MySQL?

How do I index a long TEXT or URL column using MD5 in MySQL?

How do I convert between hex and binary MD5 in MySQL?

Should I migrate existing MD5 hashes to SHA-256?

Sources

Ishan Karunaratne