TechEarl

How to Store MD5 Hashes in MySQL: CHAR, BINARY, and BIGINT Methods

Four ways to store an MD5 hash in MySQL or MariaDB: CHAR(32), VARCHAR(32), BINARY(16), and the two-BIGINT split. Storage cost, index speed, and when to use each (plus when to use a stronger hash instead).

Ishan Karunaratne⏱️ 14 min readUpdated
Share thisCopied
Four ways to store an MD5 hash in MySQL with storage cost and index-speed trade-offs: CHAR(32), VARCHAR(32), BINARY(16), and the two-BIGINT split. Plus when to upgrade to SHA-256 or bcrypt.

An MD5 hash is 128 bits, exactly 16 bytes raw, or 32 hexadecimal characters as a string. MySQL gives you four reasonable ways to store one, each with a different storage cost and indexing profile: CHAR(32) (the obvious one, 32 bytes), VARCHAR(32) (rarely the right call), BINARY(16) (the best general default, half the size of CHAR), and a split into two BIGINT UNSIGNED columns (also 16 bytes, tied with BINARY(16), but awkward to read). Below is the comparison table plus working DDL and insert/select for each, the indexing implications, and a note on when MD5 is the wrong tool entirely.

Short answer: for a new table, store the raw 16 bytes in BINARY(16) and convert with UNHEX() on insert and HEX() on read. Use CHAR(32) only when you debug at the SQL prompt enough that a readable hex string is worth doubling the storage. If your actual goal is to make a long TEXT/URL column searchable, you don't store the hash by hand at all, you let MySQL compute it in an indexed generated column.

What is an MD5 hash and how is it stored in MySQL?

An MD5 (Message-Digest 5) hash is a fixed-size 128-bit value produced by hashing arbitrary input. It's always 16 bytes in raw binary form, or 32 hexadecimal characters when stringified (0800fc577294c34e0b28ad2839435945 for MD5("hash")). MySQL has no native MD5 column type, so you choose how to represent those 16 bytes: as ASCII hex in a CHAR(32) or VARCHAR(32) column (32 bytes per row, human-readable), as raw bytes in a BINARY(16) column (16 bytes per row, half the storage and index size), or split across two BIGINT UNSIGNED columns (also 16 bytes per row, the same cost as BINARY(16), just harder to query). The right choice depends on whether you need readability for debugging, how indexable the column has to be at scale, and whether you ever need joins against the hash column.

MySQL's built-in MD5() function does the hashing itself: MD5('hash') returns the lowercase 32-character hex string 0800fc577294c34e0b28ad2839435945 (a VARCHAR in the connection's character set), which you then store in one of the columns below, or wrap in UNHEX() to get the raw 16 bytes. You can also compute the hash in your application and hand MySQL the finished value; the storage choices are identical either way.

Everything here applies to MariaDB as well: the column types, UNHEX()/HEX(), the MD5() function, and the indexing all behave the same, so the way you store an MD5 hash in MariaDB is the way you store it in MySQL (the indexed generated-column pattern below needs MariaDB 10.2+).

One MySQL-specific caveat if you generate the hash inside the database with the MD5() function rather than computing it in your application: when the MySQL server runs in FIPS mode, MD5() returns NULL, because MD5 is not a FIPS-approved algorithm. The storage methods below are unaffected (they store whatever 16 bytes you hand them), but a NULL coming back from MD5() will surface as an insert or constraint error. The examples in this article use externally supplied hex strings, which sidestep the issue entirely.

Jump to:

Storage comparison table

MethodBytes per rowRaw hash bytes, 1B rowsReadabilityBest for
CHAR(32)32 (latin1 / ASCII)~32 GBDirect hex stringGeneral use, debugging-heavy environments
VARCHAR(32)33 (1-byte length prefix)~33 GBDirect hex stringRarely the right pick
BINARY(16)16~16 GBNeed HEX() to readRecommended default for high-volume tables
Two BIGINT UNSIGNED16~16 GBNeed CONV() to readNiche; useful as an all-integer key pair

The "Raw hash bytes" column is the stored hash value alone, multiplied by a billion rows. It is not the full on-disk index size: an InnoDB secondary index entry also carries the primary-key value and per-record and per-page overhead, so the real index is larger (see the realistic per-entry figures in the index size section below). What the raw column does show cleanly is the relative cost: the 2× difference between BINARY(16) and CHAR(32) carries straight through to the real index. On a billion-row table that is roughly a 16 GB gap in the secondary index, plus less RAM spent keeping the index hot in the buffer pool.

Method 1: CHAR(32)

Most common, most readable. Store the hex string as ASCII.

sql
CREATE TABLE md5_char (
  md5 CHAR(32) NOT NULL,
  INDEX idx_md5 (md5)
) ENGINE=InnoDB;

INSERT INTO md5_char (md5) VALUES ('0800fc577294c34e0b28ad2839435945');

SELECT * FROM md5_char;
-- 0800fc577294c34e0b28ad2839435945

CHAR(32) reserves exactly 32 bytes per row in latin1 or ascii. Don't use a multibyte charset for hash columns, utf8mb4 would reserve 128 bytes per row for the same hex string. Add CHARACTER SET ascii COLLATE ascii_bin to the column if your table default is utf8mb4:

sql
md5 CHAR(32) CHARACTER SET ascii COLLATE ascii_bin NOT NULL

The binary collation makes comparison and indexing slightly faster and case-sensitive (lowercase hex won't match uppercase, which you want for hashes).

Method 2: VARCHAR(32)

Same content as CHAR(32) but with a 1-byte length prefix per row. Storage is 33 bytes per row.

sql
CREATE TABLE md5_varchar (
  md5 VARCHAR(32) NOT NULL,
  INDEX idx_md5 (md5)
) ENGINE=InnoDB;

There's almost never a reason to pick VARCHAR(32) over CHAR(32) for an MD5, the length is always exactly 32. The extra byte adds storage and gains nothing. The only legitimate use is if the column might later hold different fixed-length hashes (e.g., transitioning from MD5 to SHA-256 which is 64 hex chars), and even then a schema change is the cleaner answer.

Half the storage of CHAR(32), half the index size, and no charset issues.

sql
CREATE TABLE md5_binary (
  md5 BINARY(16) NOT NULL,
  INDEX idx_md5 (md5)
) ENGINE=InnoDB;

INSERT INTO md5_binary (md5) VALUES (UNHEX('0800fc577294c34e0b28ad2839435945'));

SELECT HEX(md5) FROM md5_binary;
-- 0800FC577294C34E0B28AD2839435945
MySQL session storing an MD5 hash in a BINARY(16) column with UNHEX() on insert and HEX() on read, then a query proving BINARY(16) uses 16 bytes per row versus 32 for CHAR(32).
The BINARY(16) round-trip: UNHEX() on insert, HEX() on read, and LENGTH() confirming 16 bytes per row against CHAR(32)'s 32. Real output from MySQL 8.0.

Two helpers do all the conversion:

  • UNHEX('hex_string') → 16-byte binary value on insert.
  • HEX(binary_col) → uppercase hex string on read.

For lookups by hash, always wrap the parameter in UNHEX():

sql
SELECT * FROM md5_binary WHERE md5 = UNHEX(?);

In application code, store the hex string and let MySQL convert, or work with bytes directly (bytes.fromhex(hex_str) in Python, Buffer.from(hex, 'hex') in Node.js, pack('H*', $hex) in PHP).

For 99% of new tables, BINARY(16) is the right answer.

Method 4: Two BIGINT UNSIGNED columns

Split the 128 bits across two 64-bit unsigned integers. Storage is exactly 16 bytes, same as BINARY(16).

sql
CREATE TABLE md5_bigint (
  md5_high BIGINT UNSIGNED NOT NULL,
  md5_low  BIGINT UNSIGNED NOT NULL,
  INDEX idx_md5 (md5_high, md5_low)
) ENGINE=InnoDB;

INSERT INTO md5_bigint (md5_high, md5_low) VALUES (
  CONV(LEFT('0800fc577294c34e0b28ad2839435945', 16), 16, 10),
  CONV(RIGHT('0800fc577294c34e0b28ad2839435945', 16), 16, 10)
);

SELECT CONCAT(
  LPAD(CONV(md5_high, 10, 16), 16, '0'),
  LPAD(CONV(md5_low,  10, 16), 16, '0')
) AS md5 FROM md5_bigint;
-- 0800FC577294C34E0B28AD2839435945

The two-BIGINT method costs the same bytes as BINARY(16) but is more awkward to query and join. The one historical advantage, that some older MySQL clients had trouble with binary columns, no longer applies. Use BINARY(16) instead unless you have a specific reason (e.g., a partition key that requires integer columns).

Index a large column by its MD5 (generated columns)

The most common real reason to put an MD5 in MySQL is not to store the hash for its own sake, it is to make a long column searchable. You can't put a plain B-tree index on a TEXT column, and a prefix index on a long VARCHAR is useless when the values only diverge near the end (URLs with long query strings, file paths). The fix is a generated column holding MD5() of the long value, indexed for equality lookups. This is where you let the database compute the hash with its built-in MD5() function instead of supplying the hex by hand.

sql
CREATE TABLE urls (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  url      TEXT NOT NULL,
  url_md5  BINARY(16) AS (UNHEX(MD5(url))) VIRTUAL,
  INDEX idx_url_md5 (url_md5)
) ENGINE=InnoDB;

INSERT INTO urls (url) VALUES ('https://example.com/very/long/path?with=a&long=query&string=here');

-- exact-match lookup by the original value
SELECT * FROM urls
WHERE url_md5 = UNHEX(MD5('https://example.com/very/long/path?with=a&long=query&string=here'));

You never write to url_md5; MySQL recomputes it from url. MD5(url) produces the 32-char hex string and UNHEX() packs it down to the 16-byte BINARY(16) form, so the index is the compact one from Method 3, not a 32-byte CHAR index.

Use VIRTUAL, not STORED. InnoDB indexes a virtual generated column fine: the hash is materialized in the secondary-index records only, never written into the row, so the table itself stays small. A STORED column writes the hash into the clustered index on every row for no extra benefit when all you need is the lookup. Adding the virtual column is a metadata-only change and adding or dropping its index is an in-place operation; adding a STORED generated column requires a full table rebuild.

You don't even have to name the generated column in the query. The optimizer will use the index whenever a WHERE, ORDER BY, or GROUP BY clause contains an expression that matches the column's definition exactly:

sql
-- also uses idx_url_md5, even though url_md5 is never named
SELECT * FROM urls WHERE UNHEX(MD5(url)) = UNHEX(MD5('https://example.com/...'));
MySQL session creating a VIRTUAL generated column url_md5 BINARY(16) AS (UNHEX(MD5(url))), the column auto-filling on insert, and an EXPLAIN showing the query uses key idx_url_md5 with type ref.
A VIRTUAL generated column hashes the URL automatically, and EXPLAIN confirms the equality lookup uses idx_url_md5 (type: ref). Real output from MySQL 8.0.

The match has to be exact (identical expression, same result type), so in practice referencing url_md5 directly is the clearer habit. One caveat carries over from earlier: because the value comes from MD5(), the column evaluates to NULL on a server running in FIPS mode (MD5 is not FIPS-approved), which then breaks the index lookups. If you need FIPS compliance, compute the hash in the application and store it in a plain BINARY(16) column instead.

Indexing implications

All four methods support B-tree indexes on the hash column. Lookup performance is essentially identical for point queries (WHERE md5 = ?). Where the methods diverge is index size and therefore buffer-pool efficiency:

MethodBytes per index entry (incl. PK pointer)Pages per million rows (InnoDB 16KB pages)
CHAR(32)~40~2,500
BINARY(16)~24~1,500
Two BIGINT~24~1,500

On a server with limited RAM, the smaller index stays hot in the buffer pool longer, which translates directly to fewer disk reads for index lookups. On modern servers with plentiful RAM the difference is negligible; on RDS micro-instances or cheap VPS it's real.

Hashes are random by design, so the B-tree insert pattern is worst-case for InnoDB, every insert lands in a random page, causing page splits and write amplification. This is the same problem random UUIDs have. If you're inserting millions of rows ordered by created_at, consider whether the hash needs to be the primary key (usually no): make it a secondary index instead.

When MD5 is the wrong choice

MD5 has been cryptographically broken since 2004. Collisions can be generated in seconds on a laptop. That means MD5 is unsafe for:

  • Password storage. Use bcrypt, argon2id, or scrypt via your application's password library, stored in a VARCHAR(255) password column. Never store password MD5 (or SHA-256) directly.
  • Authentication tokens. Use cryptographically random tokens, not derived hashes.
  • File integrity against tampering. Use SHA-256 or BLAKE3.

MD5 is still fine for:

  • Content deduplication. "Are these two files probably the same?", MD5 collisions can be deliberately constructed but won't happen by accident.
  • ETags and cache keys. Where collision would just cause a cache miss, not a security failure.
  • Legacy compatibility. Where an existing system stores MD5 and you can't change it without a migration.

For new tables that store hashes for any security-adjacent purpose, store the full digest: BINARY(32) for SHA-256 (see how to store a SHA-256 hash in MySQL for the full breakdown), or BINARY(32) for BLAKE3's default 32-byte output. Do not truncate a security hash to 128 bits. A 128-bit digest only gives roughly 64-bit collision resistance because of the birthday bound, which is too weak to rely on against an adversary. Truncating to BINARY(16) is reasonable only for non-adversarial work: deduplication and cache keys where a collision is a harmless miss, not a security failure. For those, the same indexing approach applies with a more modern hash.

What to do next

  • For the broader reference on every MySQL data type with exact byte sizes, see MySQL Data Types and Sizes: Complete Reference.
  • For UUID-style identifiers (which have the same BINARY(16) vs CHAR(36) trade-off as MD5 hashes), the comparison applies directly, BIN_TO_UUID() and UUID_TO_BIN() are the UUID equivalents of HEX() and UNHEX().
  • For the SQL syntax cheat sheet, MySQL Cheat Sheet covers conversion functions and DDL on one page.
  • On the other engine, storing an MD5 hash in PostgreSQL is the same 16-byte idea with bytea and decode()/encode() instead of BINARY(16) and UNHEX()/HEX().

FAQ

Sources

Authoritative references this article was fact-checked against.

TagsMySQLMD5Database StorageBINARYCHARSchema DesignHashing

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Software Systems Architect · Senior Software Engineer · Engineering Leadership

Software systems architect and senior software engineer with more than two decades designing, building, and running production software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Now a CTO, though what I write here is drawn from the full arc of that work, across architecture, engineering, and operations, not any single job.

Keep reading

Related posts

How to store money in MySQL: why FLOAT and DOUBLE drift, DECIMAL(19,4) for exact fixed-point amounts, BIGINT integer cents for high-volume ledgers, and a separate CHAR(3) ISO 4217 currency column.

How to Store Money in MySQL: DECIMAL vs Integer Cents

Never use FLOAT or DOUBLE for money. Use DECIMAL(19,4) for exact fixed-point amounts, or BIGINT integer cents for high-volume ledgers, and always store the ISO 4217 currency code separately. Comparison, worked schema, and the rounding rules.

How to store an MD5 hash in PostgreSQL with storage trade-offs: the recommended 16-byte bytea, the readable text/char(32) hex string, and the uuid trick, plus encode()/decode() and the pgcrypto digest() function.

How to Store an MD5 Hash in PostgreSQL: bytea vs text

How to store an MD5 hash in PostgreSQL: the raw 16-byte bytea you should usually reach for, the readable text/char(32) hex string, and the uuid trick. Storage cost, encode()/decode() instead of HEX/UNHEX, and the pgcrypto digest() function.