TechEarl

How to Store an MD5 Hash in PostgreSQL: bytea vs text

How to store an MD5 hash in PostgreSQL: the raw 16-byte bytea you should usually reach for, the readable text/char(32) hex string, and the uuid trick. Storage cost, encode()/decode() instead of HEX/UNHEX, and the pgcrypto digest() function.

Ishan Karunaratne⏱️ 14 min readUpdated
Share thisCopied
How to store an MD5 hash in PostgreSQL with storage trade-offs: the recommended 16-byte bytea, the readable text/char(32) hex string, and the uuid trick, plus encode()/decode() and the pgcrypto digest() function.

An MD5 hash is 128 bits, which is exactly 16 bytes raw or 32 hexadecimal characters as a string. PostgreSQL has no dedicated hash column type, so you pick how to represent those 16 bytes: bytea (the raw bytes, the compact one to reach for), text/char(32) (the readable hex string, double the size), or the clever uuid (also 16 bytes, since a 128-bit MD5 fits the type exactly). Below is the comparison table plus working DDL and insert/select for each, the md5() and digest() functions, and the conversion gotcha that trips up everyone arriving from MySQL.

Short answer: for a new table, store the raw 16 bytes in bytea and convert with decode(md5(value), 'hex') on insert and encode(col, 'hex') on read. Use text only when reading the hex string at the psql prompt is worth doubling the storage. And the big one: Postgres has no UNHEX/HEX functions, so if you came from the MySQL equivalent which uses BINARY(16), the muscle memory of UNHEX()/HEX() is exactly what you have to unlearn.

What is an MD5 hash and how is it stored in PostgreSQL?

An MD5 (Message-Digest 5) hash is a fixed-size 128-bit value produced by hashing arbitrary input. It is always 16 bytes in raw binary form, or 32 hexadecimal characters when stringified (0800fc577294c34e0b28ad2839435945 for md5('hash')). PostgreSQL has no native MD5 column type, so you choose how to represent those 16 bytes: as raw bytes in a bytea column (16 bytes per row, the compact default), as ASCII hex in a text or char(32) column (32 bytes per row, human-readable), or in the native uuid type, which is itself a 128-bit value and stores the digest in 16 bytes with a tidy display format.

PostgreSQL gives you two ways to compute the hash in the database. The built-in md5(text) function returns the lowercase 32-character hex string as text: md5('hash') gives 0800fc577294c34e0b28ad2839435945. The digest() function from the pgcrypto extension returns the raw 16 bytes directly as bytea: digest('hash', 'md5'). pgcrypto is the better starting point if you also want stronger algorithms later, since the same digest(data, type) call does sha256, sha512, and others by swapping the type argument. Either way, you can also compute the hash in your application and hand Postgres the finished value; the storage choices are identical.

The one thing to get straight before anything else: Postgres has no UNHEX or HEX. The equivalents are decode(hexstring, 'hex') (text to bytea, the UNHEX job) and encode(bytea, 'hex') (bytea back to a hex string, the HEX job). This is the single most common thing MySQL users get wrong on the way over, so it gets its own section below.

Jump to:

Storage comparison table

MethodBytes per rowHow to write / readReadabilityBest for
bytea16 (+ small varlena overhead)decode(md5(x),'hex') / encode(col,'hex')\x... displayRecommended default
text / char(32)32 (+ overhead)md5(x) / directDirect hex stringDebugging-heavy environments
uuid16md5(x)::uuid / ::textHyphenated UUIDCompact + clean display, with a caveat

The byte counts are the stored value itself; in practice bytea and text both carry a small varlena length header, while uuid is a flat fixed 16 bytes with no header. What the table shows cleanly is the relative cost: bytea and uuid are half the payload of the text hex string, and that 2× difference carries straight through to the index and the RAM spent keeping it hot in shared buffers. On a large table the compact forms win comfortably.

Store the raw 16 bytes. Half the size of the hex-string form, no charset questions, and it round-trips cleanly through encode/decode.

sql
CREATE TABLE md5_bytea (
  hash bytea NOT NULL
);

-- let Postgres hash and pack to 16 raw bytes in one step
INSERT INTO md5_bytea (hash) VALUES (decode(md5('hash'), 'hex'));

-- or supply an externally computed hex string
INSERT INTO md5_bytea (hash) VALUES (decode('0800fc577294c34e0b28ad2839435945', 'hex'));

SELECT encode(hash, 'hex') FROM md5_bytea;
-- 0800fc577294c34e0b28ad2839435945

Two helpers do all the conversion:

  • decode('hex_string', 'hex') packs the 32-char hex string to its 16-byte bytea form on insert (this is the UNHEX job).
  • encode(bytea_col, 'hex') expands it back to a lowercase hex string on read (the HEX job).

If you have pgcrypto enabled, digest('hash', 'md5') returns the 16-byte bytea directly, so you can skip the decode(md5(...)) wrapping:

sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;
INSERT INTO md5_bytea (hash) VALUES (digest('hash', 'md5'));

For lookups by hash, decode the parameter so you compare bytes to bytes:

sql
SELECT * FROM md5_bytea WHERE hash = decode($1, 'hex');

One display quirk to expect: a raw SELECT hash FROM md5_bytea shows \x0800fc577294c34e0b28ad2839435945. The \x is Postgres's bytea output prefix, not part of the value. Wrap the column in encode(..., 'hex') whenever you want the bare hex string. In application code, work with bytes directly: bytes.fromhex(hex) in Python, Buffer.from(hex, 'hex') in Node.js, hex2bin($hex) in PHP. For most new tables, bytea is the right answer.

There is also a third-party route if you want to shave the last few bytes: the md5hash PGXN extension defines a native fixed 16-byte type stored inline, dropping even the small varlena header that bytea carries (roughly 60% of the text footprint). I would not pull in an extension just for that on a normal table, the bytea and uuid forms already get you to 16 bytes, but it exists and is worth knowing about if you are storing hashes by the hundred million and the per-row header genuinely adds up.

Method 2: text / char(32)

Store the hex string itself. Readable at the psql prompt, double the bytes.

sql
CREATE TABLE md5_text (
  hash char(32) NOT NULL
);

INSERT INTO md5_text (hash) VALUES (md5('hash'));

SELECT hash FROM md5_text;
-- 0800fc577294c34e0b28ad2839435945

md5() already returns the 32-character lowercase hex string, so there is no conversion on insert and no encode() on read. That readability is the only thing this method buys you, and it costs 32 bytes per row instead of 16. Use char(32) if you want the column documented as fixed-width, or text if you do not care; for a value that is always exactly 32 characters the practical difference between them is nil in Postgres (unlike MySQL, char(n) is not stored blank-padded on disk to a fixed width in a way that matters here, and text and varchar share the same storage). Reach for this method when you genuinely read these values by eye often enough that skipping the encode() wrap is worth the doubled storage; otherwise prefer bytea.

Method 3: the uuid trick

A 128-bit MD5 and a 128-bit uuid are the same width, and Postgres's uuid input accepts a plain 32-hex-digit string with no hyphens. So you can cast an MD5 straight into a uuid column and get a flat 16-byte store with a clean canonical display, for free:

sql
CREATE TABLE md5_uuid (
  hash uuid NOT NULL
);

INSERT INTO md5_uuid (hash) VALUES (md5('hash')::uuid);

SELECT hash FROM md5_uuid;
-- 0800fc57-7294-c34e-0b28-ad2839435945

What you get: a true fixed 16 bytes (no varlena header, so it is the leanest of the three on disk), and a readable hyphenated rendering without an encode() call. The uuid type also has efficient native comparison and indexing.

The caveat, and it matters: this value is not a real UUID. It carries no version or variant bits, it is just an MD5 relabelled. Anything downstream that assumes UUIDv4 randomness, parses the version nibble, or treats the column as a genuine identifier will be misled. It also reorders nothing, so it does not solve the random-insert problem (below). Treat the uuid trick as a storage-and-display convenience for a hash, document it as such in the schema, and do not let it leak into code paths that reason about UUID semantics. Used with that discipline it is a neat win; used carelessly it is a footgun.

encode and decode, not HEX and UNHEX

If you are coming from MySQL, this is the section to tattoo on the inside of your eyelids. MySQL stores an MD5 in BINARY(16) and converts with UNHEX() on insert and HEX() on read. PostgreSQL has neither function. The mapping is:

MySQLPostgreSQLDirection
UNHEX('hexstr')decode('hexstr', 'hex')hex string to raw bytes
HEX(bin_col)encode(bin_col, 'hex')raw bytes to hex string
BINARY(16)byteathe raw-bytes column type
MD5('x') (hex string)md5('x') (hex string)same name, same output

So the WHERE md5 = UNHEX(?) lookup you wrote in MySQL becomes WHERE hash = decode($1, 'hex') in Postgres, and SELECT HEX(md5) becomes SELECT encode(hash, 'hex'). The md5() function name is the one thing that carries over unchanged. Note also that encode(..., 'hex') emits lowercase hex, whereas MySQL's HEX() emits uppercase; decode() accepts either case on input, so it only bites you in string comparisons of the encoded output, not in round-tripping the bytes.

Worked schema

A typical use is making a long, otherwise-unindexable value searchable by an exact-match equality on its hash. Here is a bytea MD5 column on a table of URLs, with a unique index for dedup:

sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE urls (
  id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  url       text   NOT NULL,
  url_md5   bytea  NOT NULL,
  UNIQUE (url_md5)
);

-- insert: hash the URL to 16 raw bytes
INSERT INTO urls (url, url_md5)
VALUES ('https://example.com/very/long/path?with=a&long=query&string=here',
        digest('https://example.com/very/long/path?with=a&long=query&string=here', 'md5'));

-- exact-match lookup by the original value, comparing bytes to bytes
SELECT id, url
FROM urls
WHERE url_md5 = digest($1, 'md5');

-- read the stored hash back as a hex string when you need to see it
SELECT id, encode(url_md5, 'hex') AS md5_hex FROM urls;

If you would rather have Postgres keep url_md5 in sync automatically instead of passing it on every insert, make it a generated column: url_md5 bytea GENERATED ALWAYS AS (digest(url, 'md5')) STORED. Postgres only supports STORED generated columns (there is no VIRTUAL), so the 16 bytes are written into the row, but at 16 bytes that is cheap, and you never have to remember to compute the hash at the call site. Index it the same way for the equality lookup.

Indexing implications

All three storage methods take an ordinary B-tree index, and point-query performance (WHERE hash = ?) is essentially identical across them. Where they diverge is index size, and therefore how much of the index stays resident in shared buffers: the 16-byte bytea and uuid indexes are roughly half the leaf-payload of the 32-byte text/char(32) index, so on a RAM-constrained server they incur fewer disk reads.

MD5 digests are random by design, which is the worst case for a B-tree on insert: every new row lands in a random leaf page, causing page splits and write amplification (the same problem random UUIDs have, and the uuid trick does nothing to fix it). If the hash does not need to be the primary key, keep an IDENTITY bigint as the primary key and put the hash in a secondary or unique index, so the table's physical insert order stays sequential. On a table where you only ever look the hash up by equality, a B-tree is fine; a hash index buys nothing here because equality is exactly what a B-tree already does well.

When MD5 is the wrong choice

MD5 has been cryptographically broken since 2004; collisions can be generated in seconds. So MD5 is the wrong tool wherever an adversary controls the input.

For passwords, do not store an MD5 (or a bare SHA-256). Use a slow, salted, purpose-built hash: crypt() from pgcrypto with a bcrypt salt (crypt(password, gen_salt('bf'))), or hash with bcrypt/argon2id at the application layer and store the full output in a text column. A fast hash like MD5 lets an attacker who steals the table try billions of guesses per second on a GPU.

For integrity and tamper resistance (file checksums, content signing, audit hashes), use SHA-256. pgcrypto makes it a one-word change from the examples above: digest(value, 'sha256') returns a 32-byte bytea, stored in a bytea column exactly as the MD5 was. MD5 remains fine for non-adversarial work: content deduplication, ETags, cache keys, where a collision is a harmless miss rather than a security failure.

Use caseReach forStore in
Dedup, cache keys, ETags (non-adversarial)MD5bytea (16 bytes)
File integrity, content signing, auditSHA-256 via digest(x,'sha256')bytea (32 bytes)
Passwordscrypt() / bcrypt / argon2idtext

What to do next

FAQ

See also

Sources

Authoritative references this article was fact-checked against.

TagsPostgreSQLMD5Database StoragebyteapgcryptoSchema DesignHashing

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.

Keep reading

Related posts

How to Store an Array in PostgreSQL

PostgreSQL has native array types: any base type can be an array, declared with []. How to insert with the curly-brace literal or ARRAY[...], query with @>, &&, and ANY(), index with GIN, and when an array beats a junction table or jsonb.