Store an MD5 Hash in PostgreSQL: bytea vs text (2026)

An MD5 hash is 128 bits, which is exactly 16 bytes raw or 32 hexadecimal characters as a string. PostgreSQL has no dedicated hash column type, so you pick how to represent those 16 bytes: bytea (the raw bytes, the compact one to reach for), text/char(32) (the readable hex string, double the size), or the clever uuid (also 16 bytes, since a 128-bit MD5 fits the type exactly). Below is the comparison table plus working DDL and insert/select for each, the md5() and digest() functions, and the conversion gotcha that trips up everyone arriving from MySQL.

Short answer: for a new table, store the raw 16 bytes in bytea and convert with decode(md5(value), 'hex') on insert and encode(col, 'hex') on read. Use text only when reading the hex string at the psql prompt is worth doubling the storage. And the big one: Postgres has no UNHEX/HEX functions, so if you came from the MySQL equivalent which uses BINARY(16), the muscle memory of UNHEX()/HEX() is exactly what you have to unlearn.

What is an MD5 hash and how is it stored in PostgreSQL?

An MD5 (Message-Digest 5) hash is a fixed-size 128-bit value produced by hashing arbitrary input. It is always 16 bytes in raw binary form, or 32 hexadecimal characters when stringified (0800fc577294c34e0b28ad2839435945 for md5('hash')). PostgreSQL has no native MD5 column type, so you choose how to represent those 16 bytes: as raw bytes in a bytea column (16 bytes per row, the compact default), as ASCII hex in a text or char(32) column (32 bytes per row, human-readable), or in the native uuid type, which is itself a 128-bit value and stores the digest in 16 bytes with a tidy display format.

PostgreSQL gives you two ways to compute the hash in the database. The built-in md5(text) function returns the lowercase 32-character hex string as text: md5('hash') gives 0800fc577294c34e0b28ad2839435945. The digest() function from the pgcrypto extension returns the raw 16 bytes directly as bytea: digest('hash', 'md5'). pgcrypto is the better starting point if you also want stronger algorithms later, since the same digest(data, type) call does sha256, sha512, and others by swapping the type argument. Either way, you can also compute the hash in your application and hand Postgres the finished value; the storage choices are identical.

The one thing to get straight before anything else: Postgres has no UNHEX or HEX. The equivalents are decode(hexstring, 'hex') (text to bytea, the UNHEX job) and encode(bytea, 'hex') (bytea back to a hex string, the HEX job). This is the single most common thing MySQL users get wrong on the way over, so it gets its own section below.

Jump to:

Storage comparison table
Method 1: bytea
Method 2: text / char(32)
Method 3: the uuid trick
encode and decode, not HEX and UNHEX
Worked schema
Indexing implications
When MD5 is the wrong choice
FAQ

Storage comparison table

Method	Bytes per row	How to write / read	Readability	Best for
`bytea`	16 (+ small varlena overhead)	`decode(md5(x),'hex')` / `encode(col,'hex')`	`\x...` display	Recommended default
`text` / `char(32)`	32 (+ overhead)	`md5(x)` / direct	Direct hex string	Debugging-heavy environments
`uuid`	16	`md5(x)::uuid` / `::text`	Hyphenated UUID	Compact + clean display, with a caveat

The byte counts are the stored value itself; in practice bytea and text both carry a small varlena length header, while uuid is a flat fixed 16 bytes with no header. What the table shows cleanly is the relative cost: bytea and uuid are half the payload of the text hex string, and that 2× difference carries straight through to the index and the RAM spent keeping it hot in shared buffers. On a large table the compact forms win comfortably.

Method 1: bytea (recommended default)

Store the raw 16 bytes. Half the size of the hex-string form, no charset questions, and it round-trips cleanly through encode/decode.

sql

CREATE TABLE md5_bytea (
  hash bytea NOT NULL
);

-- let Postgres hash and pack to 16 raw bytes in one step
INSERT INTO md5_bytea (hash) VALUES (decode(md5('hash'), 'hex'));

-- or supply an externally computed hex string
INSERT INTO md5_bytea (hash) VALUES (decode('0800fc577294c34e0b28ad2839435945', 'hex'));

SELECT encode(hash, 'hex') FROM md5_bytea;
-- 0800fc577294c34e0b28ad2839435945

A live PostgreSQL session storing an MD5 hash as bytea, with the create statement and the real query output. — A live psql session running this schema in PostgreSQL 16: real output, not illustrative.

Two helpers do all the conversion:

decode('hex_string', 'hex') packs the 32-char hex string to its 16-byte bytea form on insert (this is the UNHEX job).
encode(bytea_col, 'hex') expands it back to a lowercase hex string on read (the HEX job).

If you have pgcrypto enabled, digest('hash', 'md5') returns the 16-byte bytea directly, so you can skip the decode(md5(...)) wrapping:

sql

CREATE EXTENSION IF NOT EXISTS pgcrypto;
INSERT INTO md5_bytea (hash) VALUES (digest('hash', 'md5'));

For lookups by hash, decode the parameter so you compare bytes to bytes:

sql

SELECT * FROM md5_bytea WHERE hash = decode($1, 'hex');

One display quirk to expect: a raw SELECT hash FROM md5_bytea shows \x0800fc577294c34e0b28ad2839435945. The \x is Postgres's bytea output prefix, not part of the value. Wrap the column in encode(..., 'hex') whenever you want the bare hex string. In application code, work with bytes directly: bytes.fromhex(hex) in Python, Buffer.from(hex, 'hex') in Node.js, hex2bin($hex) in PHP. For most new tables, bytea is the right answer.

There is also a third-party route if you want to shave the last few bytes: the md5hash PGXN extension defines a native fixed 16-byte type stored inline, dropping even the small varlena header that bytea carries. The extension's author benchmarks it at roughly 60% of the text footprint (a published 1-million-row test, not something I have re-measured myself). I would not pull in an extension just for that on a normal table, the bytea and uuid forms already get you to 16 bytes, but it exists and is worth knowing about if you are storing hashes by the hundred million and the per-row header genuinely adds up.

Method 2: text / char(32)

Store the hex string itself. Readable at the psql prompt, double the bytes.

sql

CREATE TABLE md5_text (
  hash char(32) NOT NULL
);

INSERT INTO md5_text (hash) VALUES (md5('hash'));

SELECT hash FROM md5_text;
-- 0800fc577294c34e0b28ad2839435945

md5() already returns the 32-character lowercase hex string, so there is no conversion on insert and no encode() on read. That readability is the only thing this method buys you, and it costs 32 bytes per row instead of 16. Use char(32) if you want the column documented as fixed-width, or text if you do not care; for a value that is always exactly 32 characters the practical difference between them is nil in Postgres (unlike MySQL, char(n) is not stored blank-padded on disk to a fixed width in a way that matters here, and text and varchar share the same storage). Reach for this method when you genuinely read these values by eye often enough that skipping the encode() wrap is worth the doubled storage; otherwise prefer bytea.

Method 3: the uuid trick

A 128-bit MD5 and a 128-bit uuid are the same width, and Postgres's uuid input accepts a plain 32-hex-digit string with no hyphens. So you can cast an MD5 straight into a uuid column and get a flat 16-byte store with a clean canonical display, for free:

sql

CREATE TABLE md5_uuid (
  hash uuid NOT NULL
);

INSERT INTO md5_uuid (hash) VALUES (md5('hash')::uuid);

SELECT hash FROM md5_uuid;
-- 0800fc57-7294-c34e-0b28-ad2839435945

psql showing md5('hash') as the 32-character hex string 0800fc577294c34e0b28ad2839435945, decode(md5('hash'),'hex') as the 16-byte bytea value, and md5('hash')::uuid as the hyphenated 0800fc57-7294-c34e-0b28-ad2839435945. — The same MD5 digest as a 32-char hex text value, the 16-byte bytea form from decode(), and the uuid cast. Real output from PostgreSQL 16.

What you get: a true fixed 16 bytes (no varlena header, so it is the leanest of the three on disk), and a readable hyphenated rendering without an encode() call. The uuid type also has efficient native comparison and indexing.

The caveat, and it matters: this value is not a real UUID. It carries no version or variant bits, it is just an MD5 relabelled. Anything downstream that assumes UUIDv4 randomness, parses the version nibble, or treats the column as a genuine identifier will be misled. It also reorders nothing, so it does not solve the random-insert problem (below). Treat the uuid trick as a storage-and-display convenience for a hash, document it as such in the schema, and do not let it leak into code paths that reason about UUID semantics. Used with that discipline it is a neat win; used carelessly it is a footgun.

encode and decode, not HEX and UNHEX

If you are coming from MySQL, this is the section to tattoo on the inside of your eyelids. MySQL stores an MD5 in BINARY(16) and converts with UNHEX() on insert and HEX() on read. PostgreSQL has neither function. The mapping is:

MySQL	PostgreSQL	Direction
`UNHEX('hexstr')`	`decode('hexstr', 'hex')`	hex string to raw bytes
`HEX(bin_col)`	`encode(bin_col, 'hex')`	raw bytes to hex string
`BINARY(16)`	`bytea`	the raw-bytes column type
`MD5('x')` (hex string)	`md5('x')` (hex string)	same name, same output

So the WHERE md5 = UNHEX(?) lookup you wrote in MySQL becomes WHERE hash = decode($1, 'hex') in Postgres, and SELECT HEX(md5) becomes SELECT encode(hash, 'hex'). The md5() function name is the one thing that carries over unchanged. Note also that encode(..., 'hex') emits lowercase hex, whereas MySQL's HEX() emits uppercase; decode() accepts either case on input, so it only bites you in string comparisons of the encoded output, not in round-tripping the bytes.

Worked schema

A typical use is making a long, otherwise-unindexable value searchable by an exact-match equality on its hash. Here is a bytea MD5 column on a table of URLs, with a unique index for dedup:

sql

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE urls (
  id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  url       text   NOT NULL,
  url_md5   bytea  NOT NULL,
  UNIQUE (url_md5)
);

-- insert: hash the URL to 16 raw bytes
INSERT INTO urls (url, url_md5)
VALUES ('https://example.com/very/long/path?with=a&long=query&string=here',
        digest('https://example.com/very/long/path?with=a&long=query&string=here', 'md5'));

-- exact-match lookup by the original value, comparing bytes to bytes
SELECT id, url
FROM urls
WHERE url_md5 = digest($1, 'md5');

-- read the stored hash back as a hex string when you need to see it
SELECT id, encode(url_md5, 'hex') AS md5_hex FROM urls;

If you would rather have Postgres keep url_md5 in sync automatically instead of passing it on every insert, make it a generated column: url_md5 bytea GENERATED ALWAYS AS (digest(url, 'md5')) STORED. Postgres only supports STORED generated columns (there is no VIRTUAL), so the 16 bytes are written into the row, but at 16 bytes that is cheap, and you never have to remember to compute the hash at the call site. Index it the same way for the equality lookup.

Indexing implications

All three storage methods take an ordinary B-tree index, and point-query performance (WHERE hash = ?) is essentially identical across them. Where they diverge is index size, and therefore how much of the index stays resident in shared buffers: the 16-byte bytea and uuid indexes are roughly half the leaf-payload of the 32-byte text/char(32) index, so on a RAM-constrained server they incur fewer disk reads.

MD5 digests are random by design, which is the worst case for a B-tree on insert: every new row lands in a random leaf page, causing page splits and write amplification (the same problem random UUIDs have, and the uuid trick does nothing to fix it). If the hash does not need to be the primary key, keep an IDENTITY bigint as the primary key and put the hash in a secondary or unique index, so the table's physical insert order stays sequential. On a table where you only ever look the hash up by equality, a B-tree is fine; a hash index buys nothing here because equality is exactly what a B-tree already does well.

When MD5 is the wrong choice

MD5 has been cryptographically broken since 2004; collisions can be generated in seconds. So MD5 is the wrong tool wherever an adversary controls the input.

For passwords, do not store an MD5 (or a bare SHA-256). Use a slow, salted, purpose-built hash: crypt() from pgcrypto with a bcrypt salt (crypt(password, gen_salt('bf'))), or hash with bcrypt/argon2id at the application layer and store the full output in a text column. A fast hash like MD5 lets an attacker who steals the table try billions of guesses per second on a GPU.

For integrity and tamper resistance (file checksums, content signing, audit hashes), use SHA-256. pgcrypto makes it a one-word change from the examples above: digest(value, 'sha256') returns a 32-byte bytea, stored in a bytea column exactly as the MD5 was (the same 32-byte storage decision plays out in MySQL, with BINARY(32) standing in for bytea). MD5 remains fine for non-adversarial work: content deduplication, ETags, cache keys, where a collision is a harmless miss rather than a security failure.

Use case	Reach for	Store in
Dedup, cache keys, ETags (non-adversarial)	MD5	`bytea` (16 bytes)
File integrity, content signing, audit	SHA-256 via `digest(x,'sha256')`	`bytea` (32 bytes)
Passwords	`crypt()` / bcrypt / argon2id	`text`

What to do next

For the MySQL side of the same problem, with the BINARY(16) / CHAR(32) / two-BIGINT breakdown, see How to Store MD5 Hashes in MySQL.
For the 32-byte digest and why fast hashes are wrong for passwords, see How to Store a SHA-256 Hash in MySQL; the SHA-256 storage logic is identical in Postgres with bytea instead of BINARY(32).

FAQ

bytea for new tables. An MD5 digest is 16 raw bytes, so bytea stores it exactly, at half the size of the 32-byte hex-string form in text or char(32). Insert with decode(md5(value), 'hex') (or digest(value, 'md5') if you have pgcrypto) and read back with encode(col, 'hex').

Use text/char(32) only if reading the hex string directly at the psql prompt is worth doubling the storage.

Postgres has no HEX or UNHEX. Use decode('hexstring', 'hex') in place of UNHEX() to turn a hex string into bytea, and encode(bytea_value, 'hex') in place of HEX() to turn bytes back into a hex string. One difference: encode() emits lowercase hex, while MySQL's HEX() emits uppercase. decode() accepts either case on input.

Two ways. The built-in md5('hash') returns the lowercase 32-character hex string as text. The digest('hash', 'md5') function from the pgcrypto extension returns the raw 16 bytes as bytea directly. To store the raw bytes from the built-in function, wrap it: decode(md5('hash'), 'hex').

Yes. A 128-bit MD5 is the same width as a uuid, and the uuid input accepts a 32-hex-digit string with no hyphens, so md5('x')::uuid works and stores in a flat 16 bytes with a clean hyphenated display. The caveat: the result is not a real UUID (no version or variant bits), so do not let it reach any code that parses or assumes UUID semantics. Treat it purely as a compact, readable hash store.

bytea on almost every count: 16 bytes versus 32, a smaller index, and lower pressure on shared buffers for large tables. text/char(32) only wins on raw readability, since md5() already produces the hex string and you can SELECT it without an encode() wrap. Past a few million rows the storage and memory savings make bytea the clear pick.

The leading \x is PostgreSQL's default bytea output format (the bytea_output setting is hex), not part of the stored value. The 16 bytes themselves are correct. When you want the bare 32-character hex string instead, wrap the column in encode(col, 'hex') on read, which strips the prefix and gives you 0800fc577294c34e0b28ad2839435945.

No. MD5 is cryptographically broken and far too fast to resist offline brute-force on modern hardware. Use a slow, salted password hash: crypt(password, gen_salt('bf')) from pgcrypto (bcrypt), or argon2id/bcrypt at the application layer, and store the full output in a text column. MD5 is for non-adversarial fingerprinting (dedup, cache keys), not credentials.

How to Store an MD5 Hash in PostgreSQL: bytea vs text

What is an MD5 hash and how is it stored in PostgreSQL?

Storage comparison table

Method 1: bytea (recommended default)

Method 2: text / char(32)

Method 3: the uuid trick

encode and decode, not HEX and UNHEX

Worked schema

Indexing implications

When MD5 is the wrong choice

What to do next

FAQ

See also

Sources

Ishan Karunaratne

Related posts

How to Store a UUID in PostgreSQL (the Native uuid Type)

How to Store an Array in PostgreSQL

How to Store an IP Address in PostgreSQL: inet and cidr

What column type should I use to store an MD5 hash in PostgreSQL?

What is the PostgreSQL equivalent of MySQL's HEX() and UNHEX()?

How do I generate an MD5 hash inside PostgreSQL?

Can I store an MD5 hash in a PostgreSQL uuid column?

Should I use bytea or text for an MD5 in PostgreSQL?

Why does my MD5 bytea column show up as \x0800fc... when I select it?

Is it safe to store passwords as MD5 hashes in PostgreSQL?

Sources

Ishan Karunaratne