An MD5 hash is 128 bits, which is exactly 16 bytes raw or 32 hexadecimal characters as a string. PostgreSQL has no dedicated hash column type, so you pick how to represent those 16 bytes: bytea (the raw bytes, the compact one to reach for), text/char(32) (the readable hex string, double the size), or the clever uuid (also 16 bytes, since a 128-bit MD5 fits the type exactly). Below is the comparison table plus working DDL and insert/select for each, the md5() and digest() functions, and the conversion gotcha that trips up everyone arriving from MySQL.
Short answer: for a new table, store the raw 16 bytes in bytea and convert with decode(md5(value), 'hex') on insert and encode(col, 'hex') on read. Use text only when reading the hex string at the psql prompt is worth doubling the storage. And the big one: Postgres has no UNHEX/HEX functions, so if you came from the MySQL equivalent which uses BINARY(16), the muscle memory of UNHEX()/HEX() is exactly what you have to unlearn.
What is an MD5 hash and how is it stored in PostgreSQL?
An MD5 (Message-Digest 5) hash is a fixed-size 128-bit value produced by hashing arbitrary input. It is always 16 bytes in raw binary form, or 32 hexadecimal characters when stringified (0800fc577294c34e0b28ad2839435945 for md5('hash')). PostgreSQL has no native MD5 column type, so you choose how to represent those 16 bytes: as raw bytes in a bytea column (16 bytes per row, the compact default), as ASCII hex in a text or char(32) column (32 bytes per row, human-readable), or in the native uuid type, which is itself a 128-bit value and stores the digest in 16 bytes with a tidy display format.
PostgreSQL gives you two ways to compute the hash in the database. The built-in md5(text) function returns the lowercase 32-character hex string as text: md5('hash') gives 0800fc577294c34e0b28ad2839435945. The digest() function from the pgcrypto extension returns the raw 16 bytes directly as bytea: digest('hash', 'md5'). pgcrypto is the better starting point if you also want stronger algorithms later, since the same digest(data, type) call does sha256, sha512, and others by swapping the type argument. Either way, you can also compute the hash in your application and hand Postgres the finished value; the storage choices are identical.
The one thing to get straight before anything else: Postgres has no UNHEX or HEX. The equivalents are decode(hexstring, 'hex') (text to bytea, the UNHEX job) and encode(bytea, 'hex') (bytea back to a hex string, the HEX job). This is the single most common thing MySQL users get wrong on the way over, so it gets its own section below.
Jump to:
- Storage comparison table
- Method 1: bytea
- Method 2: text / char(32)
- Method 3: the uuid trick
- encode and decode, not HEX and UNHEX
- Worked schema
- Indexing implications
- When MD5 is the wrong choice
- FAQ
Storage comparison table
| Method | Bytes per row | How to write / read | Readability | Best for |
|---|---|---|---|---|
bytea | 16 (+ small varlena overhead) | decode(md5(x),'hex') / encode(col,'hex') | \x... display | Recommended default |
text / char(32) | 32 (+ overhead) | md5(x) / direct | Direct hex string | Debugging-heavy environments |
uuid | 16 | md5(x)::uuid / ::text | Hyphenated UUID | Compact + clean display, with a caveat |
The byte counts are the stored value itself; in practice bytea and text both carry a small varlena length header, while uuid is a flat fixed 16 bytes with no header. What the table shows cleanly is the relative cost: bytea and uuid are half the payload of the text hex string, and that 2× difference carries straight through to the index and the RAM spent keeping it hot in shared buffers. On a large table the compact forms win comfortably.
Method 1: bytea (recommended default)
Store the raw 16 bytes. Half the size of the hex-string form, no charset questions, and it round-trips cleanly through encode/decode.
CREATE TABLE md5_bytea (
hash bytea NOT NULL
);
-- let Postgres hash and pack to 16 raw bytes in one step
INSERT INTO md5_bytea (hash) VALUES (decode(md5('hash'), 'hex'));
-- or supply an externally computed hex string
INSERT INTO md5_bytea (hash) VALUES (decode('0800fc577294c34e0b28ad2839435945', 'hex'));
SELECT encode(hash, 'hex') FROM md5_bytea;
-- 0800fc577294c34e0b28ad2839435945Two helpers do all the conversion:
decode('hex_string', 'hex')packs the 32-char hex string to its 16-bytebyteaform on insert (this is theUNHEXjob).encode(bytea_col, 'hex')expands it back to a lowercase hex string on read (theHEXjob).
If you have pgcrypto enabled, digest('hash', 'md5') returns the 16-byte bytea directly, so you can skip the decode(md5(...)) wrapping:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
INSERT INTO md5_bytea (hash) VALUES (digest('hash', 'md5'));For lookups by hash, decode the parameter so you compare bytes to bytes:
SELECT * FROM md5_bytea WHERE hash = decode($1, 'hex');One display quirk to expect: a raw SELECT hash FROM md5_bytea shows \x0800fc577294c34e0b28ad2839435945. The \x is Postgres's bytea output prefix, not part of the value. Wrap the column in encode(..., 'hex') whenever you want the bare hex string. In application code, work with bytes directly: bytes.fromhex(hex) in Python, Buffer.from(hex, 'hex') in Node.js, hex2bin($hex) in PHP. For most new tables, bytea is the right answer.
There is also a third-party route if you want to shave the last few bytes: the md5hash PGXN extension defines a native fixed 16-byte type stored inline, dropping even the small varlena header that bytea carries (roughly 60% of the text footprint). I would not pull in an extension just for that on a normal table, the bytea and uuid forms already get you to 16 bytes, but it exists and is worth knowing about if you are storing hashes by the hundred million and the per-row header genuinely adds up.
Method 2: text / char(32)
Store the hex string itself. Readable at the psql prompt, double the bytes.
CREATE TABLE md5_text (
hash char(32) NOT NULL
);
INSERT INTO md5_text (hash) VALUES (md5('hash'));
SELECT hash FROM md5_text;
-- 0800fc577294c34e0b28ad2839435945md5() already returns the 32-character lowercase hex string, so there is no conversion on insert and no encode() on read. That readability is the only thing this method buys you, and it costs 32 bytes per row instead of 16. Use char(32) if you want the column documented as fixed-width, or text if you do not care; for a value that is always exactly 32 characters the practical difference between them is nil in Postgres (unlike MySQL, char(n) is not stored blank-padded on disk to a fixed width in a way that matters here, and text and varchar share the same storage). Reach for this method when you genuinely read these values by eye often enough that skipping the encode() wrap is worth the doubled storage; otherwise prefer bytea.
Method 3: the uuid trick
A 128-bit MD5 and a 128-bit uuid are the same width, and Postgres's uuid input accepts a plain 32-hex-digit string with no hyphens. So you can cast an MD5 straight into a uuid column and get a flat 16-byte store with a clean canonical display, for free:
CREATE TABLE md5_uuid (
hash uuid NOT NULL
);
INSERT INTO md5_uuid (hash) VALUES (md5('hash')::uuid);
SELECT hash FROM md5_uuid;
-- 0800fc57-7294-c34e-0b28-ad2839435945What you get: a true fixed 16 bytes (no varlena header, so it is the leanest of the three on disk), and a readable hyphenated rendering without an encode() call. The uuid type also has efficient native comparison and indexing.
The caveat, and it matters: this value is not a real UUID. It carries no version or variant bits, it is just an MD5 relabelled. Anything downstream that assumes UUIDv4 randomness, parses the version nibble, or treats the column as a genuine identifier will be misled. It also reorders nothing, so it does not solve the random-insert problem (below). Treat the uuid trick as a storage-and-display convenience for a hash, document it as such in the schema, and do not let it leak into code paths that reason about UUID semantics. Used with that discipline it is a neat win; used carelessly it is a footgun.
encode and decode, not HEX and UNHEX
If you are coming from MySQL, this is the section to tattoo on the inside of your eyelids. MySQL stores an MD5 in BINARY(16) and converts with UNHEX() on insert and HEX() on read. PostgreSQL has neither function. The mapping is:
| MySQL | PostgreSQL | Direction |
|---|---|---|
UNHEX('hexstr') | decode('hexstr', 'hex') | hex string to raw bytes |
HEX(bin_col) | encode(bin_col, 'hex') | raw bytes to hex string |
BINARY(16) | bytea | the raw-bytes column type |
MD5('x') (hex string) | md5('x') (hex string) | same name, same output |
So the WHERE md5 = UNHEX(?) lookup you wrote in MySQL becomes WHERE hash = decode($1, 'hex') in Postgres, and SELECT HEX(md5) becomes SELECT encode(hash, 'hex'). The md5() function name is the one thing that carries over unchanged. Note also that encode(..., 'hex') emits lowercase hex, whereas MySQL's HEX() emits uppercase; decode() accepts either case on input, so it only bites you in string comparisons of the encoded output, not in round-tripping the bytes.
Worked schema
A typical use is making a long, otherwise-unindexable value searchable by an exact-match equality on its hash. Here is a bytea MD5 column on a table of URLs, with a unique index for dedup:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE urls (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
url text NOT NULL,
url_md5 bytea NOT NULL,
UNIQUE (url_md5)
);
-- insert: hash the URL to 16 raw bytes
INSERT INTO urls (url, url_md5)
VALUES ('https://example.com/very/long/path?with=a&long=query&string=here',
digest('https://example.com/very/long/path?with=a&long=query&string=here', 'md5'));
-- exact-match lookup by the original value, comparing bytes to bytes
SELECT id, url
FROM urls
WHERE url_md5 = digest($1, 'md5');
-- read the stored hash back as a hex string when you need to see it
SELECT id, encode(url_md5, 'hex') AS md5_hex FROM urls;If you would rather have Postgres keep url_md5 in sync automatically instead of passing it on every insert, make it a generated column: url_md5 bytea GENERATED ALWAYS AS (digest(url, 'md5')) STORED. Postgres only supports STORED generated columns (there is no VIRTUAL), so the 16 bytes are written into the row, but at 16 bytes that is cheap, and you never have to remember to compute the hash at the call site. Index it the same way for the equality lookup.
Indexing implications
All three storage methods take an ordinary B-tree index, and point-query performance (WHERE hash = ?) is essentially identical across them. Where they diverge is index size, and therefore how much of the index stays resident in shared buffers: the 16-byte bytea and uuid indexes are roughly half the leaf-payload of the 32-byte text/char(32) index, so on a RAM-constrained server they incur fewer disk reads.
MD5 digests are random by design, which is the worst case for a B-tree on insert: every new row lands in a random leaf page, causing page splits and write amplification (the same problem random UUIDs have, and the uuid trick does nothing to fix it). If the hash does not need to be the primary key, keep an IDENTITY bigint as the primary key and put the hash in a secondary or unique index, so the table's physical insert order stays sequential. On a table where you only ever look the hash up by equality, a B-tree is fine; a hash index buys nothing here because equality is exactly what a B-tree already does well.
When MD5 is the wrong choice
MD5 has been cryptographically broken since 2004; collisions can be generated in seconds. So MD5 is the wrong tool wherever an adversary controls the input.
For passwords, do not store an MD5 (or a bare SHA-256). Use a slow, salted, purpose-built hash: crypt() from pgcrypto with a bcrypt salt (crypt(password, gen_salt('bf'))), or hash with bcrypt/argon2id at the application layer and store the full output in a text column. A fast hash like MD5 lets an attacker who steals the table try billions of guesses per second on a GPU.
For integrity and tamper resistance (file checksums, content signing, audit hashes), use SHA-256. pgcrypto makes it a one-word change from the examples above: digest(value, 'sha256') returns a 32-byte bytea, stored in a bytea column exactly as the MD5 was. MD5 remains fine for non-adversarial work: content deduplication, ETags, cache keys, where a collision is a harmless miss rather than a security failure.
| Use case | Reach for | Store in |
|---|---|---|
| Dedup, cache keys, ETags (non-adversarial) | MD5 | bytea (16 bytes) |
| File integrity, content signing, audit | SHA-256 via digest(x,'sha256') | bytea (32 bytes) |
| Passwords | crypt() / bcrypt / argon2id | text |
What to do next
- For the MySQL side of the same problem, with the
BINARY(16)/CHAR(32)/ two-BIGINT breakdown, see How to Store MD5 Hashes in MySQL. - For the 32-byte digest and why fast hashes are wrong for passwords, see How to Store a SHA-256 Hash in MySQL; the SHA-256 storage logic is identical in Postgres with
byteainstead ofBINARY(32).
FAQ
See also
- Storing an MD5 hash in MySQL, the
BINARY(16)/CHAR(32)version of this same decision, including theHEX()/UNHEX()conversions Postgres replaces withencode/decode. - Storing a SHA-256 hash in MySQL, the 32-byte upgrade for any integrity check where MD5's collisions matter; the Postgres version is identical with
byteain place ofBINARY(32). - Storing a UUID in PostgreSQL, the proper home for the native
uuidtype the trick above borrows, and why a real UUID is not an MD5. - Storing JSON and JSONB in PostgreSQL, another "which column type" call where the binary form usually wins on size and lookup cost.
- MySQL field types and sizes, a byte-by-byte reference for the binary, char, and integer columns these hash schemas lean on.
- Running PostgreSQL in Docker, a throwaway instance to try the
pgcryptoanddigest()examples here without touching a real database.
Sources
Authoritative references this article was fact-checked against.
- PostgreSQL Documentation: String Functions and Operators (md5)postgresql.org
- PostgreSQL Documentation: Binary String Functions (encode, decode, bytea hex format)postgresql.org
- PostgreSQL Documentation: pgcrypto (digest)postgresql.org
- PostgreSQL Documentation: UUID Typepostgresql.org
- PostgreSQL Documentation: Binary Data Types (bytea)postgresql.org





