TechEarl

How to Store JSON in PostgreSQL: json vs jsonb

PostgreSQL has two JSON types. json keeps an exact text copy and reparses on every read; jsonb stores a decomposed binary format you can index with GIN. When to use each, the operators, and a worked products schema with a containment index.

Ishan Karunaratne⏱️ 12 min readUpdated
Share thisCopied
How to store JSON in PostgreSQL: the json type's exact text copy versus the jsonb decomposed binary format, the containment and key-existence operators, and indexing a jsonb column with a GIN index.

Use jsonb, not json, for almost anything you store in PostgreSQL. PostgreSQL has had two JSON column types since 9.4 (December 2014): json keeps an exact text copy of what you wrote (whitespace, key order, and duplicate keys all preserved) and reparses that string on every access, while jsonb stores a decomposed binary representation that processes faster, drops the cosmetic detail, and (this is the big one) can be indexed with a GIN index so containment and key-existence queries hit an index instead of scanning every row. Below are the differences, a comparison against normalizing into real columns, a worked schema with a GIN index, and how this compares to storing JSON in MySQL.

Short answer: attributes jsonb for a column you query into; json only when you must preserve the input text verbatim (exact whitespace, original key order, duplicate keys) and you only ever read the document whole. The reason jsonb wins almost every time is the index story: PostgreSQL can put a GIN index directly over the whole document, with no generated-column gymnastics. More on that below.

Why jsonb beats json

Both types accept and emit the same JSON, and both validate on insert (malformed JSON is rejected by either). The difference is entirely in how the value is stored and what you can do with it after.

  • json is the raw text. PostgreSQL stores the exact characters you sent. Semantically-insignificant whitespace survives, the order of keys in each object is preserved, and if you wrote the same key twice both copies are kept (functions that process it treat the last value as the live one). Every operation that reaches into the document has to reparse the text from scratch.
  • jsonb is a parsed binary tree. On insert PostgreSQL decomposes the document into an internal binary format. Whitespace is gone, object keys are reordered (and stored sorted), and duplicate keys are collapsed to the last value. Reads do not reparse: the server walks the binary structure directly, so pulling a field out is cheap.

The conversion to binary makes inserts marginally more expensive and the stored value can be a touch larger than the raw text, but everything you do afterwards is faster. And only jsonb supports the indexing and containment operators that make a JSON column actually queryable at scale. That is the trade, and for data you query into it is not close.

When json is the right choice

json earns its place in exactly one situation: you need to give back byte-for-byte what you stored. If the exact formatting matters (you are caching a third-party API response and a downstream consumer checks a signature over the raw bytes), or the original key order is semantically meaningful to some other system, or you genuinely need to round-trip duplicate keys, then jsonb will quietly rewrite the document and json will not.

That is a narrow case. A cached payload you only ever read whole, where formatting is load-bearing, is the honest use for json. The one other place it can pay off is a write-heavy, append-only sink (think a raw event or log table you ingest fast and rarely query), where skipping the decompose-to-binary step on every insert shaves real cost. The moment you want to filter on a field inside the document, or index it, you want jsonb.

jsonb vs normalizing into real columns

The harder question is not json versus jsonb. It is whether the data should be JSON at all, or whether it belongs in ordinary columns.

If the shape is known and stable, normal columns are almost always better. You get typed columns, NOT NULL and foreign-key constraints, plain B-tree indexes, and clean joins. A product has a price, a SKU, and a weight: those are columns, not keys in a JSON blob. Burying stable fields in JSON throws away everything a relational database is good at.

jsonb earns its place when the data is genuinely sparse, dynamic, or ad-hoc: attributes that differ per row, optional metadata you cannot enumerate up front, a settings document whose keys you do not control. A products catalog where a book has author and pages, a shirt has size and color, and a cable has length and gauge is the textbook case. Modelling that as columns gives you a wide table full of nulls. A single attributes jsonb column holds whatever each row needs.

My rule: stable, shared fields are columns; the long tail of per-row variation is one jsonb column alongside them. You choose per field, not globally.

Comparison: json vs jsonb vs normalized columns

AspectjsonjsonbNormalized columns
StorageExact text copyDecomposed binaryTyped per-column values
Preserves text/key order/dupesYes, verbatimNo (reordered, dupes dropped)N/A
ReadsReparse on every accessWalk binary, no reparseNative column access
IndexingNone usefulGIN over the whole documentB-tree per column
Best forVerbatim round-trip, read wholeSparse / dynamic data you queryKnown, stable, shared fields

A worked schema: products with a jsonb attributes column

Here is a products table. The stable fields (sku, name, price) are real columns. The per-product variation lives in a single attributes jsonb column.

sql
CREATE TABLE products (
  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  sku        text NOT NULL UNIQUE,
  name       text NOT NULL,
  price      numeric(10,2) NOT NULL,
  attributes jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

Insert a document. PostgreSQL validates it on the way in and decomposes it to the binary form, so a typo in the JSON fails loudly here, not three reads later:

sql
INSERT INTO products (sku, name, price, attributes)
VALUES (
  'TSHIRT-RED-L',
  'Cotton T-Shirt',
  19.99,
  '{"color": "red", "size": "L", "material": "cotton", "tags": ["summer", "sale"]}'
);

Pull a single field out with the ->> operator. It returns the value as text (unquoted), so you get red, not "red":

sql
SELECT name, attributes->>'color' AS color
FROM products
WHERE attributes->>'size' = 'L';

Here -> returns a field as jsonb (strings stay quoted, arrays stay intact) and ->> returns it as text. For a nested path, #> and #>> take a path array, for example attributes#>>'{dimensions,length}'. Those operators work fine on their own, but the WHERE above with no index scans every row. That is what GIN fixes.

Indexing a jsonb column with GIN

This is where PostgreSQL pulls decisively ahead of MySQL. You can put a GIN index directly on the whole jsonb column and it indexes every key and value inside the document:

sql
CREATE INDEX idx_products_attributes
  ON products USING GIN (attributes);

That single index now serves containment (@>) and key-existence (?, ?|, ?&) queries against any field in the document. The containment query "find products whose attributes contain this fragment" uses the index:

sql
-- products that are red size L, index-backed
SELECT name FROM products
WHERE attributes @> '{"color": "red", "size": "L"}';

@> asks "does the left document contain the right one as a subset?" The key-existence operators ask whether a top-level key is present: attributes ? 'material' (this key exists), attributes ?| array['color','colour'] (any of these), attributes ?& array['size','color'] (all of these). All four ride the same default GIN index.

If you only ever run containment queries and never the ? family, use the jsonb_path_ops operator class instead. It indexes only the @> (and jsonpath) path, producing a smaller, faster index:

sql
CREATE INDEX idx_products_attributes_path
  ON products USING GIN (attributes jsonb_path_ops);

For richer querying, PostgreSQL 12 added the SQL/JSON path language: the @? and @@ operators and functions like jsonb_path_query, for example attributes @? '$.tags[*] ? (@ == "sale")'. Those also use the GIN index. They are worth reaching for when a flat containment check is not expressive enough.

How this compares to MySQL

If you have indexed JSON in MySQL, the contrast is the whole point. MySQL's JSON type is also a binary format, and it also gives you path operators (its -> and ->> use $.path syntax). But MySQL cannot index a JSON column directly: to make a path searchable you add a generated column over that one path and index that, or use a functional index on the extraction expression. You index one path at a time, by hand.

PostgreSQL's GIN index indexes the entire document at once. One CREATE INDEX ... USING GIN (attributes) covers containment and key-existence against every field, including keys you had not thought to index when you wrote the DDL. That is a genuinely more powerful model for sparse, unpredictable data, and it is the main reason I default to jsonb in PostgreSQL where I would think harder about JSON-vs-columns in MySQL. For the MySQL side of this trade in full, see storing JSON in MySQL with the native JSON type.

What to do next

  • For the MySQL equivalent and its generated-column indexing route, see How to Store JSON in MySQL.
  • When the data is just a list of scalars (tags, role names) rather than a nested document, a native PostgreSQL array column is often a better fit than jsonb, with its own GIN-indexed containment operators.
  • For the broader schema-design judgement call on JSON versus real columns, the same normalize-the-stable-shape rule applies across engines.

FAQ

See also

Sources

Authoritative references this article was fact-checked against.

TagsPostgreSQLJSONJSONBGIN IndexSchema DesignIndexingData Types

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.

Keep reading

Related posts

How to Store JSON in MySQL: The JSON Type vs TEXT

MySQL has a native JSON column type that validates on insert, stores a parsed binary format, and gives you path operators like ->>. When to reach for it over TEXT, how to index it with a generated column, and the MariaDB difference.

How to Store an Array in PostgreSQL

PostgreSQL has native array types: any base type can be an array, declared with []. How to insert with the curly-brace literal or ARRAY[...], query with @>, &&, and ANY(), index with GIN, and when an array beats a junction table or jsonb.