TechEarl

How to Store JSON in MySQL: The JSON Type vs TEXT

MySQL has a native JSON column type that validates on insert, stores a parsed binary format, and gives you path operators like ->>. When to reach for it over TEXT, how to index it with a generated column, and the MariaDB difference.

Ishan Karunaratne⏱️ 12 min readUpdated
Share thisCopied
How to store JSON in MySQL: the native JSON column type versus a TEXT column, validation on insert, the ->> path operator, and indexing JSON with a generated column.

Use the native JSON column type, not TEXT, whenever you actually query into the document. MySQL has had a real JSON data type since 5.7.8 (October 2015): it validates the document on insert (bad JSON is rejected), stores a parsed binary format so path lookups do not re-parse the string, and gives you operators like -> and ->> to pull values out. Reach for TEXT only when the JSON is an opaque blob you store and read back whole and never query into. Below are the differences, a comparison against normalizing into real columns, a worked schema with an indexed generated column, and the MariaDB gotcha that trips people up.

Short answer: attributes JSON for a column you query into; TEXT for a blob you only ever read whole. You cannot index a JSON column directly, so to make a path searchable, add a generated column over the path and index that. And know that MariaDB's JSON is not the same thing as MySQL's: it is an alias for LONGTEXT with a validity check, stored as text, not the binary format. More on that below.

Why the native JSON type beats TEXT

Before 5.7.8 the only way to put a JSON document in MySQL was to dump the string into a TEXT or VARCHAR column and parse it in your application. That still works, and for some cases it is still the right call. But the native JSON type buys you three things a text column cannot:

  • Validation on insert. A JSON column rejects malformed input at write time. INSERT ... VALUES('[1, 2,') fails with ERROR 3140 (22032): Invalid JSON text. A TEXT column will happily store [1, 2, and you will not find out until something downstream tries to parse it.
  • A parsed binary format. MySQL converts the document to an internal binary representation on insert. That format is structured so the server can look up a key or array index directly without scanning the whole document, and reads do not re-parse the text every time. A TEXT column stores the raw string, so every read hands your application a string to parse from scratch.
  • Path operators. JSON_EXTRACT(), plus the -> operator (shorthand for JSON_EXTRACT(col, path)) and the ->> operator (shorthand for JSON_UNQUOTE(JSON_EXTRACT(col, path)), which strips the surrounding quotes). You query into the document in SQL instead of pulling the whole blob into the app and parsing it there.

The cost is small. A JSON value is limited like LONGBLOB (up to about 4 GB, capped in practice by max_allowed_packet), and most of the bytes are stored off-page. The binary encoding carries some overhead, so a JSON column can sit a little larger on disk than the equivalent raw text. In exchange you get validation, in-place path access, and indexable extraction. For anything you query into, that is a trade worth making.

When TEXT is still the right choice

TEXT is the better pick when the JSON is opaque: you write it, you read it back whole, and you never ask the database about its contents. A cached API response, a serialized event payload you replay elsewhere, a third-party webhook body you keep for audit. There is nothing to validate against (you do not control the shape), nothing to index into (you never filter on a field), and TEXT skips the binary-encoding step so the insert is marginally cheaper. Parsing happens in the app, which is where the blob is going anyway.

The line is simple: do you ever write WHERE or ORDER BY against a field inside the document? If yes, JSON. If no, TEXT is fine and a touch leaner.

JSON vs normalizing into real columns

The harder question is not JSON versus TEXT. It is whether the data should be JSON at all, or whether it belongs in ordinary columns and tables.

If the shape is known and stable, normal columns are almost always better. You get typed columns, NOT NULL and foreign-key constraints, straightforward B-tree indexes, and clean joins. A product has a price, a SKU, and a weight: those are columns, not keys in a JSON blob. Burying stable fields in JSON throws away everything a relational database is good at and makes every query clumsier.

JSON earns its place when the data is genuinely sparse, dynamic, or ad-hoc: attributes that differ per row, optional metadata you cannot enumerate up front, a settings document whose keys you do not control. A products catalog where a book has author and pages, a shirt has size and color, and a cable has length and gauge is the textbook case. Modelling that as columns gives you a wide table full of NULLs, or an entity-attribute-value mess. A single attributes JSON column holds whatever each product needs.

My rule: stable, shared fields are columns; the long tail of per-row variation is one JSON column alongside them. You do not have to choose globally, you choose per field.

Comparison: JSON vs TEXT vs normalized columns

AspectJSON typeTEXTNormalized columns
ValidationRejects invalid JSON on insertNone (stores any string)Per-column types and constraints
Query operators->, ->>, JSON_EXTRACT, etc.None (parse in the app)Native SQL on every column
IndexingVia a generated column or functional indexPrefix index only, useless for fieldsDirect B-tree index per column
Best forSparse / dynamic data you query intoOpaque blobs read wholeKnown, stable, shared fields

A worked schema: products with a JSON attributes column

Here is a products table. The stable fields (sku, name, price) are real columns. The per-product variation lives in a single attributes JSON column.

sql
CREATE TABLE products (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  sku        VARCHAR(64) NOT NULL,
  name       VARCHAR(255) NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  attributes JSON NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  UNIQUE KEY uq_products_sku (sku)
) ENGINE=InnoDB;

Insert a document. MySQL validates it on the way in, so a typo in the JSON fails loudly here, not three reads later:

sql
INSERT INTO products (sku, name, price, attributes)
VALUES (
  'TSHIRT-RED-L',
  'Cotton T-Shirt',
  19.99,
  '{"color": "red", "size": "L", "material": "cotton", "tags": ["summer", "sale"]}'
);

Pull a single field out with the ->> operator. It returns the unquoted scalar, so you get red, not "red":

sql
SELECT name, attributes->>'$.color' AS color
FROM products
WHERE attributes->>'$.size' = 'L';

The $.color and $.size are JSON path expressions: $ is the document root, .color is the key. ->>'$.color' is exactly JSON_UNQUOTE(JSON_EXTRACT(attributes, '$.color')), just shorter to type. Use -> instead of ->> when you want the raw JSON form (quoted strings, intact arrays).

This is all standard on MariaDB too, as far as the SQL goes. JSON_EXTRACT, the -> and ->> operators, and JSON path syntax behave the same. What differs is the storage, and that matters enough to spell out.

Indexing a JSON column

You cannot put an index directly on a JSON column. A B-tree wants a single comparable value, and a whole JSON document is not that. The query above (WHERE attributes->>'$.size' = 'L') works, but with no index it scans every row and evaluates the extraction each time. To make a path searchable, you index the extracted value, and there are two ways.

Generated column (works on 5.7+ and MariaDB). Add a column defined as an extraction over the JSON path, then index that column:

sql
ALTER TABLE products
  ADD COLUMN size VARCHAR(8)
    AS (attributes->>'$.size') STORED,
  ADD INDEX idx_size (size);

Now WHERE size = 'L' uses idx_size directly. The column can be VIRTUAL (computed on read, materialized only in the index records) or STORED (computed on write, kept in the row). I default to VIRTUAL for a sparse attribute so the row stays lean; STORED is worth it when the same value is read far more than written, or when you need it as part of a more complex index. Either way the optimizer will use the index for a WHERE clause that repeats the exact extraction expression, even without naming the generated column. This is the same generated-column trick used to index a long column by its hash, applied to a JSON path instead of a digest.

Functional index (MySQL 8.0.13+). From 8.0.13 you can index the expression directly, no separate column:

sql
CREATE INDEX idx_size ON products ((CAST(attributes->>'$.size' AS CHAR(8))));

And for arrays inside the document, multi-valued indexes (MySQL 8.0.17+) let one row contribute several index entries, so MEMBER OF and JSON_CONTAINS against an array field like $.tags can use an index. MariaDB does not have multi-valued indexes; there you index a single extracted scalar through the generated-column route above.

The MariaDB difference (do not gloss over this)

Here is the part that surprises people migrating between the two. MariaDB's JSON is not a native binary type. It is an alias for LONGTEXT (specifically LONGTEXT COLLATE utf8mb4_bin) with an automatic JSON_VALID() CHECK constraint bolted on for validation. MariaDB introduced it purely for MySQL compatibility, and it stores the document as text, parsing on access, with no compact binary representation.

So the SQL surface is largely compatible (the functions and operators line up), but the storage model is fundamentally different: MySQL keeps a parsed binary blob and reads subfields without re-parsing; MariaDB keeps the raw string and parses it on every access. The performance characteristics diverge accordingly, and binary-level features that depend on the encoding (and row-based replication of JSON columns between the two engines) do not carry over. If you are writing for both, treat MariaDB's JSON as "validated LONGTEXT" and do not assume the MySQL binary-format optimizations are there.

What to do next

FAQ

See also

Sources

Authoritative references this article was fact-checked against.

TagsMySQLJSONMariaDBSchema DesignGenerated ColumnsIndexingData Types

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.

Keep reading

Related posts

How to Store JSON in PostgreSQL: json vs jsonb

PostgreSQL has two JSON types. json keeps an exact text copy and reparses on every read; jsonb stores a decomposed binary format you can index with GIN. When to use each, the operators, and a worked products schema with a containment index.

How to Store Money in MySQL: DECIMAL vs Integer Cents

Never use FLOAT or DOUBLE for money. Use DECIMAL(19,4) for exact fixed-point amounts, or BIGINT integer cents for high-volume ledgers, and always store the ISO 4217 currency code separately. Comparison, worked schema, and the rounding rules.