Use the native JSON column type, not TEXT, whenever you actually query into the document. MySQL has had a real JSON data type since 5.7.8 (October 2015): it validates the document on insert (bad JSON is rejected), stores a parsed binary format so path lookups do not re-parse the string, and gives you operators like -> and ->> to pull values out. Reach for TEXT only when the JSON is an opaque blob you store and read back whole and never query into. Below are the differences, a comparison against normalizing into real columns, a worked schema with an indexed generated column, and the MariaDB gotcha that trips people up.
Short answer: attributes JSON for a column you query into; TEXT for a blob you only ever read whole. You cannot index a JSON column directly, so to make a path searchable, add a generated column over the path and index that. And know that MariaDB's JSON is not the same thing as MySQL's: it is an alias for LONGTEXT with a validity check, stored as text, not the binary format. More on that below.
Why the native JSON type beats TEXT
Before 5.7.8 the only way to put a JSON document in MySQL was to dump the string into a TEXT or VARCHAR column and parse it in your application. That still works, and for some cases it is still the right call. But the native JSON type buys you three things a text column cannot:
- Validation on insert. A
JSONcolumn rejects malformed input at write time.INSERT ... VALUES('[1, 2,')fails withERROR 3140 (22032): Invalid JSON text. ATEXTcolumn will happily store[1, 2,and you will not find out until something downstream tries to parse it. - A parsed binary format. MySQL converts the document to an internal binary representation on insert. That format is structured so the server can look up a key or array index directly without scanning the whole document, and reads do not re-parse the text every time. A
TEXTcolumn stores the raw string, so every read hands your application a string to parse from scratch. - Path operators.
JSON_EXTRACT(), plus the->operator (shorthand forJSON_EXTRACT(col, path)) and the->>operator (shorthand forJSON_UNQUOTE(JSON_EXTRACT(col, path)), which strips the surrounding quotes). You query into the document in SQL instead of pulling the whole blob into the app and parsing it there.
The cost is small. A JSON value is limited like LONGBLOB (up to about 4 GB, capped in practice by max_allowed_packet), and most of the bytes are stored off-page. The binary encoding carries some overhead, so a JSON column can sit a little larger on disk than the equivalent raw text. In exchange you get validation, in-place path access, and indexable extraction. For anything you query into, that is a trade worth making.
When TEXT is still the right choice
TEXT is the better pick when the JSON is opaque: you write it, you read it back whole, and you never ask the database about its contents. A cached API response, a serialized event payload you replay elsewhere, a third-party webhook body you keep for audit. There is nothing to validate against (you do not control the shape), nothing to index into (you never filter on a field), and TEXT skips the binary-encoding step so the insert is marginally cheaper. Parsing happens in the app, which is where the blob is going anyway.
The line is simple: do you ever write WHERE or ORDER BY against a field inside the document? If yes, JSON. If no, TEXT is fine and a touch leaner.
JSON vs normalizing into real columns
The harder question is not JSON versus TEXT. It is whether the data should be JSON at all, or whether it belongs in ordinary columns and tables.
If the shape is known and stable, normal columns are almost always better. You get typed columns, NOT NULL and foreign-key constraints, straightforward B-tree indexes, and clean joins. A product has a price, a SKU, and a weight: those are columns, not keys in a JSON blob. Burying stable fields in JSON throws away everything a relational database is good at and makes every query clumsier.
JSON earns its place when the data is genuinely sparse, dynamic, or ad-hoc: attributes that differ per row, optional metadata you cannot enumerate up front, a settings document whose keys you do not control. A products catalog where a book has author and pages, a shirt has size and color, and a cable has length and gauge is the textbook case. Modelling that as columns gives you a wide table full of NULLs, or an entity-attribute-value mess. A single attributes JSON column holds whatever each product needs.
My rule: stable, shared fields are columns; the long tail of per-row variation is one JSON column alongside them. You do not have to choose globally, you choose per field.
Comparison: JSON vs TEXT vs normalized columns
| Aspect | JSON type | TEXT | Normalized columns |
|---|---|---|---|
| Validation | Rejects invalid JSON on insert | None (stores any string) | Per-column types and constraints |
| Query operators | ->, ->>, JSON_EXTRACT, etc. | None (parse in the app) | Native SQL on every column |
| Indexing | Via a generated column or functional index | Prefix index only, useless for fields | Direct B-tree index per column |
| Best for | Sparse / dynamic data you query into | Opaque blobs read whole | Known, stable, shared fields |
A worked schema: products with a JSON attributes column
Here is a products table. The stable fields (sku, name, price) are real columns. The per-product variation lives in a single attributes JSON column.
CREATE TABLE products (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
sku VARCHAR(64) NOT NULL,
name VARCHAR(255) NOT NULL,
price DECIMAL(10,2) NOT NULL,
attributes JSON NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY uq_products_sku (sku)
) ENGINE=InnoDB;Insert a document. MySQL validates it on the way in, so a typo in the JSON fails loudly here, not three reads later:
INSERT INTO products (sku, name, price, attributes)
VALUES (
'TSHIRT-RED-L',
'Cotton T-Shirt',
19.99,
'{"color": "red", "size": "L", "material": "cotton", "tags": ["summer", "sale"]}'
);Pull a single field out with the ->> operator. It returns the unquoted scalar, so you get red, not "red":
SELECT name, attributes->>'$.color' AS color
FROM products
WHERE attributes->>'$.size' = 'L';The $.color and $.size are JSON path expressions: $ is the document root, .color is the key. ->>'$.color' is exactly JSON_UNQUOTE(JSON_EXTRACT(attributes, '$.color')), just shorter to type. Use -> instead of ->> when you want the raw JSON form (quoted strings, intact arrays).
This is all standard on MariaDB too, as far as the SQL goes. JSON_EXTRACT, the -> and ->> operators, and JSON path syntax behave the same. What differs is the storage, and that matters enough to spell out.
Indexing a JSON column
You cannot put an index directly on a JSON column. A B-tree wants a single comparable value, and a whole JSON document is not that. The query above (WHERE attributes->>'$.size' = 'L') works, but with no index it scans every row and evaluates the extraction each time. To make a path searchable, you index the extracted value, and there are two ways.
Generated column (works on 5.7+ and MariaDB). Add a column defined as an extraction over the JSON path, then index that column:
ALTER TABLE products
ADD COLUMN size VARCHAR(8)
AS (attributes->>'$.size') STORED,
ADD INDEX idx_size (size);Now WHERE size = 'L' uses idx_size directly. The column can be VIRTUAL (computed on read, materialized only in the index records) or STORED (computed on write, kept in the row). I default to VIRTUAL for a sparse attribute so the row stays lean; STORED is worth it when the same value is read far more than written, or when you need it as part of a more complex index. Either way the optimizer will use the index for a WHERE clause that repeats the exact extraction expression, even without naming the generated column. This is the same generated-column trick used to index a long column by its hash, applied to a JSON path instead of a digest.
Functional index (MySQL 8.0.13+). From 8.0.13 you can index the expression directly, no separate column:
CREATE INDEX idx_size ON products ((CAST(attributes->>'$.size' AS CHAR(8))));And for arrays inside the document, multi-valued indexes (MySQL 8.0.17+) let one row contribute several index entries, so MEMBER OF and JSON_CONTAINS against an array field like $.tags can use an index. MariaDB does not have multi-valued indexes; there you index a single extracted scalar through the generated-column route above.
The MariaDB difference (do not gloss over this)
Here is the part that surprises people migrating between the two. MariaDB's JSON is not a native binary type. It is an alias for LONGTEXT (specifically LONGTEXT COLLATE utf8mb4_bin) with an automatic JSON_VALID() CHECK constraint bolted on for validation. MariaDB introduced it purely for MySQL compatibility, and it stores the document as text, parsing on access, with no compact binary representation.
So the SQL surface is largely compatible (the functions and operators line up), but the storage model is fundamentally different: MySQL keeps a parsed binary blob and reads subfields without re-parsing; MariaDB keeps the raw string and parses it on every access. The performance characteristics diverge accordingly, and binary-level features that depend on the encoding (and row-based replication of JSON columns between the two engines) do not carry over. If you are writing for both, treat MariaDB's JSON as "validated LONGTEXT" and do not assume the MySQL binary-format optimizations are there.
What to do next
- For exact byte sizes and limits of every column type, see MySQL data types and sizes.
- For the conversion and extraction functions in one place, the MySQL cheat sheet covers the JSON path operators alongside the rest of the DDL.
- For the same indexed generated-column pattern applied to a hash instead of a JSON path, see how to store a SHA-256 hash in MySQL.
- PostgreSQL handles this with
jsonb, a true binary type with GIN indexing rather than a generated-column workaround; see JSON vs JSONB in PostgreSQL.
FAQ
See also
- PostgreSQL takes a different route with a true binary type and GIN indexing: JSON vs JSONB in PostgreSQL.
- For when a document model is the better fit entirely, compare embedding versus referencing in MongoDB schemas.
- The companion to the JSON-vs-TEXT call: large TEXT vs BLOB in MySQL for the opaque-blob case.
- Exact byte sizes and storage limits for every column type live in the MySQL field types and sizes reference.
- The JSON path operators sit alongside the rest of the DDL in my MySQL cheat sheet.
- Spinning up a throwaway instance to test the schema: run MySQL in Docker.
Sources
Authoritative references this article was fact-checked against.
- MySQL 8.0 Reference Manual: The JSON Data Type (validation, binary storage, -> and ->> operators)dev.mysql.com
- MySQL 8.0 Reference Manual: Secondary Indexes and Generated Columnsdev.mysql.com
- MySQL 8.0 Reference Manual: CREATE INDEX (functional key parts, multi-valued indexes)dev.mysql.com
- MariaDB Knowledge Base: JSON Data Type (alias for LONGTEXT with a JSON_VALID CHECK)mariadb.com





