TechEarl

MySQL ENUM vs a Lookup Table: Which Should You Use?

MySQL ENUM is compact and fast, but adding a value is a schema change and there is nowhere to hang metadata. When to use ENUM, when to use a lookup table with a foreign key, and where VARCHAR + CHECK fits.

Ishan Karunaratne⏱️ 13 min readUpdated
Share thisCopied
MySQL ENUM vs a lookup table: how ENUM stores values as a 1- or 2-byte index, why adding a value is a schema change, and when a lookup table with a foreign key or a VARCHAR + CHECK constraint is the better model.

ENUM stores one value from a fixed list, and it stores it cheaply: internally it is a 1-byte integer index (up to 255 members) or a 2-byte index (up to 65,535 members), not the string you wrote. So it is compact and fast. The catch is that the list lives in the column definition, so adding a value is an ALTER TABLE, there is nowhere to attach a label or a sort order or an is_active flag, and the values sort by definition order rather than alphabetically. A lookup table (a small statuses table with an id and a name, referenced by a foreign key) fixes all of that at the cost of a JOIN. Below is the decision, a comparison table, and worked schemas for both, plus where VARCHAR + a CHECK constraint sits between them.

Short answer: use ENUM for a tiny, stable, app-internal set that almost never changes and needs no per-value metadata (a 3-value status, a yes/no flag you refuse to model as boolean). Use a lookup table the moment the set will grow at runtime, needs metadata, or is shared across tables. If you only need the constraint and not the metadata, a VARCHAR with a CHECK (MySQL 8.0.16+, MariaDB 10.2+) gives you validation without the ENUM ALTER pain. This is the same data-modeling instinct as picking the right MySQL field type and size up front instead of migrating later.

How MySQL ENUM actually stores values

An ENUM column looks like it stores strings, but it does not. Each member gets an index number starting at 1, and the row stores that integer index, not the text. ENUM('pending','paid','shipped') stores 1, 2, or 3 in one byte. The string only appears when you SELECT it, because MySQL maps the index back to the literal. A column with up to 255 members costs 1 byte per row; past that, up to 65,535 members, it costs 2 bytes. That is the whole appeal: a status column that would be VARCHAR(20) becomes a single byte, and equality filters run against a tiny integer.

That index-based storage is also the source of most ENUM surprises, and they are worth knowing before you commit to it:

  • Sorting is by index, not alphabetical. ORDER BY status on ENUM('pending','paid','shipped') returns rows in definition order (pending, paid, shipped), not alphabetical order. The MySQL manual is explicit: 'b' sorts before 'a' for ENUM('b', 'a'). To sort alphabetically you have to write ORDER BY CAST(status AS CHAR), which throws away the index speed. People hit this constantly and assume the data is corrupt.
  • An invalid value becomes the empty string in non-strict mode. Insert a value that is not in the list and, outside strict SQL mode, MySQL stores '' (index 0) as a special error marker instead of rejecting the row. In strict mode (the default since MySQL 5.7) it errors, which is what you want, but the silent-'' behavior still bites anyone on a loosened sql_mode.
  • Numeric-looking members are a trap. With ENUM('0','1','2'), inserting 2 (unquoted) is read as the index 2, which is the value '1'. Inserting '2' (quoted) matches the member '2'. The MySQL manual flatly recommends against numbers as enumeration values for exactly this reason. If your set looks like numbers, ENUM is the wrong tool.

The real problem: adding a value is a schema change

Here is the operational cost that decides most of these debates. To add a value to an ENUM, you run ALTER TABLE ... MODIFY. On MySQL 8.0, appending a member to the end of the list can be ALGORITHM=INSTANT or in-place, as long as the storage size does not change (per the online-DDL docs):

sql
ALTER TABLE orders MODIFY status ENUM('pending','paid','shipped','refunded'), ALGORITHM=INSTANT;

That is genuinely cheap. But the moment you need to insert a value in the middle of the list, or reorder for sane sorting, MySQL has to renumber the existing members, and that forces a full table copy. Crossing the 255-member boundary (1 byte to 2 bytes) also forces a copy. So "just add a value" is sometimes instant and sometimes a rebuild of a large table, and you have to know which case you are in before you run it on production. Either way it is a DDL statement, a deploy, a migration in your schema history. You cannot let an admin add a value from the application.

A lookup table turns that schema change into a plain INSERT. Adding refunded is one row in a statuses table, runnable at runtime by anything that can write to the table. That single difference is why I reach for a lookup table whenever the set is not frozen.

ENUM vs lookup table vs VARCHAR + CHECK

TraitENUMLookup table + FKVARCHAR + CHECK
Storage per row1 byte (≤255) or 2 bytesFK column (e.g. TINYINT/INT)full string bytes
Adding a valueALTER TABLE (instant only when appending; rebuild otherwise)INSERT one row, no DDLALTER TABLE to change the CHECK
Per-value metadataNoneYes (label, sort order, is_active, color, …)None
Referential integrityDefinition-enforcedReal FK constraintCHECK-enforced
Shared across tablesNo (redefine per column)Yes (one table, many FKs)No (repeat the CHECK)
SortingBy index order, not alphabeticalAny column you like (e.g. sort_order)Alphabetical / natural string sort
ReadsNo joinNeeds a JOIN for the labelNo join
Best forTiny, stable, internal set, no metadataGrowing or shared set, needs metadataSimple constrained set, no metadata, no join

The table is the whole argument in miniature: ENUM wins on read simplicity and bytes, the lookup table wins on everything that involves change and metadata, and VARCHAR + CHECK is the lightweight middle when you want validation without a second table or a join.

Worked example: the ENUM version

Start with the obvious ENUM. An orders table with a three-state status:

sql
CREATE TABLE orders (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status     ENUM('pending','paid','shipped') NOT NULL DEFAULT 'pending',
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

INSERT INTO orders (status) VALUES ('paid');

SELECT status FROM orders WHERE status = 'paid';

One byte per row, no join, readable at the SQL prompt. This is ENUM at its best: the set is small, it is internal to the application, and it changes rarely. If it stays this way, ENUM is the correct answer and a lookup table would be over-engineering. The trouble starts the day product asks for a refunded state, and then a partially_refunded state, and then wants each status to render with a specific color and ordering in the dashboard. None of that fits in the column definition.

Worked example: the same thing as a lookup table

Model the statuses as their own table, with room for the metadata the ENUM had nowhere to put, and reference it by foreign key:

sql
CREATE TABLE order_statuses (
  id         TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name       VARCHAR(40) NOT NULL,
  label      VARCHAR(80) NOT NULL,
  sort_order SMALLINT NOT NULL DEFAULT 0,
  is_active  TINYINT(1) NOT NULL DEFAULT 1,
  UNIQUE KEY uq_order_statuses_name (name)
) ENGINE=InnoDB;

INSERT INTO order_statuses (name, label, sort_order) VALUES
  ('pending', 'Awaiting payment', 10),
  ('paid',    'Paid',             20),
  ('shipped', 'Shipped',          30);

CREATE TABLE orders (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status_id  TINYINT UNSIGNED NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_orders_status
    FOREIGN KEY (status_id) REFERENCES order_statuses (id)
) ENGINE=InnoDB;

Now the read costs a join, but everything that used to be painful is trivial:

sql
-- read with the human label
SELECT o.id, s.name, s.label
FROM orders o
JOIN order_statuses s ON s.id = o.status_id
ORDER BY s.sort_order;

-- add a value: an INSERT, not an ALTER
INSERT INTO order_statuses (name, label, sort_order) VALUES ('refunded', 'Refunded', 40);

That last line is the payoff. Adding refunded did not touch the schema, did not lock or copy the orders table, and could be done from an admin screen. The join is cheap: InnoDB automatically indexes the foreign-key column (status_id), so the lookup is an index probe into a tiny table, not a scan. The sort_order column gives you stable ordering without leaning on definition order, is_active lets you retire a status without deleting historical rows that reference it, and the foreign key gives you genuine referential integrity (you cannot set an orders.status_id that has no matching status). The same order_statuses table can back any other table that needs the same vocabulary, which ENUM can never do, every ENUM is redefined column by column.

The middle ground: VARCHAR + CHECK

If the set is constrained but you do not need metadata and do not want a join on every read, a plain string column with a CHECK constraint is the lightest option. MySQL enforces CHECK constraints as of 8.0.16 (before that it parsed and silently ignored them); MariaDB has enforced them since 10.2:

sql
CREATE TABLE orders (
  id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status VARCHAR(20) NOT NULL DEFAULT 'pending',
  CONSTRAINT chk_orders_status
    CHECK (status IN ('pending','paid','shipped'))
) ENGINE=InnoDB;

You get the readable string, no join, alphabetical sorting that behaves, and a real rejection on bad input. The cost over ENUM is storage (the actual string bytes instead of one index byte) and the fact that changing the allowed set is still an ALTER TABLE to redefine the constraint. So it solves the ENUM sorting and storage-portability complaints but not the "add a value without DDL" one. I use it when the set is genuinely fixed but I want the column to be self-explanatory to anyone reading the data, without the ENUM index footguns above.

Everything here works on MariaDB too

ENUM, the 1/2-byte index storage, the foreign-key lookup-table pattern, and CHECK constraints all behave the same on MariaDB, so the decision is identical on either engine. The only version note worth carrying: enforced CHECK constraints arrived in MariaDB 10.2 and MySQL 8.0.16, so if you are stuck on an older MySQL 5.7 server, the VARCHAR + CHECK middle ground is off the table and the choice narrows back to ENUM versus a lookup table.

What to do next

  • For the exact byte sizes of ENUM, TINYINT, VARCHAR, and every other column type, see MySQL data types and sizes.
  • For the FOREIGN KEY syntax, JOIN patterns, and DDL on one page, the MySQL cheat sheet is the quick reference.
  • For the two-state special case of a constrained value (a yes/no flag), the type choice narrows to storing a boolean in MySQL.

FAQ

See also

Sources

Authoritative references this article was fact-checked against.

TagsMySQLENUMLookup TableForeign KeySchema DesignCHECK ConstraintData Modeling

Found this useful? Pass it on.

Copied

Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.

Keep reading

Related posts

find vs locate vs mlocate: Which File Search Tool to Use

find walks the live filesystem every time it runs: always current, sometimes slow. locate queries a prebuilt database: instant, but stale until the next updatedb. This breaks down the locate family (mlocate, plocate, slocate), the macOS situation, and exactly when to reach for each one.

grep vs ripgrep vs ag: Which Search Tool to Use

grep is on every system and searches exactly what you point it at. ripgrep (rg) is the fast Rust-based default for code search: it skips .gitignore'd, hidden, and binary files unless told otherwise. ag (the_silver_searcher) was the older fast-grep, now largely superseded by ripgrep. This breaks down speed, defaults, regex engines, and exactly when to reach for each one.