MySQL has 30+ built-in data types across seven families: numeric, string, date/time, JSON, spatial, binary, and (in MySQL 9.0+) vector. Picking the right one matters at scale because smaller fixed-size columns mean smaller indexes, fewer I/O pages, and faster joins. This is the complete reference I keep open whenever I design a new schema: exact storage size in bytes, value range, and a one-line "when to use it" for every type. It's current to MySQL 9.x (May 2026).
What are MySQL data types?
A MySQL data type is the storage format for a column. It controls how many bytes each row spends on that column, what values are valid, and how MySQL compares and sorts them. Choosing well affects four things at once: disk usage (smaller types pack more rows per page), index performance (B-tree depth scales with row width), query correctness (TIMESTAMP wraps at 2038, INT overflows at ~2.1 billion), and application code (TEXT can't have a default value, JSON validates on insert). Use the smallest type that fits the actual range of values you'll store, prefer fixed-width over variable-width when sizes are predictable, and reach for VECTOR only with MySQL 9.0+ where the embedding workload genuinely needs in-database similarity search.
Every MySQL data type by storage size
The fastest way to size a schema is to read the byte cost straight off a list. The smallest MySQL types take 1 byte (TINYINT, BOOL, YEAR); the largest fixed types take 8 bytes (BIGINT, DOUBLE); and the variable-length TEXT/BLOB family runs from 256 bytes up to 4 GB, stored off-page. Here is every built-in type ordered smallest to largest, with the exact storage and what it holds:
| Type | Storage (bytes) | Holds |
|---|---|---|
TINYINT / BOOL | 1 | Integer -128 to 127 (0 to 255 unsigned). BOOL is an alias for TINYINT(1). |
YEAR | 1 | Year 1901 to 2155 |
ENUM(...) | 1 to 2 | One of up to 65,535 predefined labels |
SMALLINT | 2 | Integer -32,768 to 32,767 (0 to 65,535 unsigned) |
MEDIUMINT | 3 | Integer -8.4M to 8.4M (0 to 16.7M unsigned) |
DATE | 3 | 1000-01-01 to 9999-12-31 |
TIME | 3 (+0 to 3 fractional) | -838:59:59 to 838:59:59 |
INT / INTEGER | 4 | Integer -2.1B to 2.1B (0 to 4.29B unsigned) |
FLOAT | 4 | Approximate float, ~7 significant digits |
TIMESTAMP | 4 (+0 to 3 fractional) | 1970-01-01 to 2038-01-19 UTC |
DATETIME | 5 (+0 to 3 fractional) | 1000-01-01 to 9999-12-31, no timezone |
BIGINT | 8 | Integer -9.2×10^18 to 9.2×10^18 |
DOUBLE / REAL | 8 | Approximate float, ~15 significant digits |
BIT(M) | (M+7)/8 (1 to 8) | Bitfield of 1 to 64 bits |
SET(...) | 1, 2, 3, 4, or 8 | Subset of up to 64 labels, stored as a bitmask |
DECIMAL(M,D) | ≈ M/2 + 1 | Exact fixed-point number |
CHAR(M) | M × charset bytes | Fixed-length string, 0 to 255 chars |
VARCHAR(M) | length + 1 to 2 | Variable string up to the 65,535-byte row limit |
TINYTEXT / TINYBLOB | length + 1 | Up to 255 bytes |
TEXT / BLOB | length + 2 | Up to 65,535 bytes (64 KB) |
MEDIUMTEXT / MEDIUMBLOB | length + 3 | Up to 16,777,215 bytes (16 MB) |
LONGTEXT / LONGBLOB | length + 4 | Up to 4,294,967,295 bytes (4 GB) |
JSON | ≈ document + small overhead | Binary-encoded JSON, bulk stored off-page |
POINT / GEOMETRY / … | 25+ | Spatial geometry (SRID + WKB) |
VECTOR(N) | 4 × N + header | N single-precision floats (MySQL 9.0+) |
The exact ranges, charset effects, and per-family caveats are broken out in the sections below. The headline numbers above are payload-only; InnoDB row overhead adds a fixed per-row cost on top.
Jump to:
- Every type by storage size
- Numeric data types
- String data types
- Date and time data types
- JSON data type
- VECTOR data type (MySQL 9.0+)
- Spatial data types
- Binary data types
- CHAR vs VARCHAR: when to use which
- DATETIME vs TIMESTAMP: when to use which
- INT vs BIGINT: picking the right integer size
- UUID storage: BINARY(16) vs CHAR(36)
- Worked example: the right type for every column
- NDB storage engine considerations
- Notes on storage overhead
- FAQ
Numeric data types
| Type | Storage | Signed range | Unsigned range |
|---|---|---|---|
TINYINT | 1 byte | -128 to 127 | 0 to 255 |
SMALLINT | 2 bytes | -32,768 to 32,767 | 0 to 65,535 |
MEDIUMINT | 3 bytes | -8,388,608 to 8,388,607 | 0 to 16,777,215 |
INT / INTEGER | 4 bytes | -2,147,483,648 to 2,147,483,647 | 0 to 4,294,967,295 |
BIGINT | 8 bytes | -9.22×10^18 to 9.22×10^18 | 0 to 1.84×10^19 |
DECIMAL(M,D) / NUMERIC | ≈ M/2 + 1 bytes | fixed-point, exact precision | , |
FLOAT | 4 bytes | ±3.4×10^38, ~7 decimal digits | , |
DOUBLE / REAL | 8 bytes | ±1.8×10^308, ~15 decimal digits | , |
BIT(M) | (M+7)/8 bytes | 1 to 64-bit bitfield | , |
Rules of thumb:
- For monetary values, never use
FLOAT/DOUBLE, they round. UseDECIMAL(M,D)with explicit precision (typicallyDECIMAL(19,4)for currency). The full breakdown of the DECIMAL-vs-integer-cents trade-off for storing money covers when each is the right call. - Unsigned only buys you a doubled positive range. Use it for counters and IDs that can never be negative; the signed/unsigned mismatch is a common source of join bugs.
INT(11)does not mean "11 digits". The number in parentheses is a display width hint that most clients ignore in MySQL 8.0+. Just writeINT.
Those ranges are hard limits, not suggestions. With strict mode on (the default since MySQL 8.0), writing one past the top of a type is an error, not a silent wrap:

String data types
| Type | Storage | Notes |
|---|---|---|
CHAR(M) | M × charset bytes, padded | Fixed length, right-padded with spaces. 0-255 chars. |
VARCHAR(M) | actual length + 1-2 bytes for length prefix | Variable length, M from 0 to 65,535 chars (subject to row size limit). |
TINYTEXT | length + 1 byte | Up to 255 bytes. Stored off-page. |
TEXT | length + 2 bytes | Up to 65,535 bytes (64KB). Stored off-page. |
MEDIUMTEXT | length + 3 bytes | Up to 16,777,215 bytes (16MB). |
LONGTEXT | length + 4 bytes | Up to 4,294,967,295 bytes (4GB). |
ENUM(...) | 1 or 2 bytes | One of up to 65,535 predefined string values. |
SET(...) | 1, 2, 3, 4, or 8 bytes | A subset of up to 64 predefined string values, stored as a bitmask. |
Multibyte charsets matter for sizing. With utf8mb4 (the default since MySQL 8.0 and required for full Unicode including emoji), each character can take up to 4 bytes. A VARCHAR(255) column in utf8mb4 reserves up to 1,020 bytes per row plus the 1-2 byte length prefix. The same column in latin1 reserves up to 255 bytes. Choose your charset before sizing string columns.
Two of the most common short-string columns have their own length-and-collation walkthroughs: storing an email address in MySQL (why VARCHAR(255) and which collation to index it under), and storing a phone number in MySQL (the E.164 VARCHAR pattern and why you never store a phone number as an integer).
TEXT columns can't have a default value (other than NULL). If your app needs an empty-string default, use VARCHAR instead. For the choice between a TEXT family type and a BLOB for bulkier content, see storing large text vs BLOB in MySQL.
Date and time data types
| Type | Storage | Range | Notes |
|---|---|---|---|
YEAR | 1 byte | 1901 to 2155 | Stored as YYYY |
DATE | 3 bytes | 1000-01-01 to 9999-12-31 | Stored as YYYY-MM-DD |
TIME | 3 bytes + fractional | -838:59:59 to 838:59:59 | Stored as HH:MM:SS[.fraction] |
DATETIME | 5 bytes + fractional | 1000-01-01 00:00:00 to 9999-12-31 23:59:59 | Calendar local time, no timezone |
TIMESTAMP | 4 bytes + fractional | 1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC | Converted to UTC on store, back on read |
The "+ fractional" overhead is 0-3 bytes depending on the precision: DATETIME(0) adds 0 bytes, DATETIME(3) (millisecond) adds 2 bytes, DATETIME(6) (microsecond) adds 3 bytes.
For the difference between DATETIME and TIMESTAMP, see the DATETIME vs TIMESTAMP decision section below: short version: TIMESTAMP is half the storage and timezone-aware but capped at 2038; DATETIME is wider, dumber, and safe past 2038.
For formatting date/time columns at query time, the full set of DATE_FORMAT() specifiers is in How to Format DATETIME in MySQL.
JSON data type
| Type | Storage | Notes |
|---|---|---|
JSON | ≈ LONGBLOB + binary-encoded structure overhead | Validated on insert. Supports JSON_EXTRACT, ->, ->>, generated columns. |
MySQL stores JSON in a parsed binary format (not as text), so reads are fast but inserts are slightly slower than equivalent LONGTEXT. Use JSON when you genuinely need ad-hoc schema, indexing via generated columns, or path-based queries. For everything else, normalise. The full case for and against the native type, indexing, generated columns, when to reach for it over a normalised table, is in storing JSON in MySQL.
JSON columns count toward the row size limit indirectly, the bulk of the JSON sits off-page like TEXT/BLOB, but each row has a pointer + small metadata in the primary record.
VECTOR data type (MySQL 9.0+)
| Type | Storage | Notes |
|---|---|---|
VECTOR(N) | 4 × N bytes + small header | Stores a fixed-dimension vector of single-precision floats. N is the dimension count (e.g., 1024 for Voyage-3, 1536 for OpenAI text-embedding-3-small). |
Added in MySQL 9.0 as the native primitive for semantic search and RAG workloads. You load and read vectors with STRING_TO_VECTOR() (alias TO_VECTOR()) to turn a [1.05, -17.8, 32]-style string into the binary column value, VECTOR_TO_STRING() (alias FROM_VECTOR()) to read it back, and VECTOR_DIM() to get the dimension count.
The catch for similarity search: the DISTANCE() function (with COSINE, DOT, or EUCLIDEAN) is only available in HeatWave MySQL on OCI, it is not part of MySQL Community or Commercial. On a self-managed community server you can store and retrieve vectors natively, but the distance computation has to happen in your application (or via HeatWave). Don't plan a community-MySQL RAG pipeline around an in-database DISTANCE() that isn't there.
A 1024-dimension vector occupies ~4KB per row. A 3072-dimension vector occupies ~12KB. Plan storage accordingly, a 1M-row table with VECTOR(1024) adds ~4GB just for the embedding column.
For the full integration walkthrough (embedding model choice, backfill scripts, similarity query syntax), see How to Add Semantic Search to a MySQL App.
Spatial data types
| Type | Storage | Notes |
|---|---|---|
GEOMETRY | 25+ bytes | Generic container for any spatial type below. |
POINT | 25 bytes | A single (x, y) coordinate pair. |
LINESTRING | 33+ bytes | A sequence of points forming a line. |
POLYGON | 65+ bytes | A closed ring of points forming an area. |
MULTIPOINT / MULTILINESTRING / MULTIPOLYGON / GEOMETRYCOLLECTION | varies | Collections of the above. |
The "25+" overhead is the SRID (4 bytes) + WKB header + the actual coordinate data. For GIS workloads, add a SPATIAL INDEX for fast bounding-box and proximity queries.
Binary data types
| Type | Storage | Notes |
|---|---|---|
BINARY(M) | M bytes, zero-padded | Fixed-length raw bytes. |
VARBINARY(M) | actual length + 1-2 bytes | Variable-length raw bytes, up to 65,535. |
TINYBLOB | length + 1 byte | Up to 255 bytes. |
BLOB | length + 2 bytes | Up to 65,535 bytes (64KB). |
MEDIUMBLOB | length + 3 bytes | Up to 16,777,215 bytes (16MB). |
LONGBLOB | length + 4 bytes | Up to 4,294,967,295 bytes (4GB). |
For hashes, use BINARY(N) at the digest's exact byte width, it indexes cleanly and halves the size of the equivalent hex CHAR column. The two common cases each have a full walkthrough: storing a SHA-256 hash in BINARY(32), and storing an MD5 hash in BINARY(16), both covering the UNHEX()/HEX() round-trip and the indexed generated-column pattern for hashing a long column.
Password hashes are the exception, don't use BINARY(N). A bcrypt or Argon2 hash is a self-describing ASCII string (algorithm, cost, salt, and digest all in one), so it goes in a VARCHAR, not a fixed binary column. The right column width is covered in the password column type for MySQL, with per-algorithm detail in storing a bcrypt hash and storing an Argon2 hash.
For file storage, don't. Store files in object storage (S3, R2) and keep only the key/URL in MySQL. BLOB columns bloat backups and replication.
CHAR vs VARCHAR: when to use which
| Question | Pick CHAR | Pick VARCHAR |
|---|---|---|
| Are all values the same length? | yes (e.g., country code, US state, ISBN-13) | no |
| Will the column be in a covering index hot path? | yes (fixed-width rows pack denser in B-trees) | not necessarily |
| Is the value tiny (1-10 chars)? | yes (the 1-2 byte VARCHAR length prefix is overhead) | , |
| Is the value variable and over 30 chars? | , | yes |
| Is the column UTF8 with mixed-length content? | , | yes (CHAR wastes padding bytes) |
Common right call: CHAR(2) for ISO country codes, CHAR(36) for stringified UUIDs (or better: BINARY(16), see below), VARCHAR(255) for everything else short, TEXT for "user-typed paragraphs".
DATETIME vs TIMESTAMP: when to use which
| Concern | DATETIME | TIMESTAMP |
|---|---|---|
| Storage | 5 bytes | 4 bytes |
| Range | 1000-01-01 to 9999-12-31 | 1970-01-01 to 2038-01-19 |
| Timezone behavior | Stored as-is, no conversion | Converted to UTC on store, back to session timezone on read |
| Auto-update on row change | manual | ON UPDATE CURRENT_TIMESTAMP available |
| 2038 problem | unaffected | affected (Y2038) |
Use TIMESTAMP for created_at / updated_at audit columns where you want auto-updating and timezone awareness. Use DATETIME for user-facing dates that have semantic meaning ("event start time", "subscription renewal"): they shouldn't shift if the server timezone changes. For the end-to-end pattern (which type to pick, and how to handle time zones at insert and read time), see storing dates, times, and time zones in MySQL.
The 2038 cap is the gotcha that bites in production. TIMESTAMP accepts 2038-01-19 03:14:07 but rejects one second later; DATETIME happily stores dates into the year 9999:

For the row size calculation in How to Change a MySQL Column Type, TIMESTAMP to DATETIME is a +1 byte schema change.
INT vs BIGINT: picking the right integer size
Cost of using BIGINT instead of INT for a primary key on a 1B-row table: 4 bytes × 1 billion = ~4GB extra in the primary index, plus every secondary index that includes the PK. That's real money on a hot table.
When INT is enough: anything where the maximum count won't exceed 2.1 billion. Most user tables, order tables, transaction tables.
When BIGINT is required: distributed ID generators (Snowflake, ULID-as-int), high-volume event/log streams (Kafka offsets, click events), anything tracking nanosecond timestamps as integers.
When TINYINT(1) or BIT(1): boolean flags. TINYINT(1) is the more portable convention; MySQL's BOOLEAN is an alias for TINYINT(1). The two options are weighed side by side in storing a boolean in MySQL: TINYINT(1) vs BIT.
UUID storage: BINARY(16) vs CHAR(36)
| Storage option | Bytes per row | Index size for 1B rows | Notes |
|---|---|---|---|
CHAR(36) (stringified) | 36 bytes (latin1) or 144 bytes (utf8mb4) | 36-144 GB | Human-readable, easy debugging |
BINARY(16) (raw bytes) | 16 bytes | 16 GB | 2.25-9× smaller, no readable in queries |
For high-cardinality tables, store UUIDs as BINARY(16) and use UUID_TO_BIN() / BIN_TO_UUID() (MySQL 8.0+) to convert at the SQL layer. The swap_flag argument to UUID_TO_BIN(uuid, 1) rearranges UUID v1 bytes so the high-entropy bits sort time-locally, which dramatically improves B-tree insert performance. The full pattern, including the v1-vs-v4 sort-order gotcha and the generated-column trick for keeping a readable copy, is in storing a UUID in MySQL as BINARY(16).
CREATE TABLE orders (
id BINARY(16) PRIMARY KEY,
-- ...
);
INSERT INTO orders (id, ...) VALUES (UUID_TO_BIN(UUID(), 1), ...);
SELECT BIN_TO_UUID(id, 1) AS uuid FROM orders;For more on schema-design trade-offs in MySQL, see the MySQL Cheat Sheet.
Worked example: picking the right type for every column
Rules in isolation are easy to forget; here is a realistic users table where every column is the smallest correct type, with the reasoning inline:
CREATE TABLE users (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
public_id BINARY(16) NOT NULL UNIQUE, -- a UUID as 16 raw bytes, not a 36-byte string
email VARCHAR(255) NOT NULL UNIQUE, -- emails vary in length; index needs the real value
display_name VARCHAR(80) NOT NULL, -- short, but not a fixed width -> VARCHAR
country_code CHAR(2) NOT NULL, -- always exactly 2 chars (ISO 3166-1) -> CHAR is denser
is_active TINYINT(1) NOT NULL DEFAULT 1, -- boolean flag, 1 byte
plan ENUM('free','pro','team') NOT NULL DEFAULT 'free', -- fixed vocabulary, 1 byte
balance_cents BIGINT NOT NULL DEFAULT 0, -- money as integer cents, never FLOAT/DOUBLE
settings JSON DEFAULT NULL, -- sparse per-user prefs, queried by path
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
trial_ends_at DATETIME NULL -- user-facing date, must survive past 2038
) ENGINE=InnoDB;Why each pick:
idisBIGINT UNSIGNED, notINT. A surrogate primary key on a table you expect to grow should not cap at 4.2 billion. The 4 extra bytes per row are worth never having to migrate a hot PK.public_idisBINARY(16), notCHAR(36). The id you expose in URLs is a UUID, but stored raw it is 16 bytes and indexes cleanly. See UUID storage.country_codeisCHAR(2),display_nameisVARCHAR(80). Fixed-width value goes inCHAR; genuinely variable value goes inVARCHAR. Mixing them up either wastes padding bytes or pays a needless length prefix.balance_centsisBIGINT, neverFLOAT/DOUBLE. Money in a binary float rounds and drifts; store integer cents (orDECIMAL(19,4)if you need fractional units).created_at/updated_atareTIMESTAMP;trial_ends_atisDATETIME. Audit columns want UTC normalization andON UPDATE; a user-facing future date must not hit the 2038 wall.planisENUM,settingsisJSON. A closed set of labels costs 1 byte as anENUM; open-ended sparse data goes inJSONand is queried by path.ENUMis not always the right call for a fixed vocabulary, though, ENUM vs a lookup table in MySQL walks through when the labels change often enough that a reference table wins.
NDB storage engine considerations
The NDB Cluster storage engine has its own alignment and overhead rules that don't apply to InnoDB:
- All column data is aligned to 4 bytes.
- Columns allowing
NULLrequire an additional 4 bytes per row for the first 32 nullable columns. BIT(M)values are stored in 4-byte chunks (1-32 bits per chunk).- The maximum row size in NDB is 14,000 bytes (lower than InnoDB's 65,535).
NDB is rarely the right choice for typical web workloads. If you're not running NDB, ignore this section.
Notes on storage overhead
- Multibyte characters.
utf8mb4columns reserve up to 4 bytes per character. Size string columns based on character count, but plan disk based on character × 4 bytes. - InnoDB row overhead. Each row carries a 6-byte transaction ID, a 7-byte rollback pointer, and per-column header bytes. The headline byte counts above are payload-only.
- Compression. InnoDB tables can use
ROW_FORMAT=COMPRESSEDto halve disk usage (at CPU cost). Not commonly used; faster storage has made it less attractive. - Maximum row size. 65,535 bytes per row, excluding BLOB/TEXT columns which are stored off-page (just an off-page pointer counts toward the row size).
- Off-page storage. When a row would exceed the page size (default 16KB), InnoDB moves large variable-length columns off-page. This adds an indirect read for those columns at query time.
What to do next
- For day-to-day schema changes (adding columns, changing types), the MySQL ADD COLUMN and MySQL CHANGE COLUMN TYPE walkthroughs cover the safe-online patterns.
- For datetime formatting at query time, MySQL DATE_FORMAT() reference.
- For the broader SQL syntax reference, the MySQL Cheat Sheet covers commands, types, and functions on one page.
- For semantic search using the new
VECTORtype, How to Add Semantic Search to a MySQL App.
FAQ
Sources
Authoritative references this article was fact-checked against.
- MySQL 9.0 release notesdev.mysql.com
- MySQL 8.4, data type storage requirementsdev.mysql.com
- MySQL 9.1, Vector Functions (STRING_TO_VECTOR, DISTANCE availability)dev.mysql.com





