Store a UUID in MongoDB: BSON Binary Subtype 4 (2026)

A UUID is a 128-bit value, which is exactly 16 bytes raw or 36 characters as the canonical hyphenated string. MongoDB has no dedicated UUID field type, so the question is how to represent those 16 bytes inside a BSON document: as a 36-character string, or as a BSON Binary value of subtype 4 (the standard UUID subtype). The binary form is the one to reach for. It stores in 16 bytes instead of 36+, indexes and sorts on the raw bytes, and compares correctly across every driver. Below is the comparison table, working mongosh and driver examples, the historical subtype 3 mess that still bites cross-language systems, and when to prefer MongoDB's native ObjectId for _id instead.

Short answer: store a UUID as a BSON Binary of subtype 4, never as a 36-character string. In mongosh use UUID("..."), which produces exactly that. In a driver, use the language's UUID type bound to the standard BSON UUID representation. Keep MongoDB's default ObjectId for _id unless you specifically need client-generated, externally-meaningful identifiers, in which case store those UUIDs as subtype 4 too. This is the document-database version of the same trade-off as storing a UUID in MySQL (BINARY(16) versus CHAR(36)) and the same call in PostgreSQL (the native uuid type versus text).

What is a UUID and how is it stored in MongoDB?

A UUID (Universally Unique Identifier) is a fixed-size 128-bit value designed to be unique without a central coordinator. Its canonical text form is 36 characters: 32 hexadecimal digits in five hyphen-separated groups, like 3b241101-e2bb-4255-8caf-4136c566a962. Strip the hyphens, decode the 32 hex digits, and you have 16 raw bytes.

BSON, the binary format MongoDB stores documents in, has a Binary type with a one-byte subtype tag that says what the bytes mean. The subtype that means "this is a UUID" is subtype 4. So the correct way to store a UUID is a Binary value carrying the 16 raw bytes tagged subtype 4, not the 36-character string in a String field. The string costs about twice the space: as a BSON value the 36-character string serializes to 41 bytes (a 4-byte length prefix, the 36 bytes, and a null terminator), against 21 bytes for the binary (a 4-byte length prefix, a 1-byte subtype tag, and the 16 raw bytes). String comparison also runs byte-for-byte over the hex text rather than over the value itself.
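The size arithmetic above can be checked by hand. The following is a minimal sketch using only Node.js built-ins, with no MongoDB driver involved; the helper names (uuidToBytes, bsonStringValueSize, bsonBinaryValueSize) are hypothetical and exist only for this illustration. It also shows why comparing text is not the same as comparing the value: an upper-case and a lower-case rendering of one UUID are different strings but the same 16 bytes.

```javascript
// Sketch: decode a canonical UUID string into its 16 raw bytes and
// reproduce the BSON value sizes quoted above. Helper names are illustrative.

const CANONICAL = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function uuidToBytes(text) {
  if (!CANONICAL.test(text)) {
    throw new Error("not a canonical UUID string: " + text);
  }
  return Buffer.from(text.replace(/-/g, ""), "hex"); // 32 hex digits -> 16 bytes
}

// BSON string value: int32 length prefix + UTF-8 bytes + trailing NUL
function bsonStringValueSize(text) {
  return 4 + Buffer.byteLength(text, "utf8") + 1;
}

// BSON binary value: int32 length prefix + 1-byte subtype tag + payload
function bsonBinaryValueSize(bytes) {
  return 4 + 1 + bytes.length;
}

const uuidText = "3b241101-e2bb-4255-8caf-4136c566a962";
const uuidBytes = uuidToBytes(uuidText);

console.log(uuidBytes.length);                 // 16
console.log(bsonStringValueSize(uuidText));    // 41
console.log(bsonBinaryValueSize(uuidBytes));   // 21

// Same UUID, different text: the strings differ, the decoded bytes do not.
const upperText = uuidText.toUpperCase();
console.log(upperText === uuidText);                   // false
console.log(uuidToBytes(upperText).equals(uuidBytes)); // true
```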

In mongosh, the helper is the UUID() function. Called with no argument it generates a random RFC 4122 version 4 UUID; called with a string it parses that string. Either way the result is a BSON Binary of subtype 4:

javascript

// generate a fresh v4 UUID (mongosh)
UUID()
// UUID("dee11d4e-63c6-4d90-983c-5c9f1e79e96c")

// parse an existing string into the same subtype-4 Binary
UUID("3b241101-e2bb-4255-8caf-4136c566a962")

Figure: a live mongosh session in MongoDB 7 showing a UUID stored as a binary subtype, with the real inserted and returned document.

You can equally generate the UUID in your application and hand MongoDB the finished value. In drivers you use the language's native UUID type bound to the BSON UUID representation: the UUID class in the Node.js bson package, uuid.UUID with the standard representation in PyMongo, java.util.UUID in the Java driver. The storage on disk is identical: a 16-byte subtype-4 binary either way.
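As one hedged illustration of application-side generation, the sketch below uses randomUUID from Node's built-in crypto module to produce a version 4 UUID and reduces it to the 16 raw bytes that end up in the subtype-4 Binary. It deliberately stops short of the driver call; the variable names are illustrative.

```javascript
// Sketch: generate a version 4 UUID in the application with Node's built-in
// crypto module, then reduce it to its 16 raw bytes.
const { randomUUID } = require("node:crypto");

const generated = randomUUID();                               // 36-character canonical text
const raw = Buffer.from(generated.replace(/-/g, ""), "hex");  // 16 raw bytes

// The version is the high nibble of byte 6; the variant is the top two bits of byte 8.
const version = raw[6] >> 4;
const variantBits = raw[8] >> 6;

console.log(generated.length); // 36
console.log(raw.length);       // 16
console.log(version);          // 4
console.log(variantBits);      // 2 (binary 10, the RFC 4122 variant)
```

In driver code, the generated string would then be wrapped in the driver's UUID type rather than inserted as text.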

Figure: a mongosh session (MongoDB 7) inserting a document whose _id is a UUID, then reading it back. Stored through UUID(), the _id comes back from findOne as a Binary of sub_type 4 (the standard UUID subtype) with a length of 16 bytes, not a 36-character string.

Jump to:

Storage comparison table
Insert and query a UUID field
The legacy subtype 3 byte-order mess
ObjectId vs UUID for _id
A UUID as _id, done right
UUIDv7 and insert locality
FAQ

Storage comparison table

| Representation | Size | Indexes / sorts on | Time-ordered? | When to use |
| --- | --- | --- | --- | --- |
| String UUID (36 chars) | 36 bytes (41 as a BSON value) | the hex text, byte by byte | No | Almost never; only quick ad-hoc data |
| `Binary` subtype 4 (UUID) | 16 bytes (21 as a BSON value) | the raw 16 bytes | Only if you store a v7 | Any UUID you keep; client-generated ids |
| `ObjectId` | 12 bytes | timestamp-leading bytes | Roughly (4-byte timestamp prefix) | The default `_id`; server-side ids |

The string form is the outlier: roughly twice the bytes of the binary, and that cost lands in every index on the field plus the working set MongoDB keeps in RAM. A Binary subtype 4 is the compact, portable choice for a UUID you actually generate yourself. ObjectId is smaller still at 12 bytes and carries a built-in timestamp, which is exactly why it is the right default for _id unless you have a reason to override it (covered below).

Insert and query a UUID field

Store the UUID as a field alongside MongoDB's own ObjectId _id. In mongosh:

javascript

// insert a document with a UUID field (Binary subtype 4)
db.sessions.insertOne({
  _id: new ObjectId(),
  token: UUID("3b241101-e2bb-4255-8caf-4136c566a962"),
  userId: 42,
  createdAt: new Date()
})

// query by the UUID: pass a UUID(), not the raw string
db.sessions.findOne({ token: UUID("3b241101-e2bb-4255-8caf-4136c566a962") })

// index it for fast equality lookups
db.sessions.createIndex({ token: 1 })

The detail that trips people up: a findOne({ token: "3b241101-..." }) with a plain string will not match a document whose token is a subtype-4 Binary. BSON equality is type-aware, so a String never equals a Binary. Always wrap the value in UUID(...) (or the driver's UUID type) on the way in and on the way out. Same idea as wrapping a MySQL lookup parameter in UUID_TO_BIN() so you compare bytes to bytes.

In a Node.js driver the shape is the same, using the UUID class so the value serializes to subtype 4:

javascript

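import { MongoClient, UUID } from "mongodb";

// the connection string is a placeholder; point it at your own deployment
const client = new MongoClient("mongodb://localhost:27017");
const sessions = client.db("app").collection("sessions");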

await sessions.insertOne({
  token: new UUID("3b241101-e2bb-4255-8caf-4136c566a962"),
  userId: 42
});

const doc = await sessions.findOne({
  token: new UUID("3b241101-e2bb-4255-8caf-4136c566a962")
});

The legacy subtype 3 byte-order mess

If you are starting fresh, use subtype 4 and skip this section. If you are touching an older MongoDB system, especially one written against early .NET, Java, or Python drivers, you need to know it exists.

Originally MongoDB represented UUIDs as BSON Binary of subtype 3. The problem: subtype 3 never standardized the byte order of the 16 bytes, so different language drivers serialized the same UUID into different byte layouts. Per the MongoDB driver specification, the C# legacy representation reverses the bytes within each of the first three groups (the 4-, 2-, and 2-byte fields), the Java legacy representation reverses the bytes within each of the two 8-byte halves, and the Python legacy representation keeps the native big-endian order. A UUID written by a Java app and read by a C# app came back scrambled. That is the entire reason subtype 4 was introduced: it fixes the byte order so every driver using the standard representation reads and writes the same 16 bytes with no reordering.

The practical guidance:

New data: use subtype 4 (standard representation) everywhere. In mongosh that is just UUID(). In drivers, set the UUID representation explicitly rather than relying on a default that may still be a legacy mode for backward compatibility.
In PyMongo, set it on the client or codec options: the STANDARD representation encodes native uuid.UUID objects to subtype 4, and all standard-representation drivers agree on those bytes. The PYTHON_LEGACY / CSHARP_LEGACY / JAVA_LEGACY modes exist only to read old subtype-3 data written by those drivers.
Migrating old data: read each value with the legacy representation that wrote it, then re-insert it under the standard representation as subtype 4. Do not guess the byte order; match it to the driver that originally wrote the documents.
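To make the byte-order problem concrete, here is a small sketch of the two legacy reorderings in plain Node.js. It assumes the layouts are as the driver specification describes them: C# legacy reverses the bytes inside the first 4-, 2-, and 2-byte groups, and Java legacy reverses the bytes inside each 8-byte half. The function names are hypothetical. In a real migration you would normally let the driver's legacy representation setting do this decoding; the sketch only shows what that setting has to undo.

```javascript
// Sketch: reorder 16 UUID bytes between the standard layout and the two
// legacy subtype-3 layouts. Function names are illustrative, not driver APIs.

// C# legacy: bytes reversed within the first three groups (4, 2, and 2 bytes).
function swapCSharpLegacy(bytes) {
  const order = [3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15];
  return Buffer.from(order.map((i) => bytes[i]));
}

// Java legacy: bytes reversed within each 8-byte half.
function swapJavaLegacy(bytes) {
  const order = [7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8];
  return Buffer.from(order.map((i) => bytes[i]));
}

// Standard (subtype 4) bytes of 00112233-4455-6677-8899-aabbccddeeff
const standard = Buffer.from("00112233445566778899aabbccddeeff", "hex");

console.log(swapCSharpLegacy(standard).toString("hex"));
// 33221100554477668899aabbccddeeff
console.log(swapJavaLegacy(standard).toString("hex"));
// 7766554433221100ffeeddccbbaa9988

// Each reordering is its own inverse, so the same function turns legacy
// bytes back into the standard layout.
console.log(swapJavaLegacy(swapJavaLegacy(standard)).equals(standard));     // true
console.log(swapCSharpLegacy(swapCSharpLegacy(standard)).equals(standard)); // true
```

Applying the wrong function yields 16 perfectly valid-looking bytes for a different UUID, which is why the byte order has to be matched to the driver that wrote the data rather than guessed.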

The one-line version: subtype 3 is a compatibility trap born of an unspecified byte order; subtype 4 is the fixed, portable standard. Store new UUIDs as subtype 4.

ObjectId vs UUID for _id

MongoDB's default _id is an ObjectId: a 12-byte value made of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte per-process random value, and a 3-byte incrementing counter. If you insert a document without an _id, MongoDB generates one for you.

That structure has two consequences worth understanding. Because the timestamp leads the 12 bytes (stored big-endian, most significant byte first), ObjectIds are roughly time-ordered: newer ids generally sort after older ones, so inserts tend to append to the right-hand side of the _id index instead of scattering across it. That gives good insert locality and a tight, cache-friendly index. The MongoDB manual is careful to note they are not perfectly monotonic, since they carry only one-second resolution and are generated by clients whose clocks may differ, but for index-build behavior the rough ordering is what matters.
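The layout can be seen by pulling an ObjectId apart. This is a rough sketch in plain Node.js that splits a 24-hex-digit ObjectId string into the three fields described above; parseObjectIdHex is a hypothetical helper and the sample id is just an illustrative value.

```javascript
// Sketch: split a 24-hex-digit ObjectId string into timestamp, random value,
// and counter. The function name is illustrative, not a driver API.
function parseObjectIdHex(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex digits");
  }
  const bytes = Buffer.from(hex, "hex");           // 12 bytes
  const seconds = bytes.readUInt32BE(0);           // 4-byte big-endian timestamp
  return {
    seconds,
    createdAt: new Date(seconds * 1000),
    random: bytes.subarray(4, 9).toString("hex"),  // 5-byte random value
    counter: bytes.readUIntBE(9, 3),               // 3-byte counter
  };
}

const parts = parseObjectIdHex("507f1f77bcf86cd799439011");
console.log(parts.seconds);                 // 1350508407
console.log(parts.createdAt.toISOString()); // 2012-10-17T21:13:27.000Z
console.log(parts.random);                  // bcf86cd799
console.log(parts.counter);                 // 4427793
```

Because the timestamp occupies the leading bytes in big-endian order, comparing two ObjectIds byte by byte compares their creation seconds first, which is where the rough time ordering comes from.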

So when do you reach past ObjectId for a UUID _id? When you genuinely need one of these:

Client-generated ids. You need the identifier before the insert round-trip (to return it to a caller, to reference it in a related write, to build an idempotency key). A UUID is generated locally; an ObjectId can be too, but a UUID is the cross-language standard for it.
Globally unique, externally meaningful ids. The same id has to be unique and valid across other systems (a relational database, an event stream, another service) that already speak UUID. Reusing one identifier everywhere beats translating between an ObjectId and something else at every boundary.
Avoiding the embedded timestamp. An ObjectId leaks its creation time and a process fingerprint to anyone who can read it. A v4 UUID exposes neither, which can matter for ids that appear in URLs or are handed to clients.

For everything else, keep ObjectId. It is smaller (12 bytes vs 16), it is time-ordered for free, and it is what MongoDB tools and examples assume by default. Do not swap a UUID in for _id reflexively; do it when one of the reasons above actually applies.

A UUID as _id, done right

If you do use a UUID for _id, store it as a Binary subtype 4, exactly like any other UUID field. Generate it client-side and put it in _id explicitly so MongoDB does not create an ObjectId instead:

javascript

// mongosh: UUID primary key, generated client-side
const id = UUID()
db.accounts.insertOne({
  _id: id,
  email: "ada@example.com",
  createdAt: new Date()
})

// look one up by its UUID _id
db.accounts.findOne({ _id: UUID("...") })

javascript

// Node driver: UUID _id
import { UUID } from "mongodb";

// `db` is assumed to be an already-connected Db handle, e.g. client.db("app")
const id = new UUID();                       // client-generated, returnable immediately
await db.collection("accounts").insertOne({
  _id: id,
  email: "ada@example.com"
});

The _id index is unique and built automatically, so a UUID _id is indexed the moment you insert. The cost, exactly as in the relational world, is that a random UUID (v4) scatters inserts across the _id index. You lose the neat right-hand append that ObjectId's leading timestamp gives you, which on a high-write collection means more index page churn and a colder cache. Which leads to the fix.

UUIDv7 and insert locality

A version 4 UUID is almost entirely random (122 of its 128 bits), so it has no natural ordering, and as a primary key it inserts into a random spot in the index every time. A version 7 UUID is built differently: its leading 48 bits are a Unix-millisecond timestamp, so its raw 16-byte order is chronological. Stored as a Binary subtype 4, a UUIDv7 sorts in roughly creation order, which means inserts append to the end of the index instead of scattering. You regain most of the insert locality that ObjectId gives you, while keeping the cross-system portability and client-side generation that made you choose a UUID in the first place.
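For illustration only, a version 7 UUID can be assembled by hand with Node's crypto module. The sketch below assumes the v7 layout of a 48-bit big-endian millisecond timestamp, a version nibble of 7, and the usual variant bits, with everything else random; uuidV7Bytes and formatUuid are hypothetical helpers, and in production you would more likely use a maintained UUID library.

```javascript
// Sketch: build a version 7 UUID by hand. uuidV7Bytes and formatUuid are
// illustrative helpers, not MongoDB or driver APIs.
const { randomBytes } = require("node:crypto");

function uuidV7Bytes(unixMs = Date.now()) {
  const bytes = randomBytes(16);          // start from 16 random bytes
  bytes.writeUIntBE(unixMs, 0, 6);        // 48-bit big-endian millisecond timestamp
  bytes[6] = (bytes[6] & 0x0f) | 0x70;    // version nibble = 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80;    // variant bits = 10
  return bytes;
}

function formatUuid(bytes) {
  const hex = bytes.toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join("-");
}

const earlier = uuidV7Bytes(1700000000000);
const later = uuidV7Bytes(1700000000001);

// Raw byte comparison follows the timestamp prefix.
console.log(Buffer.compare(earlier, later));   // -1
console.log(formatUuid(earlier).slice(0, 13)); // 018bcfe5-6800
console.log(earlier[6] >> 4);                  // 7
```

Two ids generated within the same millisecond still order randomly relative to each other in this sketch, which is why the ordering is described as rough rather than strict.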

This is the same story as the relational side, where a time-ordered UUIDv7 fixes the random-insert problem in MySQL and Postgres that a v4 primary key creates. If you have decided a UUID _id is the right call for a write-heavy MongoDB collection, generate a v7 in your application and store the 16 bytes as subtype 4. If you do not need a client-generated id at all, the honest default is still MongoDB's ObjectId, which has been time-ordered by design since the beginning.

What to do next

For the relational equivalent and the BINARY(16) versus CHAR(36) trade-off, see How to Store a UUID in MySQL.
For the same call in Postgres, where there is a real native uuid type, see How to Store a UUID in PostgreSQL.
To get MongoDB running locally for these examples, see How to Run MongoDB in Docker.
A UUID is usually the key you store when you reference one document from another, so the choice ties into embedding vs referencing in a MongoDB schema.

FAQ

How should I store a UUID in MongoDB?

As a BSON Binary value of subtype 4, the standard UUID subtype, not as a 36-character string. The binary form is 16 raw bytes against the string's 41 bytes as a BSON value, indexes and compares on the raw value, and reads back identically across every driver. In mongosh use UUID("..."); in a driver use the language UUID type bound to the standard BSON UUID representation.

Does MongoDB have a native UUID field type?

No. BSON has no dedicated UUID type the way it has ObjectId or Date. A UUID is stored inside the general-purpose Binary type, with the one-byte subtype tag set to 4 to mark the 16 bytes as a UUID. That is what mongosh's UUID() helper and each driver's UUID class produce under the hood. So "store a UUID in MongoDB" really means "store a 16-byte Binary of subtype 4", not reach for a UUID column type that does not exist. This differs from PostgreSQL, which does have a real native uuid type.

How do I generate a UUID in mongosh?

Call UUID() with no argument. It returns a fresh random RFC 4122 version 4 UUID as a Binary of subtype 4, ready to insert. To insert a generated UUID as the primary key, assign it first so you can reuse the value: const id = UUID(), then db.coll.insertOne({ _id: id }). To turn an existing 36-character string into the same subtype-4 Binary, pass it: UUID("3b241101-e2bb-4255-8caf-4136c566a962"). In application code, generate the UUID with the language's own UUID library and hand the driver's UUID type to MongoDB; the bytes on disk are identical either way.

What is BSON binary subtype 4 versus subtype 3?

Subtype 4 is the current, standardized UUID binary subtype: every driver using the standard representation encodes and decodes the same 16 bytes with no reordering. Subtype 3 is the legacy "UUID (old)" subtype, which never specified a byte order, so different drivers (C# legacy, Java legacy, Python legacy) stored the same UUID in different byte layouts and could not read each other's values. Subtype 4 exists to fix exactly that. Store new UUIDs as subtype 4.

Should I use ObjectId or a UUID for the _id field?

Default to ObjectId. It is 12 bytes, MongoDB generates it for you, and its leading 4-byte timestamp makes it roughly time-ordered, so inserts append to the index instead of scattering. Reach for a UUID _id only when you need a client-generated id before the insert, a globally unique id that other systems already speak, or to avoid leaking ObjectId's embedded creation time. If you do use a UUID for _id, store it as Binary subtype 4.

Why won't my string match a stored UUID in a query?

Because BSON equality is type-aware: a String never equals a Binary. If the field holds a subtype-4 Binary, a query like { token: "3b241101-..." } with a plain string matches nothing. Wrap the value in UUID("...") in mongosh, or the driver's UUID type in code, so you compare a binary to a binary.

Does a random UUID _id hurt MongoDB insert performance?

A version 4 (random) UUID as _id scatters inserts across the unique _id index, causing more page churn and a colder cache than ObjectId's time-ordered values. The fix is a version 7 UUID: its leading bits are a millisecond timestamp, so stored as Binary subtype 4 it sorts chronologically and inserts append to the index. If you do not need a client-generated id at all, ObjectId is already time-ordered and is the simpler default.

How do I set the UUID representation in a driver?

Set it explicitly to the standard representation rather than relying on a default that may still be a legacy mode for backward compatibility. In PyMongo, the STANDARD uuid representation encodes native uuid.UUID objects to subtype 4 and reads other standard-representation drivers' values correctly; the PYTHON_LEGACY, CSHARP_LEGACY, and JAVA_LEGACY modes exist only to read old subtype-3 data those drivers wrote. The Node and Java drivers expose the same UUID type and standard encoding. When migrating old data, read it under the legacy representation that wrote it, then re-insert as standard subtype 4.

Ishan Karunaratne