TechEarl

How to Store a UUID in MongoDB (BSON Binary Subtype 4)

How to store a UUID in MongoDB the right way: as a 16-byte BSON Binary of subtype 4, not a 36-character string. Covers mongosh UUID(), driver representations, the legacy subtype 3 byte-order mess, and ObjectId vs UUID for _id.

By Ishan Karunaratne · 14 min read
A UUID is a 128-bit value, which is exactly 16 bytes raw or 36 characters as the canonical hyphenated string. MongoDB has no dedicated UUID field type, so the question is how to represent those 16 bytes inside a BSON document: as a 36-character string, or as a BSON Binary value of subtype 4 (the standard UUID subtype). The binary form is the one to reach for. It stores in 16 bytes instead of 36+, indexes and sorts on the raw bytes, and compares correctly across every driver. Below is the comparison table, working mongosh and driver examples, the historical subtype 3 mess that still bites cross-language systems, and when to prefer MongoDB's native ObjectId for _id instead.

Short answer: store a UUID as a BSON Binary of subtype 4, never as a 36-character string. In mongosh use UUID("..."), which produces exactly that. In a driver, use the language's UUID type bound to the standard BSON UUID representation. Keep MongoDB's default ObjectId for _id unless you specifically need client-generated, externally-meaningful identifiers, in which case store those UUIDs as subtype 4 too. This is the document-database version of the same trade-off as storing a UUID in MySQL (BINARY(16) versus CHAR(36)) and the same call in PostgreSQL (the native uuid type versus text).

What is a UUID and how is it stored in MongoDB?

A UUID (Universally Unique Identifier) is a fixed-size 128-bit value designed to be unique without a central coordinator. Its canonical text form is 36 characters: 32 hexadecimal digits in five hyphen-separated groups, like 3b241101-e2bb-4255-8caf-4136c566a962. Strip the hyphens and you have 16 raw bytes.
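
To make the 36-character versus 16-byte relationship concrete, here is a minimal Node.js sketch with no dependencies. The helper names `uuidToBytes` and `bytesToUuid` are hypothetical, chosen for illustration; in real code a driver's UUID type does this conversion for you.

```javascript
// Hypothetical helpers for illustration; a driver's UUID type normally does this.

// Canonical form: 8-4-4-4-12 hex digits separated by hyphens.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Parse the 36-character text form into its 16 raw bytes.
function uuidToBytes(text) {
  if (!UUID_PATTERN.test(text)) {
    throw new TypeError(`not a canonical UUID string: ${text}`);
  }
  return Buffer.from(text.replace(/-/g, ""), "hex");
}

// Format 16 raw bytes back into the canonical hyphenated string.
function bytesToUuid(bytes) {
  if (bytes.length !== 16) {
    throw new RangeError("a UUID is exactly 16 bytes");
  }
  const hex = Buffer.from(bytes).toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

const text = "3b241101-e2bb-4255-8caf-4136c566a962";
const raw = uuidToBytes(text);
console.log(text.length, raw.length); // 36 16
console.log(bytesToUuid(raw) === text); // true
```

The round trip is lossless, which is why nothing is given up by keeping only the 16 bytes in the database and formatting the string at the edges.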

BSON, the binary format MongoDB stores documents in, has a Binary type with a one-byte subtype tag that says what the bytes mean. The subtype that means "this is a UUID" is subtype 4. So the correct way to store a UUID is a Binary value carrying the 16 raw bytes tagged subtype 4, not the 36-character string in a String field. The string costs about twice the space: as a BSON value the 36-character string serializes to 41 bytes (a 4-byte length prefix, the 36 bytes, and a null terminator), against 21 bytes for the binary (a 4-byte length prefix, a 1-byte subtype tag, and the 16 raw bytes). String comparison also runs byte-for-byte over the hex text rather than over the value itself.
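
The 41-versus-21 arithmetic can be checked by building the two value payloads by hand. This is a sketch under stated assumptions: the encoder names are hypothetical, and they cover only the value bytes described above, not the element's type byte or field name, which cost the same in both cases.

```javascript
// Hand-rolled encoders for two BSON value payloads (hypothetical helper names).
// They cover only the value bytes, not the element's type byte or field name.

// BSON string value: int32 length (bytes + terminator), UTF-8 bytes, 0x00.
function encodeBsonString(text) {
  const body = Buffer.from(text, "utf8");
  const out = Buffer.alloc(4 + body.length + 1); // zero-filled, so the last byte is the terminator
  out.writeInt32LE(body.length + 1, 0);
  body.copy(out, 4);
  return out;
}

// BSON binary value: int32 payload length, 1-byte subtype, payload bytes.
function encodeBsonBinary(payload, subtype) {
  const out = Buffer.alloc(4 + 1 + payload.length);
  out.writeInt32LE(payload.length, 0);
  out.writeUInt8(subtype, 4);
  Buffer.from(payload).copy(out, 5);
  return out;
}

const text = "3b241101-e2bb-4255-8caf-4136c566a962";
const raw = Buffer.from(text.replace(/-/g, ""), "hex");

const asString = encodeBsonString(text);
const asBinary = encodeBsonBinary(raw, 0x04); // subtype 4 = UUID

console.log(asString.length); // 41
console.log(asBinary.length); // 21
```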

In mongosh, the helper is the UUID() function. Called with no argument it generates a random RFC 4122 version 4 UUID; called with a string it parses that string. Either way the result is a BSON Binary of subtype 4:

javascript
// generate a fresh v4 UUID (mongosh)
UUID()
// UUID("dee11d4e-63c6-4d90-983c-5c9f1e79e96c")

// parse an existing string into the same subtype-4 Binary
UUID("3b241101-e2bb-4255-8caf-4136c566a962")

You can equally generate the UUID in your application and hand MongoDB the finished value. In drivers you use the language's native UUID type bound to the BSON UUID representation: the UUID class in the Node.js bson package, uuid.UUID with the standard representation in PyMongo, java.util.UUID in the Java driver. The storage on disk is identical: a 16-byte subtype-4 binary either way.

Storage comparison table

| Representation | Size | Indexes / sorts on | Time-ordered? | When to use |
| --- | --- | --- | --- | --- |
| String UUID (36 chars) | 36 bytes (41 as a BSON value) | the hex text, byte by byte | No | Almost never; only quick ad-hoc data |
| Binary subtype 4 (UUID) | 16 bytes | the raw 16 bytes | Only if you store a v7 | Any UUID you keep; client-generated ids |
| ObjectId | 12 bytes | timestamp-leading bytes | Roughly (4-byte timestamp prefix) | The default _id; server-side ids |

The string form is the outlier: roughly twice the bytes of the binary, and that cost lands in every index on the field plus the working set MongoDB keeps in RAM. A Binary subtype 4 is the compact, portable choice for a UUID you actually generate yourself. ObjectId is smaller still at 12 bytes and carries a built-in timestamp, which is exactly why it is the right default for _id unless you have a reason to override it (covered below).

Insert and query a UUID field

Store the UUID as a field alongside MongoDB's own ObjectId _id. In mongosh:

javascript
// insert a document with a UUID field (Binary subtype 4)
db.sessions.insertOne({
  _id: new ObjectId(),
  token: UUID("3b241101-e2bb-4255-8caf-4136c566a962"),
  userId: 42,
  createdAt: new Date()
})

// query by the UUID: pass a UUID(), not the raw string
db.sessions.findOne({ token: UUID("3b241101-e2bb-4255-8caf-4136c566a962") })

// index it for fast equality lookups
db.sessions.createIndex({ token: 1 })

The detail that trips people up: a findOne({ token: "3b241101-..." }) with a plain string will not match a document whose token is a subtype-4 Binary. BSON equality is type-aware, so a String never equals a Binary. Always wrap the value in UUID(...) (or the driver's UUID type) on the way in and on the way out. Same idea as wrapping a MySQL lookup parameter in UUID_TO_BIN() so you compare bytes to bytes.

In a Node.js driver the shape is the same, using the UUID class so the value serializes to subtype 4:

javascript
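import { MongoClient, UUID } from "mongodb";

// the connection string is a placeholder; point it at your own deployment
const client = new MongoClient("mongodb://localhost:27017");
const sessions = client.db("app").collection("sessions");

// new UUID(string) serializes as Binary subtype 4
await sessions.insertOne({
  token: new UUID("3b241101-e2bb-4255-8caf-4136c566a962"),
  userId: 42
});

// query with the same UUID type, never the raw string
const doc = await sessions.findOne({
  token: new UUID("3b241101-e2bb-4255-8caf-4136c566a962")
});

await client.close();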

The legacy subtype 3 byte-order mess

If you are starting fresh, use subtype 4 and skip this section. If you are touching an older MongoDB system, especially one written against early .NET, Java, or Python drivers, you need to know it exists.

Originally MongoDB represented UUIDs as BSON Binary of subtype 3. The problem: subtype 3 never standardized the byte order of the 16 bytes, so different language drivers serialized the same UUID into different byte layouts. Per the MongoDB driver specification, the C# legacy representation reversed the bytes within each of the first three groups (4, 2, and 2 bytes), the Java legacy representation reversed the bytes within each of the two 8-byte halves, and the Python legacy representation kept the native order. A UUID written by a Java app and read by a C# app came back scrambled. That is the entire reason subtype 4 was introduced: it fixes the byte order so every driver using the standard representation reads and writes the same 16 bytes with no reordering.

The practical guidance:

  • New data: use subtype 4 (standard representation) everywhere. In mongosh that is just UUID(). In drivers, set the UUID representation explicitly rather than relying on a default that may still be a legacy mode for backward compatibility.
  • In PyMongo, set it on the client or codec options: the STANDARD representation encodes native uuid.UUID objects to subtype 4, and all standard-representation drivers agree on those bytes. The PYTHON_LEGACY / CSHARP_LEGACY / JAVA_LEGACY modes exist only for compatibility with old subtype-3 data written by those drivers.
  • Migrating old data: read each value with the legacy representation that wrote it, then re-insert it under the standard representation as subtype 4. Do not guess the byte order; match it to the driver that originally wrote the documents.
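
The legacy layouts are fixed byte reorderings of the standard 16 bytes, which a short Node.js sketch can make visible. The helper names here are hypothetical, and the sketch operates on raw buffers only; for a real migration, prefer your driver's own legacy representation settings over hand-rolled reordering.

```javascript
// Hypothetical helpers sketching the legacy subtype-3 byte reorderings.

// Return a copy of `bytes` with each [start, end) range byte-reversed.
function reverseRanges(bytes, ranges) {
  const out = Buffer.from(bytes); // copy, so the input is never mutated
  for (const [start, end] of ranges) {
    out.subarray(start, end).reverse(); // subarray is a view, so this edits `out`
  }
  return out;
}

const LEGACY_LAYOUTS = {
  // C#: the first three groups (4, 2, and 2 bytes) are each byte-reversed
  csharp: [[0, 4], [4, 6], [6, 8]],
  // Java: bytes 0-7 and bytes 8-15 are each byte-reversed
  java: [[0, 8], [8, 16]],
  // Python: same order as the standard representation
  python: [],
};

// Each reordering is its own inverse, so one function converts both ways.
function convertLegacy(bytes, driver) {
  const ranges = LEGACY_LAYOUTS[driver];
  if (!ranges) throw new Error(`unknown legacy layout: ${driver}`);
  return reverseRanges(bytes, ranges);
}

// Standard (subtype 4) bytes of 00112233-4455-6677-8899-aabbccddeeff
const standard = Buffer.from("00112233445566778899aabbccddeeff", "hex");

console.log(convertLegacy(standard, "csharp").toString("hex"));
// 33221100554477668899aabbccddeeff
console.log(convertLegacy(standard, "java").toString("hex"));
// 7766554433221100ffeeddccbbaa9988
console.log(convertLegacy(standard, "python").toString("hex"));
// 00112233445566778899aabbccddeeff
```

Because every reordering undoes itself, applying `convertLegacy` with the right driver name to old subtype-3 bytes yields the standard bytes. Applying the wrong one silently yields a different, valid-looking UUID, which is why the origin of the data has to be known rather than guessed.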

The one-line version: subtype 3 is a compatibility trap born of an unspecified byte order; subtype 4 is the fixed, portable standard. Store new UUIDs as subtype 4.

ObjectId vs UUID for _id

MongoDB's default _id is an ObjectId: a 12-byte value made of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte per-process random value, and a 3-byte incrementing counter. If you insert a document without an _id, MongoDB generates one for you.
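
The 4 + 5 + 3 byte layout can be read straight out of an ObjectId's hex form. The sketch below uses a hypothetical `parseObjectId` helper and an arbitrary sample value; drivers expose the timestamp through their own ObjectId type, so this is for illustration only.

```javascript
// Hypothetical helper: split a 24-hex-digit ObjectId into its three fields.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new TypeError("an ObjectId is 24 hex digits (12 bytes)");
  }
  const bytes = Buffer.from(hex, "hex");
  const seconds = bytes.readUInt32BE(0); // 4-byte big-endian timestamp
  return {
    timestampSeconds: seconds,
    createdAt: new Date(seconds * 1000),
    random: bytes.subarray(4, 9).toString("hex"), // 5-byte per-process value
    counter: bytes.readUIntBE(9, 3), // 3-byte incrementing counter
  };
}

// Arbitrary sample value, used only to show the field boundaries.
const parts = parseObjectId("507f1f77bcf86cd799439011");
console.log(parts.createdAt.toISOString()); // 2012-10-17T21:13:27.000Z
console.log(parts.random); // bcf86cd799
console.log(parts.counter.toString(16)); // 439011
```

Anyone holding the id can run the same decoding, so the creation time is effectively public wherever the id is.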

That structure has two consequences worth understanding. Because the timestamp leads the 12 bytes (stored big-endian, most significant byte first), ObjectIds are roughly time-ordered: newer ids generally sort after older ones, so inserts tend to append to the right-hand side of the _id index instead of scattering across it. That gives good insert locality and a tight, cache-friendly index. The MongoDB manual is careful to note they are not perfectly monotonic, since they carry only one-second resolution and are generated by clients whose clocks may differ, but for index-build behavior the rough ordering is what matters.

So when do you reach past ObjectId for a UUID _id? When you genuinely need one of these:

  • Client-generated ids. You need the identifier before the insert round-trip (to return it to a caller, to reference it in a related write, to build an idempotency key). A UUID is generated locally; an ObjectId can be too, but a UUID is the cross-language standard for it.
  • Globally unique, externally meaningful ids. The same id has to be unique and valid across other systems (a relational database, an event stream, another service) that already speak UUID. Reusing one identifier everywhere beats translating between an ObjectId and something else at every boundary.
  • Avoiding the embedded timestamp. An ObjectId leaks its creation time and a process fingerprint to anyone who can read it. A v4 UUID exposes neither, which can matter for ids that appear in URLs or are handed to clients.

For everything else, keep ObjectId. It is smaller (12 bytes vs 16), it is time-ordered for free, and it is what MongoDB's tools and documentation assume by default. Do not swap a UUID in for _id reflexively; do it when one of the reasons above actually applies.

A UUID as _id, done right

If you do use a UUID for _id, store it as a Binary subtype 4, exactly like any other UUID field. Generate it client-side and put it in _id explicitly so MongoDB does not create an ObjectId instead:

javascript
// mongosh: UUID primary key, generated client-side
const id = UUID()
db.accounts.insertOne({
  _id: id,
  email: "ada@example.com",
  createdAt: new Date()
})

// look one up by its UUID _id
db.accounts.findOne({ _id: UUID("...") })

javascript
// Node driver: UUID _id
import { UUID } from "mongodb";

const id = new UUID();                       // client-generated, returnable immediately
await db.collection("accounts").insertOne({
  _id: id,
  email: "ada@example.com"
});

The _id index is unique and built automatically, so a UUID _id is indexed the moment you insert. The cost, exactly as in the relational world, is that a random UUID (v4) scatters inserts across the _id index. You lose the neat right-hand append that ObjectId's leading timestamp gives you, which on a high-write collection means more index page churn and a colder cache. Which leads to the fix.

UUIDv7 and insert locality

A version 4 UUID is random apart from six fixed version and variant bits, so it has no natural ordering and, as a primary key, inserts into a random spot in the index every time. A version 7 UUID is built differently: its leading 48 bits are a Unix-millisecond timestamp, so its raw 16-byte order is chronological. Stored as a Binary subtype 4, a UUIDv7 sorts in roughly creation order, which means inserts append to the end of the index instead of scattering. You regain most of the insert locality that ObjectId gives you, while keeping the cross-system portability and client-side generation that made you choose a UUID in the first place.
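
A version 7 UUID is simple enough to assemble by hand, which shows where the ordering comes from. This is a minimal sketch assuming the RFC 9562 layout (48-bit millisecond timestamp, version nibble, variant bits, random remainder); `uuidV7Bytes` is a hypothetical helper, and a maintained UUID library is the better choice in production.

```javascript
const { randomBytes } = require("crypto");

// Hypothetical helper: build the 16 bytes of a version 7 UUID (RFC 9562 layout).
function uuidV7Bytes(unixMs = Date.now()) {
  const bytes = randomBytes(16); // start fully random, then overwrite fields
  bytes.writeUIntBE(unixMs, 0, 6); // bytes 0-5: 48-bit big-endian Unix ms
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // high nibble of byte 6: version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // top two bits of byte 8: variant 10
  return bytes;
}

// Two ids generated one millisecond apart (fixed timestamps for the demo).
const older = uuidV7Bytes(1700000000000);
const newer = uuidV7Bytes(1700000000001);

// Byte-wise comparison follows the timestamp, so later ids sort later.
console.log(Buffer.compare(older, newer)); // -1
console.log(older.readUIntBE(0, 6)); // 1700000000000
```

Two ids created within the same millisecond differ only in their random bits, so their relative order is arbitrary; that is the "roughly" in roughly creation order. The resulting 16 bytes would then be wrapped in the driver's UUID type so they are stored as subtype 4; check how your driver version constructs a UUID from raw bytes.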

This is the same story as the relational side, where a time-ordered UUIDv7 fixes the random-insert problem in MySQL and Postgres that a v4 primary key creates. If you have decided a UUID _id is the right call for a write-heavy MongoDB collection, generate a v7 in your application and store the 16 bytes as subtype 4. If you do not need a client-generated id at all, the honest default is still MongoDB's ObjectId, which has been time-ordered by design since the beginning.
