Three reliable ways to find rows in one MySQL table that do not exist in another (the anti-join pattern): LEFT JOIN ... IS NULL (the classic, works on every version), NOT EXISTS with a correlated subquery (often the fastest on modern MySQL), and NOT IN (the shortest to write, but has a famous NULL gotcha). I'll walk all three on a pending_emails vs sent_emails pair, the kind of mail-queue dedup that comes up constantly in real systems.
CREATE TABLE pending_emails (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
recipient VARCHAR(255) NOT NULL,
template VARCHAR(60) NOT NULL,
queued_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE sent_emails (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
recipient VARCHAR(255) NOT NULL,
template VARCHAR(60) NOT NULL,
sent_at DATETIME NOT NULL,
INDEX idx_recipient_template (recipient, template)
);The job we want to do: find pending emails that have not yet been sent. That is "rows in pending_emails whose (recipient, template) pair does not appear in sent_emails".
Jump to:
- How do I find rows in one MySQL table not in another?
- Method 1: LEFT JOIN IS NULL (works on every version)
- Method 2: NOT EXISTS (often the fastest)
- Method 3: NOT IN (the NULL gotcha)
- Comparison: which anti-join to pick
- Indexing the reference table
- Anti-joins on multiple columns
- FAQ
How do I find rows in one MySQL table not in another?
The three idioms that work in MySQL 8.0 and 8.4: LEFT JOIN ... WHERE right.id IS NULL (the classic anti-join, works on every version), NOT EXISTS with a correlated subquery (often the fastest on MySQL 8.0+ where the optimizer rewrites it into an anti-join plan), and NOT IN (the shortest, but returns zero rows if the subquery has any NULL values). For new code, use NOT EXISTS: SELECT p.id FROM pending_emails p WHERE NOT EXISTS (SELECT 1 FROM sent_emails s WHERE s.recipient = p.recipient AND s.template = p.template);. It reads naturally as English, handles NULLs correctly, supports multi-column conditions, and on MySQL 8.0.17 and later it usually produces the same execution plan as the LEFT JOIN form. Index the reference table on the join columns; without that index, every anti-join shape collapses to a quadratic scan.
Method 1: LEFT JOIN IS NULL (works on every version)
The classic anti-join. Pair every row on the left with a matching row on the right; keep only the pairings where the right side came back empty.
SELECT p.id, p.recipient, p.template, p.queued_at
FROM pending_emails p
LEFT JOIN sent_emails s
ON s.recipient = p.recipient
AND s.template = p.template
WHERE s.id IS NULL;Read it as two steps. The LEFT JOIN returns every row from pending_emails, attached to the matching sent_emails row if there is one, or with NULL for the sent_emails columns if not. The WHERE s.id IS NULL keeps only the rows where there was no match — exactly the anti-join.
The condition has to be on a NOT NULL column from the right table. s.id is the primary key, so it can never be NULL when a match exists; if it is NULL, the match did not exist. Picking a nullable column for the IS NULL check breaks the pattern — you would also get rows where the match existed but that specific column happened to be NULL.
Works on MySQL 5.5, 5.6, 5.7, 8.0, and 8.4.
Method 2: NOT EXISTS (often the fastest)
NOT EXISTS reads more naturally as English ("keep pending rows for which no sent row exists with the same recipient and template") and is often the fastest on modern MySQL because the optimizer can short-circuit as soon as the first match is found.
SELECT p.id, p.recipient, p.template, p.queued_at
FROM pending_emails p
WHERE NOT EXISTS (
SELECT 1
FROM sent_emails s
WHERE s.recipient = p.recipient
AND s.template = p.template
);The subquery's SELECT 1 is convention — EXISTS only checks whether any row comes back, not what it contains, so picking the cheapest possible projection (1) is the right thing to do. Some style guides write SELECT * here, which works identically; the explicit 1 makes the "I only care that this exists" intent obvious.
On MySQL 8.0.17 and newer (including 8.4), the optimizer applies an explicit anti-join transformation that rewrites LEFT JOIN ... IS NULL and NOT EXISTS into the same execution plan in most cases, so performance is identical. Confirm with EXPLAIN FORMAT=TREE — both should show an Anti-Join operator on top of a hash join or nested loop. On 5.7 and earlier the optimizer is less aggressive, and NOT EXISTS is usually the winner there too.
Works on every MySQL version.
Method 3: NOT IN (the NULL gotcha)
The shortest to write, and the one that catches people who don't know about the NULL trap.
SELECT p.id, p.recipient, p.template, p.queued_at
FROM pending_emails p
WHERE p.recipient NOT IN (
SELECT s.recipient
FROM sent_emails s
);This works correctly only if sent_emails.recipient is NOT NULL. If even one row in sent_emails has a NULL recipient, the entire NOT IN returns no rows at all — not what you want.
The reason is SQL's three-valued logic. NOT IN is internally evaluated as a chain of <> comparisons: p.recipient <> 'a@example.com' AND p.recipient <> 'b@example.com' AND p.recipient <> NULL AND .... The p.recipient <> NULL comparison is itself NULL (neither true nor false), and AND NULL is NULL, so the whole row gets filtered out as not-true.
There are two ways around this:
- Guarantee the subquery has no NULLs by adding
WHERE s.recipient IS NOT NULL:
SELECT p.id, p.recipient
FROM pending_emails p
WHERE p.recipient NOT IN (
SELECT s.recipient FROM sent_emails s
WHERE s.recipient IS NOT NULL
);- Just use Method 1 or Method 2.
LEFT JOIN ... IS NULLandNOT EXISTSboth handle NULLs in the reference table correctly without needing a special clause. For new code, prefer them overNOT IN.
The third reason to avoid NOT IN: it does not support multi-column comparisons cleanly. For two-column anti-joins ("missing on (recipient, template) together"), you have to use Method 1 or Method 2.
Comparison: which anti-join to pick
| Method | Works on | NULL safe | Multi-column | Typical speed |
|---|---|---|---|---|
| LEFT JOIN IS NULL | All versions | Yes | Yes | Good |
| NOT EXISTS | All versions | Yes | Yes | Often best on 8.0+ |
| NOT IN | All versions | No (caveat) | Single column only | Slowest, brittle |
For new code, use NOT EXISTS unless an EXPLAIN shows LEFT JOIN ... IS NULL running faster on your specific data. Avoid NOT IN for anti-joins; reserve it for explicit lists like WHERE status NOT IN ('cancelled', 'refunded').
Indexing the reference table
The performance of every anti-join depends on whether the reference table has an index on the columns the join condition uses. For the example above, sent_emails needs an index on (recipient, template):
CREATE INDEX idx_:right_recipient_template
ON sent_emails (recipient, template);Without that index, both LEFT JOIN ... IS NULL and NOT EXISTS fall back to scanning every row of sent_emails for each row in pending_emails — O(n × m). On a sent_emails table with millions of rows and a pending_emails queue of thousands, that's billions of comparisons.
With the index, each lookup is O(log m), and the total cost drops to O(n × log m) which is fast enough for almost any realistic queue size.
Confirm the index is being used with EXPLAIN. The type column for sent_emails should be ref (good) rather than ALL (bad).
Anti-joins on multiple columns
For the email queue example, we want pending rows where (recipient, template) together does not match any sent row. Both LEFT JOIN ... IS NULL and NOT EXISTS handle this naturally — the join condition is just an AND of the column comparisons:
-- LEFT JOIN with two columns
SELECT p.*
FROM pending_emails p
LEFT JOIN sent_emails s
ON s.recipient = p.recipient
AND s.template = p.template
WHERE s.id IS NULL;
-- NOT EXISTS with two columns
SELECT p.*
FROM pending_emails p
WHERE NOT EXISTS (
SELECT 1
FROM sent_emails s
WHERE s.recipient = p.recipient
AND s.template = p.template
);NOT IN cannot do this without ugly tuple-comparison syntax that is not supported on all MySQL versions. This is another reason to prefer the other two methods.
What to do next
If you need a refresher on the JOIN mechanics this article builds on — when INNER JOIN vs LEFT JOIN returns which rows, and the row-multiplication gotcha — the foundation walkthrough is How to JOIN Two MySQL Tables (INNER, LEFT, RIGHT).
If the rows missing from the right table are actually duplicates that should have collapsed (a common cause of "ghost" anti-join results), How to Find Duplicate Rows in MySQL and How to Delete Duplicate Rows in MySQL are the diagnostic and fix.
For data flowing into the comparison via CSV (the "what is in the new file that is not in the database yet" question), How to Import a CSV File Into MySQL loads the staging table, and How to Export a MySQL Table to CSV writes the delta back out. For full backups around the migration, How to Export or Back Up All MySQL Databases is the matching mysqldump walkthrough.
When an anti-join is slow because the join column types do not match (an INT joined to a BIGINT UNSIGNED, or a VARCHAR(255) joined to a CHAR(36)), the fix is in How to Change a MySQL Column Type; picking the right type up front avoids that scenario, see MySQL Field Types and Sizes. For adding the index column itself, How to Add a Column to a MySQL Table.
For scheduled anti-join audits driven from a shell script, Bash for loop and Bash while loop are the wrapper patterns.
For the wider reference covering all the JOIN, subquery, and aggregate syntax in one place, the MySQL Cheat Sheet is the single-page lookup.
External reference: official documentation on subqueries with NOT EXISTS and NOT IN behavior with NULL.





