Three reliable ways to delete duplicate rows in MySQL while keeping one row per group: a DELETE with self-join (works on every version), a DELETE with a ROW_NUMBER() CTE (MySQL 8.0 and up, the cleanest pattern), and a temp-table swap (the safest option when the table is huge or has foreign keys pointing at it). I'll walk all three with a dry-run step, a transaction wrapper, and the unique constraint to add after the cleanup so the duplicates do not come back.
The first time I had to run a deletion like this on a production table, I did it in a single statement with no transaction and no backup of the affected rows. The query was technically correct, but it removed 40,000 rows in one shot and I had no way to verify which copies it kept until much later. Since then, every dedup runs through the same checklist: identify, count, dry-run, transact, verify, add a constraint. Here are the three deletion patterns I reach for, mapped to that checklist.
The setup: identify before you delete
Before deleting anything, you need the list of duplicate row IDs that should go. How to Find Duplicate Rows in MySQL covers the three identification patterns in detail; the deletion queries below all assume a customers table with duplicate email values, the same example data used in the identification walkthrough.
SELECT email, COUNT(*) AS dup_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY dup_count DESC;Run that first. If it returns zero rows, there's nothing to delete and you can skip this article. If it returns rows, write the count down. That number is your sanity check after the delete.
Jump to:
- How do I delete duplicate rows in MySQL?
- Method 1: DELETE with self-join (all MySQL versions)
- Method 2: DELETE with ROW_NUMBER and a CTE (MySQL 8.0+)
- Method 3: Temp-table swap (safest for huge tables)
- Why DELETE ... WHERE id NOT IN (SELECT MIN(id) ...) fails
- Always dry-run first
- Wrap the delete in a transaction
- Add the UNIQUE constraint afterwards
- FAQ
How do I delete duplicate rows in MySQL?
On MySQL 8.0 and 8.4 the cleanest deletion uses a ROW_NUMBER() CTE: WITH ranked AS (SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn FROM customers) DELETE FROM customers WHERE id IN (SELECT id FROM ranked WHERE rn > 1);. Read it as: number each row inside its duplicate group with the lowest id getting 1, then delete every row numbered 2 or higher. Swap ORDER BY id for ORDER BY created_at DESC to keep the newest row instead of the oldest. On MySQL 5.7 and earlier where window functions do not exist, use the self-join form DELETE a FROM customers a JOIN customers b ON a.email = b.email AND a.id > b.id;. Always run the production delete inside a transaction (START TRANSACTION; ...; COMMIT; or ROLLBACK), and finish by adding a UNIQUE constraint so the duplicates cannot grow back.
Method 1: DELETE with self-join (all MySQL versions)
The self-join deletion is the one to use on MySQL 5.5, 5.6, and 5.7 where you do not have window functions. The query keeps the row with the smallest id in each duplicate group and deletes the rest.
DELETE a
FROM customers a
JOIN customers b
ON a.email = b.email
AND a.id > b.id;The join condition a.id > b.id is the key. For every pair of rows that share the same :column value, the row with the higher id is the one MySQL deletes. The row with the lowest id is what stays. Swap to a.id < b.id if you want to keep the most recent rows instead.
This pattern works on MySQL 5.0 and every version since. Performance is the catch: on a table with no index on the duplicate-key column, the self-join is O(n²) and gets slow fast. Always have an index on :column before running this; the same index you would add anyway to support the unique constraint at the end.
Method 2: DELETE with ROW_NUMBER and a CTE (MySQL 8.0+)
On MySQL 8.0 and newer, including 8.1, 8.2, 8.3, and 8.4, a ROW_NUMBER() CTE is the readable form. Each row in the table gets a number inside its duplicate group, and the DELETE removes every row with a number above 1.
WITH ranked AS (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM customers
)
DELETE FROM customers
WHERE id IN (SELECT id FROM ranked WHERE rn > 1);Two things to call out about this pattern:
- The
ORDER BYinside the window function controls which row you keep.ORDER BY idkeeps the lowest id (typically the earliest).ORDER BY id DESCkeeps the highest (typically the most recent).ORDER BY created_atlets you pick by a real timestamp column when you have one. - The outer DELETE references the CTE through a subquery. MySQL 8.0 does support
DELETEdirectly against a CTE in some cases, but theWHERE id IN (...)form works on every 8.x version without surprises.
On MySQL 5.7 and earlier this query fails with a syntax error because window functions and the WITH clause did not exist before 8.0. Use Method 1 or Method 3 instead.
Method 3: Temp-table swap (safest for huge tables)
When the table has tens of millions of rows, foreign keys pointing at it, or both, an in-place DELETE is painful. It holds long locks, generates a lot of undo log, and any rollback unwinds row by row. The temp-table swap is the production-safe alternative.
-- 1. Create a copy of the table structure
CREATE TABLE :table_dedup LIKE customers;
-- 2. Insert one row per duplicate group (the keeper)
INSERT INTO :table_dedup
SELECT * FROM customers
WHERE id IN (
SELECT MIN(id) FROM customers GROUP BY email
);
-- 3. Swap the tables atomically
RENAME TABLE
customers TO :table_with_dupes,
:table_dedup TO customers;
-- 4. Verify the new table, then drop the old one
DROP TABLE :table_with_dupes;The RENAME TABLE is the safety mechanism. It is atomic at the metadata level, meaning queries in flight either hit the old table or the new one but never an inconsistent state. If something looks wrong after the swap, just rename back: RENAME TABLE :table TO :table_dedup, :table_with_dupes TO :table.
This approach does not work cleanly if you have foreign keys pointing at the table — the FKs still reference the original table name, and they will not survive the rename. For FK-heavy schemas, use Method 1 or Method 2 inside a transaction instead.
Comparison: which deletion method to pick
| Method | Works on | Best for | Watch out for |
|---|---|---|---|
| DELETE self-join | All versions | Mid-size tables, no FK issues | O(n²) without an index on the duplicate-key column |
| DELETE ROW_NUMBER CTE | 8.0 and newer | Most modern dedup; readable | None for typical tables |
| Temp-table swap | All versions | Huge tables, no incoming FKs | Foreign keys pointing at the table |
Why DELETE ... WHERE id NOT IN (SELECT MIN(id) ...) fails
The shape most people reach for first looks like this:
-- DOES NOT WORK: MySQL refuses self-referencing DELETE
DELETE FROM customers
WHERE id NOT IN (
SELECT MIN(id) FROM customers GROUP BY email
);On MySQL 5.x and 8.x both, this errors out with You can't specify target table ':table' for update in FROM clause. The rule comes from the SQL spec: a DELETE cannot read from the same table inside a subquery in the WHERE clause. The optimizer cannot guarantee the subquery sees a consistent snapshot once the deletion starts mutating the table.
There are three correct fixes:
- Wrap the subquery in another subquery so MySQL materialises the intermediate result first:
WHERE id NOT IN (SELECT id FROM (SELECT MIN(id) AS id FROM :table GROUP BY :column) AS keep). The extra layer is what gets MySQL to evaluate the inner result before the delete starts. This works on every version. - Use the self-join form (Method 1) which never references the table inside a subquery.
- Use the
ROW_NUMBER()CTE (Method 2) on MySQL 8.0+, which separates the ranking step from the delete step cleanly.
The DELETE ... USING syntax (Method 1) and the CTE form (Method 2) are the only two that are safe in the same statement. If you find yourself reaching for WHERE id NOT IN (SELECT ... FROM same_table), switch shapes.
Always dry-run first
Before any DELETE, run the same query as a SELECT to see exactly which rows it would touch. For Method 2 that looks like:
WITH ranked AS (
SELECT id, first_name, last_name, email,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM customers
)
SELECT * FROM ranked WHERE rn > 1;Count the rows. Compare against the count from the duplicate-finder query: number of duplicate groups times (group size minus 1) should equal what you see here. If the numbers do not line up, stop and read the data first.
Wrap the delete in a transaction
Always run the production DELETE inside a transaction so a botched query is one ROLLBACK away from undone. The pattern:
START TRANSACTION;
WITH ranked AS (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM customers
)
DELETE FROM customers
WHERE id IN (SELECT id FROM ranked WHERE rn > 1);
-- Inspect: how many rows are left?
SELECT COUNT(*) FROM customers;
-- Compare against expected: original row count minus number of dupes deleted.
-- If correct, COMMIT. If wrong, ROLLBACK.
COMMIT;
-- or:
-- ROLLBACK;The SELECT COUNT(*) inside the open transaction reads the post-delete state for your session only. Other sessions still see the pre-delete table until you COMMIT. This gives you a quiet inspection window before the change goes live.
Add the UNIQUE constraint afterwards
The whole point of cleaning up duplicates is so they do not happen again. Once the delete is committed, add a UNIQUE constraint on the column so the database itself rejects future duplicates.
ALTER TABLE customers
ADD CONSTRAINT uniq_:table_email UNIQUE (email);If this fails with a duplicate-key error, the dedup missed something. Re-run the duplicate-finder, identify what is still doubled up, and clean up the remaining cases (usually whitespace or case-difference duplicates that the byte-for-byte queries did not catch).
What to do next
Once the duplicates are gone and the unique constraint is in place, the natural follow-up is automating the audit so duplicates never grow back undetected. The same identification query from How to Find Duplicate Rows in MySQL makes a fine nightly monitoring check; wrap it in SELECT EXISTS(...) and alert when it returns 1.
If the duplicates came in through a CSV import, How to Import a CSV File Into MySQL covers the staging-table pattern that catches duplicates before they enter the main table; How to Export a MySQL Table to CSV is the matching export. If the cleanup needs a backup first (always a good idea before a DELETE that affects more than a handful of rows), How to Export or Back Up All MySQL Databases is the mysqldump --single-transaction --routines --events walkthrough.
If the dedup involves cross-table comparisons ("delete pending rows that already have a sent twin"), the anti-join patterns in How to Find Rows in One MySQL Table Not in Another translate directly into DELETE shapes. The foundation is How to JOIN Two MySQL Tables.
For schema work tied to the cleanup (adding a unique_key column derived from existing columns, widening a column to fit a larger value space), How to Add a Column to a MySQL Table, How to Change a MySQL Column Type, and the MySQL Field Types and Sizes reference cover those operations.
For scripting nightly dedup audits from the shell, Bash for loop and Bash while loop are the wrapper patterns.
For the full reference on DELETE, INSERT, UPDATE, transactions, and the rest of the operational SQL you reach for during data cleanup, the MySQL Cheat Sheet keeps the common syntax on one page.
External reference: the official MySQL documentation on the DELETE statement and common table expressions (CTEs).





