How to Delete Duplicate Rows in MySQL

Three reliable ways to delete duplicate rows in MySQL while keeping one row per group: a DELETE with self-join (works on every version), a DELETE with a ROW_NUMBER() CTE (MySQL 8.0 and up, the cleanest pattern), and a temp-table swap (the safest option when the table is huge or has foreign keys pointing at it). I'll walk all three with a dry-run step, a transaction wrapper, and the unique constraint to add after the cleanup so the duplicates do not come back.

The first time I had to run a deletion like this on a production table, I did it in a single statement with no transaction and no backup of the affected rows. The query was technically correct, but it removed 40,000 rows in one shot and I had no way to verify which copies it kept until much later. Since then, every dedup runs through the same checklist: identify, count, dry-run, transact, verify, add a constraint. Here are the three deletion patterns I reach for, mapped to that checklist.

The setup: identify before you delete

Before deleting anything, you need the list of duplicate row IDs that should go. How to Find Duplicate Rows in MySQL covers the three identification patterns in detail; the deletion queries below all assume a customers table with duplicate email values, the same example data used in the identification walkthrough.

SQL

SELECT email, COUNT(*) AS dup_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY dup_count DESC;

Run that first. If it returns zero rows, there's nothing to delete and you can skip this article. If it returns rows, write the count down. That number is your sanity check after the delete.

Try it with your own values

Table nameDuplicate-key column

Jump to:

How do I delete duplicate rows in MySQL?
Method 1: DELETE with self-join (all MySQL versions)
Method 2: DELETE with ROW_NUMBER and a CTE (MySQL 8.0+)
Method 3: Temp-table swap (safest for huge tables)
Why DELETE ... WHERE id NOT IN (SELECT MIN(id) ...) fails
Always dry-run first
Wrap the delete in a transaction
Add the UNIQUE constraint afterwards
FAQ

How do I delete duplicate rows in MySQL?

On MySQL 8.0 and 8.4 the cleanest deletion uses a ROW_NUMBER() CTE: WITH ranked AS (SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn FROM customers) DELETE FROM customers WHERE id IN (SELECT id FROM ranked WHERE rn > 1);. Read it as: number each row inside its duplicate group with the lowest id getting 1, then delete every row numbered 2 or higher. Swap ORDER BY id for ORDER BY created_at DESC to keep the newest row instead of the oldest. On MySQL 5.7 and earlier where window functions do not exist, use the self-join form DELETE a FROM customers a JOIN customers b ON a.email = b.email AND a.id > b.id;. Always run the production delete inside a transaction (START TRANSACTION; ...; COMMIT; or ROLLBACK), and finish by adding a UNIQUE constraint so the duplicates cannot grow back.

Method 1: DELETE with self-join (all MySQL versions)

The self-join deletion is the one to use on MySQL 5.5, 5.6, and 5.7 where you do not have window functions. The query keeps the row with the smallest id in each duplicate group and deletes the rest.

SQL

DELETE a
FROM customers a
JOIN customers b
ON a.email = b.email
AND a.id > b.id;

The join condition a.id > b.id is the key. For every pair of rows that share the same :column value, the row with the higher id is the one MySQL deletes. The row with the lowest id is what stays. Swap to a.id < b.id if you want to keep the most recent rows instead.

This pattern works on MySQL 5.0 and every version since. Performance is the catch: on a table with no index on the duplicate-key column, the self-join is O(n²) and gets slow fast. Always have an index on :column before running this; the same index you would add anyway to support the unique constraint at the end.

Method 2: DELETE with ROW_NUMBER and a CTE (MySQL 8.0+)

On MySQL 8.0 and newer, including 8.1, 8.2, 8.3, and 8.4, a ROW_NUMBER() CTE is the readable form. Each row in the table gets a number inside its duplicate group, and the DELETE removes every row with a number above 1.

SQL

WITH ranked AS (
SELECT id,
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM customers
)
DELETE FROM customers
WHERE id IN (SELECT id FROM ranked WHERE rn > 1);

Two things to call out about this pattern:

The ORDER BY inside the window function controls which row you keep. ORDER BY id keeps the lowest id (typically the earliest). ORDER BY id DESC keeps the highest (typically the most recent). ORDER BY created_at lets you pick by a real timestamp column when you have one.
The outer DELETE references the CTE through a subquery. MySQL 8.0 does support DELETE directly against a CTE in some cases, but the WHERE id IN (...) form works on every 8.x version without surprises.

On MySQL 5.7 and earlier this query fails with a syntax error because window functions and the WITH clause did not exist before 8.0. Use Method 1 or Method 3 instead.

Method 3: Temp-table swap (safest for huge tables)

When the table has tens of millions of rows, foreign keys pointing at it, or both, an in-place DELETE is painful. It holds long locks, generates a lot of undo log, and any rollback unwinds row by row. The temp-table swap is the production-safe alternative.

SQL

-- 1. Create a copy of the table structure
CREATE TABLE :table_dedup LIKE customers;

-- 2. Insert one row per duplicate group (the keeper)
INSERT INTO :table_dedup
SELECT * FROM customers
WHERE id IN (
SELECT MIN(id) FROM customers GROUP BY email
);

-- 3. Swap the tables atomically
RENAME TABLE
customers TO :table_with_dupes,
:table_dedup TO customers;

-- 4. Verify the new table, then drop the old one
DROP TABLE :table_with_dupes;

The RENAME TABLE is the safety mechanism. It is atomic at the metadata level, meaning queries in flight either hit the old table or the new one but never an inconsistent state. If something looks wrong after the swap, just rename back: RENAME TABLE :table TO :table_dedup, :table_with_dupes TO :table.

This approach does not work cleanly if you have foreign keys pointing at the table — the FKs still reference the original table name, and they will not survive the rename. For FK-heavy schemas, use Method 1 or Method 2 inside a transaction instead.

Comparison: which deletion method to pick

Method	Works on	Best for	Watch out for
DELETE self-join	All versions	Mid-size tables, no FK issues	O(n²) without an index on the duplicate-key column
DELETE ROW_NUMBER CTE	8.0 and newer	Most modern dedup; readable	None for typical tables
Temp-table swap	All versions	Huge tables, no incoming FKs	Foreign keys pointing at the table

Why DELETE ... WHERE id NOT IN (SELECT MIN(id) ...) fails

The shape most people reach for first looks like this:

SQL

-- DOES NOT WORK: MySQL refuses self-referencing DELETE
DELETE FROM customers
WHERE id NOT IN (
SELECT MIN(id) FROM customers GROUP BY email
);

On MySQL 5.x and 8.x both, this errors out with You can't specify target table ':table' for update in FROM clause. The rule comes from the SQL spec: a DELETE cannot read from the same table inside a subquery in the WHERE clause. The optimizer cannot guarantee the subquery sees a consistent snapshot once the deletion starts mutating the table.

There are three correct fixes:

Wrap the subquery in another subquery so MySQL materialises the intermediate result first: WHERE id NOT IN (SELECT id FROM (SELECT MIN(id) AS id FROM :table GROUP BY :column) AS keep). The extra layer is what gets MySQL to evaluate the inner result before the delete starts. This works on every version.
Use the self-join form (Method 1) which never references the table inside a subquery.
Use the ROW_NUMBER() CTE (Method 2) on MySQL 8.0+, which separates the ranking step from the delete step cleanly.

The multi-table DELETE ... JOIN syntax (Method 1) and the CTE form (Method 2) are the only two that are safe in the same statement. If you find yourself reaching for WHERE id NOT IN (SELECT ... FROM same_table), switch shapes.

Always dry-run first

Before any DELETE, run the same query as a SELECT to see exactly which rows it would touch. For Method 2 that looks like:

SQL

WITH ranked AS (
SELECT id, first_name, last_name, email,
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM customers
)
SELECT * FROM ranked WHERE rn > 1;

Count the rows. Compare against the count from the duplicate-finder query: number of duplicate groups times (group size minus 1) should equal what you see here. If the numbers do not line up, stop and read the data first.

Wrap the delete in a transaction

Always run the production DELETE inside a transaction so a botched query is one ROLLBACK away from undone. The pattern:

SQL

START TRANSACTION;

WITH ranked AS (
SELECT id,
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM customers
)
DELETE FROM customers
WHERE id IN (SELECT id FROM ranked WHERE rn > 1);

-- Inspect: how many rows are left?
SELECT COUNT(*) FROM customers;

-- Compare against expected: original row count minus number of dupes deleted.
-- If correct, COMMIT. If wrong, ROLLBACK.
COMMIT;
-- or:
-- ROLLBACK;

The SELECT COUNT(*) inside the open transaction reads the post-delete state for your session only. Other sessions still see the pre-delete table until you COMMIT. This gives you a quiet inspection window before the change goes live.

Add the UNIQUE constraint afterwards

The whole point of cleaning up duplicates is so they do not happen again. Once the delete is committed, add a UNIQUE constraint on the column so the database itself rejects future duplicates.

SQL

ALTER TABLE customers
ADD CONSTRAINT uniq_:table_email UNIQUE (email);

If this fails with a duplicate-key error, the dedup missed something. Re-run the duplicate-finder, identify what is still doubled up, and clean up the remaining cases (usually whitespace or case-difference duplicates that the byte-for-byte queries did not catch).

What to do next

Once the duplicates are gone and the unique constraint is in place, the natural follow-up is automating the audit so duplicates never grow back undetected. The same identification query from How to Find Duplicate Rows in MySQL makes a fine nightly monitoring check; wrap it in SELECT EXISTS(...) and alert when it returns 1.

If the duplicates came in through a CSV import, How to Import a CSV File Into MySQL covers the staging-table pattern that catches duplicates before they enter the main table; How to Export a MySQL Table to CSV is the matching export. If the cleanup needs a backup first (always a good idea before a DELETE that affects more than a handful of rows), How to Export or Back Up All MySQL Databases is the mysqldump --single-transaction --routines --events walkthrough.

If the dedup involves cross-table comparisons ("delete pending rows that already have a sent twin"), the anti-join patterns in How to Find Rows in One MySQL Table Not in Another translate directly into DELETE shapes. The foundation is How to JOIN Two MySQL Tables.

For schema work tied to the cleanup (adding a unique_key column derived from existing columns, widening a column to fit a larger value space), How to Add a Column to a MySQL Table, How to Change a MySQL Column Type, and the MySQL Field Types and Sizes reference cover those operations.

For scripting nightly dedup audits from the shell, Bash for loop and Bash while loop are the wrapper patterns.

For the full reference on DELETE, INSERT, UPDATE, transactions, and the rest of the operational SQL you reach for during data cleanup, the MySQL Cheat Sheet keeps the common syntax on one page.

External reference: the official MySQL documentation on the DELETE statement and common table expressions (CTEs).

FAQ

Not immediately. For an InnoDB table, the removed rows are marked as free space inside the existing data file, but the operating system still sees the file as the same size.

To reclaim disk space, run OPTIMIZE TABLE customers; after the delete. That rebuilds the table and shrinks the underlying file. On large tables, OPTIMIZE TABLE takes time and locks the table, so plan it during a maintenance window.

No. Window functions and CTEs (WITH clauses) were introduced in MySQL 8.0. On 5.7 and earlier you must use the self-join deletion in Method 1, or the temp-table swap in Method 3.

Two cases. If the FKs are ON DELETE CASCADE, removing a duplicate row also removes the rows that reference it, which is almost never what you want for dedup.

If the FKs are ON DELETE RESTRICT (the default), the DELETE fails for any duplicate row that has children. The right pattern is to first re-point the children at the keeper row, then delete the duplicate. The temp-table swap method does not handle this either; for FK-heavy schemas you need a per-row re-point script.

Change the ORDER BY inside the window function. ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) assigns rn = 1 to the most recent row, so the rest get rn > 1 and are deleted.

For the self-join Method 1, flip the inequality: a.id < b.id deletes the lower id (the older row), keeping the higher id (the newer one). This assumes ids are monotonically increasing, which is the default for AUTO_INCREMENT columns.

On very large tables, yes. A single DELETE of millions of rows holds locks for the duration of the operation and generates large undo logs. Breaking the work into batches of 10,000 to 50,000 rows lets the server commit and breathe between batches.

The pattern is a LIMIT on the DELETE inside a loop. The temp-table swap in Method 3 is the alternative for tables too large to batch-delete in a reasonable window.

MySQL does not let a DELETE read from its own target table inside a WHERE subquery. The error is You can't specify target table 'x' for update in FROM clause. The optimizer cannot guarantee the subquery sees a consistent snapshot once the delete starts mutating rows.

Three fixes. Wrap the subquery in an extra layer (WHERE id NOT IN (SELECT id FROM (SELECT MIN(id) AS id FROM x GROUP BY col) AS keep)) so MySQL materialises it first. Or use the self-join DELETE a FROM x a JOIN x b ON ... AND a.id > b.id shape, which never references the table inside a subquery. Or on MySQL 8.0+, use the ROW_NUMBER() CTE form, which separates the ranking from the delete cleanly.

Rank by a custom expression in the window function. ROW_NUMBER() OVER (PARTITION BY email ORDER BY (CASE WHEN phone IS NULL THEN 1 ELSE 0 END), id) sorts rows with phone filled in (NULL=0) before rows with phone NULL (NULL=1), so the keeper (rn = 1) is the more complete row.

For multiple columns, add more terms to the ORDER BY: ORDER BY (phone IS NULL), (address IS NULL), id. The keeper becomes "row with the fewest NULL columns, breaking ties by lowest id". This is the pattern for merging messy contact-list duplicates without losing data.

How to Delete Duplicate Rows in MySQL

The setup: identify before you delete

How do I delete duplicate rows in MySQL?

Method 1: DELETE with self-join (all MySQL versions)

Method 2: DELETE with ROW_NUMBER and a CTE (MySQL 8.0+)

Method 3: Temp-table swap (safest for huge tables)

Comparison: which deletion method to pick

Why DELETE ... WHERE id NOT IN (SELECT MIN(id) ...) fails

Always dry-run first

Wrap the delete in a transaction

Add the UNIQUE constraint afterwards

What to do next

FAQ

Ishan Karunaratne

Related posts

How to Find Duplicate Rows in MySQL

How to Delete a User on Linux (userdel)

How to Store a Phone Number in MySQL

Will DELETE in MySQL release disk space after removing duplicates?

Can I run DELETE with ROW_NUMBER on MySQL 5.7?

What happens if foreign keys point at the table?

How do I keep the newest row instead of the oldest?

Is it safer to delete in batches?

Why does DELETE ... WHERE id NOT IN (SELECT MIN(id) FROM same_table) fail?

How do I delete duplicates while keeping the row with the most filled-in columns?

Ishan Karunaratne