How to Find Duplicate Rows in MySQL

Three reliable ways to find duplicate rows in MySQL: a GROUP BY ... HAVING query (the classic, works on every version), a ROW_NUMBER() window function (MySQL 8.0 and up, cleanest setup if you plan to delete the duplicates next), and a self-join on the duplicate key (useful when you need every individual row, not just one per group). Pick the one that matches your MySQL version and what you plan to do with the result.

One of the first messy tables I ever inherited was a customers table that had been written without a unique constraint on email. After a few years of CSV imports and re-imports from various marketing tools, the same address showed up dozens of times. Before I could add the constraint, I needed to figure out which rows were the duplicates and which copy to keep. This is the SQL I have come back to every couple of years since, in three flavours, with the MySQL version each one needs spelled out so the right query for your server is obvious.

The example table

A small customers table is enough to demonstrate every technique. A few rows are deliberate duplicates on email, plus one with trailing whitespace and one with a different case so we can talk about soft duplicates later.

SQL

CREATE TABLE customers (
  id          INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  first_name  VARCHAR(50)  NOT NULL,
  last_name   VARCHAR(50)  NOT NULL,
  email       VARCHAR(255) NOT NULL,
  phone       VARCHAR(20),
  created_at  DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO customers (first_name, last_name, email, phone) VALUES
('Mei',    'Tanaka',   'mei.tanaka@example.com',     '555-0142'),
('Mei',    'Tanaka',   'mei.tanaka@example.com',     '555-0142'),
('James',  'Walker',   'james.walker@example.com',   NULL),
('Emma',   'Carter',   'emma.carter@example.com',    '555-0118'),
('Emma',   'Carter',   ' emma.carter@example.com ',  '555-0118'),
('Sofia',  'Marek',    'sofia.marek@example.com',    '555-0177'),
('Sofia',  'Marek',    'SOFIA.MAREK@example.com',    '555-0177'),
('Daniel', 'Brooks',   'daniel.brooks@example.com',  NULL),
('Daniel', 'Brooks',   'daniel.brooks@example.com',  NULL),
('Lukas',  'Becker',   'lukas.becker@example.com',   '555-0102');

Try it with your own values

Table nameColumn to check

Jump to:

How do I find duplicate rows in MySQL?
Method 1: GROUP BY with HAVING
Method 2: Window functions (MySQL 8.0+)
Method 3: Self-join
Comparison: which method to pick
How NULL behaves in duplicate checks
Soft duplicates: case and whitespace
Performance and the index that matters
FAQ

How do I find duplicate rows in MySQL?

Three queries, pick by what you need next. For a quick audit ("how many duplicates do I have?"), use GROUP BY ... HAVING COUNT(*) > 1: SELECT email, COUNT(*) AS dup_count FROM customers GROUP BY email HAVING COUNT(*) > 1 ORDER BY dup_count DESC;. Works on every MySQL version, returns one row per duplicate value with its count. To get every individual duplicate row tagged with a position so the next step is a DELETE, use the MySQL 8.0+ window function form: SELECT * FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn FROM customers) ranked WHERE rn > 1;. On MySQL 5.7 and earlier where window functions do not exist, use the self-join: SELECT a.* FROM customers a JOIN customers b ON a.email = b.email AND a.id != b.id. Add an index on the duplicate-key column before running any of these on a large table; without it, Method 1 is full-scan, the window function buffers in memory, and the self-join goes quadratic.

Method 1: GROUP BY with HAVING

The shortest correct query. Works in MySQL 5.5, 5.6, 5.7, 8.0, and 8.4 (and anything older still in production).

SQL

SELECT email, COUNT(*) AS dup_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY dup_count DESC;

Two things to read out of the result:

Each row is a duplicate value, with a count of how many rows in the table carry it.
The line that does the filtering is HAVING COUNT(*) > 1. A common first guess is to write WHERE COUNT(*) > 1 instead, but that fails with an error because WHERE runs before GROUP BY, so the aggregate COUNT(*) does not exist yet at that point. HAVING is the filter that runs after aggregation, which is when aggregates are available.

To find duplicates across more than one column, list every column you care about in the GROUP BY:

Replace in this example:Columns to group on

SQL

SELECT first_name, last_name, email, COUNT(*) AS dup_count
FROM customers
GROUP BY first_name, last_name, email
HAVING COUNT(*) > 1;

Why are the same columns repeated in SELECT and GROUP BY? MySQL requires every non-aggregate column in SELECT to also appear in GROUP BY. This is the default since MySQL 5.7.5 (via ONLY_FULL_GROUP_BY in sql_mode). The reason: GROUP BY collapses many rows into one group, so MySQL needs to know which value to use for each column. The columns you grouped on are guaranteed identical within a group; columns you didn't group on are ambiguous, so the query errors out rather than picking arbitrarily.

This is the one to start with for an audit. It tells you how many duplicate groups exist and how bad the worst offenders are. What it does not give you is the row id of each individual duplicate; for that, see Method 2 or 3.

Method 2: Window functions (MySQL 8.0+)

If you are on MySQL 8.0 (released April 2018) or newer, including 8.1, 8.2, 8.3, and 8.4, the cleanest query uses ROW_NUMBER():

SQL

SELECT id, first_name, last_name, email, rn
FROM (
SELECT id, first_name, last_name, email,
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM customers
) AS ranked
WHERE rn > 1;

Line by line: the inner query gives every row a number starting at 1 inside its email group, with the smallest id getting 1. The outer query keeps only the rows numbered 2 or higher, which is exactly "every duplicate, but not the one you would keep". This is the form you want when the next step is to remove the duplicates: How to Delete Duplicate Rows in MySQL shows how to wrap this pattern in a DELETE with safety checks, dry-run verification, and rollback.

Swap ORDER BY id for ORDER BY created_at if you want the oldest row in each group to be the keeper; swap to ORDER BY id DESC to keep the newest. The PARTITION BY clause is what makes the numbering restart for each group.

This query will fail on MySQL 5.5, 5.6, and 5.7 with a syntax error. Window functions were the headline feature of 8.0; before that, you need Method 1 or Method 3.

Method 3: Self-join

When you need every duplicate row with its full set of columns, and you are on MySQL 5.5, 5.6, 5.7, or any version that does not yet have window functions, join the table to itself on the duplicate key. The same query also works on 8.0 and up.

SQL

SELECT a.id, a.first_name, a.last_name, a.email
FROM customers a
JOIN customers b
ON a.email = b.email
AND a.id != b.id
ORDER BY a.email, a.id;

Every row in the result is one where another row exists with the same :column value but a different id. Conceptually similar to Method 2, but works on every MySQL version that supports JOIN (so, all of them).

The catch is performance: this is an O(n²) operation if MySQL has to scan the full table on each side of the join. On a 10,000 row table with no index on the join column you will wait. With a proper index it stays fast. More on that below.

Comparison: which method to pick

Method	Works on	Returns	Use when
GROUP BY + HAVING	All versions	One row per duplicate value, with count	Quick audit. "How bad is it."
ROW_NUMBER	8.0 and newer	Every duplicate row tagged with a position	You are about to DELETE or UPDATE the dupes
Self-join	All versions	Every duplicate row with full columns	You are on 5.7, or you want both rows of every pair

The other useful trick is EXISTS when all you need is a boolean ("does this table have any duplicates?"):

SQL

SELECT EXISTS(
SELECT 1
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
) AS has_duplicates;

Returns 1 if any duplicates exist, 0 if not. Useful in monitoring queries and pre-migration sanity checks.

How NULL behaves in duplicate checks

The edge case that bites every junior dev once. GROUP BY puts all NULL values into one bucket and counts them as duplicates of each other. So if your column is nullable and a few rows have NULL, the query above reports them as one duplicate group.

A UNIQUE constraint in MySQL does the opposite: it allows multiple NULL rows in a unique column, because two NULLs are not considered equal under the constraint. So you can have a table where "find duplicates" reports a NULL group, but ALTER TABLE ... ADD UNIQUE still succeeds. Both behaviours are correct, and both have caught me at least once.

If you want the duplicate-finding query to skip NULLs entirely, add a WHERE clause:

SQL

SELECT email, COUNT(*) AS dup_count
FROM customers
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1;

If you want NULLs treated as their own value but distinct from each other (so they never collide), there is no clean SQL for it. You normally handle that at the application layer, or by normalising NULLs to a sentinel value before the audit.

Soft duplicates: case and whitespace

Look at the sample data again. ' emma.carter@example.com ' has a leading and trailing space; 'SOFIA.MAREK@example.com' is the same address as Sofia's other row in different case. Method 1 will not flag either as duplicates because byte-for-byte they are different strings.

To catch case differences, normalise both sides:

SQL

SELECT LOWER(email) AS norm, COUNT(*) AS dup_count
FROM customers
GROUP BY LOWER(email)
HAVING dup_count > 1;

To catch whitespace:

SQL

SELECT TRIM(email) AS norm, COUNT(*) AS dup_count
FROM customers
GROUP BY TRIM(email)
HAVING dup_count > 1;

Combine them when you suspect both:

SQL

SELECT LOWER(TRIM(email)) AS norm, COUNT(*) AS dup_count
FROM customers
GROUP BY norm
HAVING dup_count > 1;

There is a real cost here: wrapping the column in LOWER() or TRIM() makes MySQL unable to use an index on email, because the indexed values are not the function outputs. On large tables this turns a millisecond query into a multi-second one. Two ways out:

Fix the data once, lock it down. Run a single migration that lowercases and trims every existing row, then apply normalisation in the application layer going forward.
Use a case-insensitive collation on the column (for example utf8mb4_0900_ai_ci, the default in 8.0+). The collation handles case automatically and the index is still used.

I default to option 1 unless the column already has a _ci collation. For more complex matching where LOWER() and TRIM() aren't enough — pattern-style fuzzy detection like "addresses that differ only in subdomain" or "values that contain a specific substring" — MySQL's REGEXP operator lets you run regular expressions against column values; the Regex Cheat Sheet covers the pattern syntax that works inside SQL.

Performance and the index that matters

The single most important optimisation for any of these queries is an index on the column you are grouping or partitioning by.

SQL

CREATE INDEX idx_:table_email ON customers (email);

EXPLAIN before:

code

+----+-------------+-----------+------+---------------+-------+
| id | select_type | table     | type | possible_keys | rows  |
+----+-------------+-----------+------+---------------+-------+
|  1 | SIMPLE      | customers | ALL  | NULL          | 80213 |
+----+-------------+-----------+------+---------------+-------+

EXPLAIN after:

code

+----+-------------+-----------+-------+---------------------+-------+
| id | select_type | table     | type  | possible_keys       | rows  |
+----+-------------+-----------+-------+---------------------+-------+
|  1 | SIMPLE      | customers | index | idx_customers_email | 80213 |
+----+-------------+-----------+-------+---------------------+-------+

type: ALL is a full table scan. type: index is a fast index walk that hands the grouped values to MySQL pre-sorted. On the 80k-row customers table I started with, adding the index dropped the GROUP BY HAVING query from about 380 ms to under 20 ms on cold caches.

For the window-function version, a covering index that includes both the partition column and the order column lets MySQL avoid sorting in memory:

SQL

CREATE INDEX idx_:table_:column_id ON customers (email, id);

For the self-join version, the index on the join column is mandatory in practice. Without it MySQL does a nested-loop join that scales quadratically with table size, and the query goes from fast to "I'll come back tomorrow".

What to do next

Now that you have the list of duplicate rows, the natural next step is removing the extras while keeping one row per group. The same ROW_NUMBER pattern from Method 2 wraps cleanly into a DELETE, including the safest way to do it on a table with foreign keys: the full deletion walkthrough with dry-run, transaction wrapping, and adding the unique constraint afterwards.

If the duplicates are spread across two tables (rows in table A that have no equivalent in table B, or vice versa), How to Find Rows in One MySQL Table Not in Another covers the anti-join patterns. The foundation for understanding why a JOIN can synthesise duplicate rows in the first place is How to JOIN Two MySQL Tables.

If the duplicates arrived through a CSV import, How to Import a CSV File Into MySQL covers the staging-table pattern that prevents them from entering the main table in the first place; How to Export a MySQL Table to CSV is the matching export walkthrough. For a full backup before running any cleanup, How to Export or Back Up All MySQL Databases is the mysqldump --single-transaction --routines --events pattern.

For schema changes around the dedup (adding the column the unique key will live on, widening a key column to fit a deduplicated string), How to Add a Column to a MySQL Table, How to Change a MySQL Column Type, and the MySQL Field Types and Sizes reference cover those operations. If the duplicates involve DATETIME rounding, How to Format a MySQL DATETIME in a SELECT explains the precision differences.

For nightly duplicate audits driven from a cron job, Bash for loop and Bash while loop are the wrapper patterns.

For the wider MySQL command reference covering logins, dumps, joins, indexes, and string functions in one place, see the MySQL Cheat Sheet.

External references: official MySQL docs on GROUP BY and HAVING and window function syntax.

FAQ

GROUP BY groups every NULL into one bucket and counts them as duplicates of each other. So a column with three NULL rows shows up as a duplicate group of size 3.

A UNIQUE constraint behaves differently: it allows multiple NULL rows because two NULLs are not considered equal under the constraint. To make the duplicate query skip NULLs entirely, add WHERE col IS NOT NULL.

DISTINCT returns the unique values in a column. It cannot filter to only the values that appear more than once.

GROUP BY ... HAVING COUNT(*) > 1 returns only the duplicated values, with a count of how many times each one appears. That is what you usually want when auditing for duplicates.

No. Window functions including ROW_NUMBER(), RANK(), and DENSE_RANK() were introduced in MySQL 8.0 (released April 2018). On MySQL 5.7 and earlier they cause a syntax error.

Use GROUP BY ... HAVING for a value-level summary or a self-join when you need the full row of every duplicate.

Wrap the column in LOWER() in both the SELECT and GROUP BY: GROUP BY LOWER(email). This catches alice@example.com and ALICE@example.com as the same value.

The cost is that MySQL cannot use an index on the column when it is wrapped in a function. On large tables, use a case-insensitive collation (such as utf8mb4_0900_ai_ci, the default in 8.0+) so case is handled at the collation level and the index is still used.

Almost always it is leading or trailing whitespace, a tab character, or a case difference. The strings 'abc' and 'abc ' are different values to MySQL.

Run SELECT email, LENGTH(email), CHAR_LENGTH(email) FROM customers WHERE email LIKE '%abc%' to spot the offenders. If LENGTH exceeds the visible character count, you have invisible bytes. Use TRIM() in your GROUP BY, or do a one-time UPDATE to clean the column.

ROW_NUMBER() gives every row a unique sequential number inside its window, even for ties: rows numbered 1, 2, 3, 4, ... RANK() gives tied rows the same number but skips: 1, 1, 3, 4, ... DENSE_RANK() gives tied rows the same number without skipping: 1, 1, 2, 3, ...

For duplicate detection where the partition is the duplicate-key column, all three would assign 1 to the first row and a higher number to the rest, because the ordering column (typically id or created_at) is unique within the partition. ROW_NUMBER() is the conventional choice; it has the cleanest semantics and is the one MySQL documentation uses in its dedup examples.

If you mean "rows in table A that also exist in table B", that is a semi-join: SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.key = a.key). If you mean the opposite ("rows in A that do not exist in B"), the anti-join walkthrough is at How to Find Rows in One MySQL Table Not in Another.

For a single duplicate-detection query that spans two tables, UNION the rows first, then run GROUP BY ... HAVING COUNT(*) > 1 on the union: SELECT email, COUNT(*) FROM (SELECT email FROM customers UNION ALL SELECT email FROM leads) AS combined GROUP BY email HAVING COUNT(*) > 1. Use UNION ALL (not UNION) so within-table duplicates are also counted.

How to Find Duplicate Rows in MySQL

The example table

How do I find duplicate rows in MySQL?

Method 1: GROUP BY with HAVING

Method 2: Window functions (MySQL 8.0+)

Method 3: Self-join

Comparison: which method to pick

How NULL behaves in duplicate checks

Soft duplicates: case and whitespace

Performance and the index that matters

What to do next

FAQ

Ishan Karunaratne

Related posts

How to Delete Duplicate Rows in MySQL

How to Import a CSV File Into MySQL

How to Store a Phone Number in MySQL

Does GROUP BY treat NULL as a duplicate value?

What is the difference between DISTINCT and GROUP BY for duplicates?

Does ROW_NUMBER() work in MySQL 5.7?

How do I find duplicates ignoring case?

Why does my query miss duplicates that look identical?

What is the difference between ROW_NUMBER, RANK, and DENSE_RANK for finding duplicates?

How do I find duplicate rows across multiple tables?

Ishan Karunaratne