sqlcmd.net validated sql reference
advanced deduplication MySQL MariaDB SQL Server PostgreSQL SQLite

Delete Duplicate Rows While Keeping One

Use `ROW_NUMBER()` to identify duplicate rows and delete every copy except the one you want to keep.

Docker-validated Not currently validation-green

Delete duplicate email rows and keep the earliest id

Each email group is partitioned together and ordered by id, so the earliest row gets rn = 1. Rows with rn > 1 are treated as duplicates and deleted. That keeps Ada One, Bob One, and Cara One while removing the later duplicates.

MySQL MariaDB SQLite
Engine-specific syntax
Setup
CREATE TABLE contacts (id INT, email VARCHAR(100), name VARCHAR(50));

INSERT INTO
  contacts (id, email, name)
VALUES
  (1, '[email protected]', 'Ada One'),
  (2, '[email protected]', 'Bob One'),
  (3, '[email protected]', 'Ada Two'),
  (4, '[email protected]', 'Cara One'),
  (5, '[email protected]', 'Bob Two');
SQL
DELETE FROM contacts
WHERE
  id IN (
    SELECT
      id
    FROM
      (
        SELECT
          id,
          ROW_NUMBER() OVER (
            PARTITION BY
              email
            ORDER BY
              id
          ) AS rn
        FROM
          contacts
      ) ranked
    WHERE
      rn > 1
  );

SELECT
  id,
  email,
  name
FROM
  contacts
ORDER BY
  id;
idemailname
1[email protected]Ada One
2[email protected]Bob One
4[email protected]Cara One
SQL Server
Engine-specific syntax
Setup
CREATE TABLE contacts (id INT, email VARCHAR(100), name VARCHAR(50));

INSERT INTO
  contacts (id, email, name)
VALUES
  (1, '[email protected]', 'Ada One'),
  (2, '[email protected]', 'Bob One'),
  (3, '[email protected]', 'Ada Two'),
  (4, '[email protected]', 'Cara One'),
  (5, '[email protected]', 'Bob Two');
SQL
WITH
  ranked AS (
    SELECT
      id,
      ROW_NUMBER() OVER (
        PARTITION BY
          email
        ORDER BY
          id
      ) AS rn
    FROM
      contacts
  )
DELETE FROM ranked
WHERE
  rn > 1;

SELECT
  id,
  email,
  name
FROM
  contacts
ORDER BY
  id;
idemailname
1[email protected]Ada One
2[email protected]Bob One
4[email protected]Cara One
PostgreSQL
Engine-specific syntax
Setup
CREATE TABLE contacts (id INT, email VARCHAR(100), name VARCHAR(50));

INSERT INTO
  contacts (id, email, name)
VALUES
  (1, '[email protected]', 'Ada One'),
  (2, '[email protected]', 'Bob One'),
  (3, '[email protected]', 'Ada Two'),
  (4, '[email protected]', 'Cara One'),
  (5, '[email protected]', 'Bob Two');
SQL
WITH
  ranked AS (
    SELECT
      id,
      ROW_NUMBER() OVER (
        PARTITION BY
          email
        ORDER BY
          id
      ) AS rn
    FROM
      contacts
  )
DELETE FROM contacts USING ranked
WHERE
  contacts.id = ranked.id
  AND ranked.rn > 1;

SELECT
  id,
  email,
  name
FROM
  contacts
ORDER BY
  id;
idemailname
1[email protected]Ada One
2[email protected]Bob One
4[email protected]Cara One

The deduplication logic is the same across engines, but the delete statement shape differs enough to keep the SQL split out.

Where this command helps.

  • removing repeated imported rows while preserving the earliest record
  • cleaning up duplicate emails or external ids before adding a unique constraint

What the command is doing.

After finding duplicate values, the next step is usually cleanup. A common pattern is to partition rows by the duplicate key, assign ROW_NUMBER() ordered by the row you want to preserve, then delete every row where the row number is greater than 1. This keeps one canonical row per duplicate group and removes the rest.