How to Remove Duplicate Records from SQL Server, PostgreSQL and MySQL

March 31, 2026

Removing duplicate records from a relational database is one of the most common data quality tasks. This article covers both the manual SQL approach (for simple cases or one-off cleanups) and the systematic tool-based approach (for production databases with ongoing needs).

The Manual SQL Approach

Step 1: Identify Duplicates

Before deleting anything, find out what you’re dealing with. This query identifies groups of duplicate rows with identical values across specified columns:

-- Works in SQL Server, PostgreSQL, and MySQL
SELECT
    email,
    first_name,
    last_name,
    COUNT(*) AS duplicate_count
FROM contacts
GROUP BY email, first_name, last_name
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;

Replace the table and column names with those from your actual schema.
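Before deleting, it's worth inspecting the full rows behind each duplicate group, not just the counts. A sketch using the same example contacts schema, joining the aggregate back to the table:

```sql
-- Works in SQL Server, PostgreSQL, and MySQL
SELECT c.*
FROM contacts c
INNER JOIN (
    SELECT email, first_name, last_name
    FROM contacts
    GROUP BY email, first_name, last_name
    HAVING COUNT(*) > 1
) d ON c.email = d.email
   AND c.first_name = d.first_name
   AND c.last_name = d.last_name
ORDER BY c.email, c.id;
```

Sorting by the grouping columns puts the members of each duplicate group next to each other, which makes eyeballing the data much easier.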

Step 2: Choose Which Row to Keep

Use a CTE with ROW_NUMBER() to rank the rows within each duplicate group, then delete every row except the first. In this example, we keep the oldest record (lowest id) and delete the later ones:

SQL Server / PostgreSQL:

WITH ranked AS (
    SELECT
        id,
        ROW_NUMBER() OVER (
            PARTITION BY email, first_name, last_name
            ORDER BY id ASC
        ) AS rn
    FROM contacts
)
DELETE FROM contacts
WHERE id IN (
    SELECT id FROM ranked WHERE rn > 1
);
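In SQL Server specifically, a CTE built on a single table is itself deletable, so the IN subquery can be dropped. This shorter T-SQL form is equivalent (it does not work in PostgreSQL):

```sql
-- SQL Server only: delete directly through the CTE
WITH ranked AS (
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY email, first_name, last_name
            ORDER BY id ASC
        ) AS rn
    FROM contacts
)
DELETE FROM ranked WHERE rn > 1;
```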

MySQL (which doesn’t allow the target table of a DELETE to appear in a subquery of the same statement, so a join against a derived table is used instead):

DELETE c FROM contacts c
INNER JOIN (
    SELECT MIN(id) AS keep_id, email, first_name, last_name
    FROM contacts
    GROUP BY email, first_name, last_name
    HAVING COUNT(*) > 1
) dup ON c.email = dup.email
    AND c.first_name = dup.first_name
    AND c.last_name = dup.last_name
    AND c.id != dup.keep_id;
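Whichever engine you use, it is safer to wrap the delete in a transaction and re-run the Step 1 check before committing. A minimal sketch (BEGIN works in PostgreSQL and MySQL; SQL Server uses BEGIN TRANSACTION):

```sql
BEGIN;

-- ... run the DELETE from Step 2 here ...

-- Verify before committing: this should now return zero rows
SELECT email, first_name, last_name
FROM contacts
GROUP BY email, first_name, last_name
HAVING COUNT(*) > 1;

-- If the result looks wrong, run ROLLBACK instead
COMMIT;
```

Until COMMIT runs, ROLLBACK restores every deleted row, which turns an irreversible delete into a reviewable one.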

Limitations of the Manual Approach

The SQL approach above handles exact duplicates — rows where specified columns are character-for-character identical. It does not handle:

  • Fuzzy duplicates: “Acme Corp” and “ACME Corporation” — different strings, same company
  • Partial duplicates: same person, different email addresses in different records
  • Multi-table scenarios: duplicate contacts linked to different accounts that should be merged
  • Related record handling: what happens to the orders, notes, or activities attached to the deleted row?

For exact-match deduplication of a simple table, the SQL approach is adequate. For anything more complex, a dedicated tool is more reliable and significantly safer.

The Systematic Tool-Based Approach

For production deduplication — especially with fuzzy matching, related records, or recurring runs — a dedicated tool provides workflow guardrails that raw SQL doesn’t.

Using DeDuplica

DeDuplica connects directly to SQL Server, PostgreSQL and MySQL without requiring any data export. The workflow:

1. Connect your database

In DeDuplica, go to Connections and add a new connection for your database engine. You’ll provide the host, port, database name, and a read/write user credential. DeDuplica only needs read access for the Find step; write access is required for the Process step.

See: SQL Server connection guide, PostgreSQL guide, MySQL guide.

2. Create a job with a Source Definition

Define which table to scan and which fields to use in matching. For a contacts table, you might configure:

  • email: Exact Comparator (high confidence signal)
  • last_name: Person Name Comparator, threshold 85%
  • first_name: Person Name Comparator, threshold 80%

Add filters to restrict scope if needed (e.g., WHERE created_date > '2024-01-01' for incremental processing).

See: source definition documentation.

3. Run a Find Duplicates job

The Find job scans your table, applies the matching rules, and stores identified duplicate groups. No records are modified. Review the output — you’ll see every identified pair or group with their similarity scores.

4. Review and optionally test first

If you’re not sure about your matching rule thresholds, use the Testing a Job feature to run against a 5,000-row sample first. Inspect the results. Adjust thresholds if you’re seeing too many false positives or too many missed duplicates.

5. Run a Process Duplicates job

Once satisfied with the Find results, run a Process Duplicates job. Configure:

  • Which record becomes the base (e.g., the oldest by creation date, or the one with the most related records)
  • How conflicting field values are resolved
  • What happens to related records attached to subordinates

The process job executes the merges or deletions (directly where supported, or via webhooks), logs every action, and stores a full run report.

The Key Advantage: Separation of Find and Process

The most important difference from running a DELETE statement is that you see what you’re deleting before it happens. A raw SQL delete is immediate and permanent (unless you’re in a transaction). The DeDuplica workflow gives you a structured review step between identification and action.

For production databases with live users, this separation is not optional — it is how you avoid data loss.

Setting Up Recurring Deduplication

A one-off cleanup is a starting point. For databases in active use, duplicates will re-accumulate. Scheduling a recurring weekly or daily Find + Process job keeps the data clean continuously without manual intervention.

DeDuplica supports this through job scheduling. Schedule the Find job to run (e.g., every Monday at 03:00), and optionally schedule the Process job to automatically resolve high-confidence matches without waiting for manual review.

Quick Reference: Which Approach for Which Scenario

Scenario → recommended approach:

  • One-time cleanup, exact match only, no related records: Manual SQL
  • One-time cleanup with fuzzy matching: DeDuplica Find + Process job
  • Recurring automated deduplication: DeDuplica with scheduling
  • Production database with related records to preserve: DeDuplica (handles re-parenting)
  • Compliance-sensitive data that can’t leave the network: DeDuplica with local agent

Connect your SQL Server, PostgreSQL, or MySQL database to DeDuplica and run your first deduplication scan for free. Start here — no credit card required.