What is DeDuplica

DeDuplica is a cloud-based SaaS platform for detecting, reviewing, and resolving duplicate records in enterprise databases. It is designed for organisations that depend on accurate, consistent data, whether that data lives in CRM systems or operational databases, or flows between multiple integrated platforms.

Why Deduplication Matters

In most organisations, data arrives from multiple channels: manual entry, imports, integrations, migrations. Over time, the same real-world entity — a customer, a company, a product — ends up represented as multiple records. This leads to:

  • Unreliable reports and dashboards
  • Duplicate outreach and customer confusion
  • Failed integrations caused by conflicting records
  • Compliance and audit challenges

DeDuplica provides a structured, automated way to find and fix these duplicates — without requiring bespoke data engineering.

How It Works

DeDuplica works in three stages:

  1. Connect — point DeDuplica at your data source (a database, a Dynamics organisation, etc.)
  2. Configure — define a Job: which table to scan, which fields to compare, and how aggressively to match
  3. Act — review found duplicates, merge them automatically where supported, or push them through a webhook to your own processing pipeline

Core Concepts

Connections

A connection authenticates DeDuplica against a data source. Each connection stores the credentials and settings needed to read (and optionally write) data. Supported sources:

Source                               Read   Auto-merge
Microsoft Dynamics 365 / Dataverse   Yes    Yes
Microsoft SQL Server                 Yes    via webhook
PostgreSQL                           Yes    via webhook
MySQL                                Yes    via webhook
MariaDB                              Yes    via webhook
Oracle Database                      Yes    via webhook

Jobs

A Job is the primary unit of work. Each job targets one table in one connected source. A job consists of:

  • Source Definition — which table, which fields, optional filters
  • Find Duplicates — field-level matching rules, algorithms, and a strictness setting
  • Process Duplicates — how to compose the merged output (field-level merge strategy)
  • Action Duplicates — what to do when duplicates are found (webhook, auto-merge, notification)

Jobs can be run manually or on a schedule. Multiple jobs can run in parallel depending on your plan.
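
To make the parts of a job more concrete, here is a minimal Python sketch of how field-level rules and a strictness setting could combine into a match decision. It is illustrative only: the names JobSketch, FieldRule, match_score and is_duplicate are hypothetical, the 0 to 1 strictness scale is an assumption, and the similarity measure is Python's standard difflib rather than any algorithm DeDuplica is known to use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class FieldRule:
    """One field-level matching rule (hypothetical shape)."""
    name: str
    weight: float = 1.0


@dataclass
class JobSketch:
    """Illustrative stand-in for a Job; not DeDuplica's real schema."""
    table: str
    rules: list[FieldRule]
    strictness: float = 0.85  # assumed 0..1 threshold; higher means stricter
    action: str = "webhook"   # assumed values: "webhook", "auto-merge", "notification"


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1], computed with difflib."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(job: JobSketch, left: dict, right: dict) -> float:
    """Weighted average of the per-field similarities defined by the job's rules."""
    total_weight = sum(rule.weight for rule in job.rules)
    if total_weight == 0:
        return 0.0  # no rules configured, so nothing can match
    weighted = sum(
        rule.weight
        * similarity(str(left.get(rule.name, "")), str(right.get(rule.name, "")))
        for rule in job.rules
    )
    return weighted / total_weight


def is_duplicate(job: JobSketch, left: dict, right: dict) -> bool:
    """Two records count as duplicates when their score reaches the strictness."""
    return match_score(job, left, right) >= job.strictness


job = JobSketch(table="contact", rules=[FieldRule("name", 2.0), FieldRule("email", 1.0)])
a = {"name": "Jane Smith", "email": "jane.smith@example.com"}
b = {"name": "jane smith ", "email": "jane.smith@example.com"}
c = {"name": "Robert Brown", "email": "rb@example.org"}

print(is_duplicate(job, a, b))  # True: the fields are identical after normalisation
print(is_duplicate(job, a, c))  # False: both fields differ substantially
```

Raising the strictness value in this sketch makes the job more conservative (fewer, more certain matches); lowering it matches more aggressively at the cost of more false positives.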

Duplicate Records

When a job finds a cluster of records that match above the configured threshold, a duplicate entry is created. Each cluster has one base (master) record and one or more subordinate records. Duplicates move through a lifecycle:

  • Pending — awaiting review or configured action
  • Completed — action taken (merged or webhook fired)
  • Cancelled — marked as not a real duplicate; will not be re-raised unless data changes
  • Locked — permanently excluded; no subordinate in that cluster will ever be linked to that base record again
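
The cluster structure and lifecycle above can be modelled as a small state machine. The sketch below is an interpretation, not DeDuplica's implementation: the class and field names are hypothetical, and the allowed transitions are inferred from the state descriptions (for example, that a Cancelled duplicate can return to Pending once the underlying data changes, while Locked and Completed are final).

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "Pending"
    COMPLETED = "Completed"
    CANCELLED = "Cancelled"
    LOCKED = "Locked"


# Assumed transitions, inferred from the lifecycle descriptions.
ALLOWED = {
    Status.PENDING: {Status.COMPLETED, Status.CANCELLED, Status.LOCKED},
    Status.CANCELLED: {Status.PENDING},  # re-raised only after the data changes
    Status.COMPLETED: set(),             # action already taken
    Status.LOCKED: set(),                # permanent exclusion
}


@dataclass
class DuplicateCluster:
    """One base (master) record plus one or more subordinate records."""
    base_id: str
    subordinate_ids: list[str]
    status: Status = Status.PENDING

    def move_to(self, new_status: Status) -> None:
        """Change status, rejecting transitions the sketch does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        self.status = new_status


cluster = DuplicateCluster(base_id="c-1", subordinate_ids=["c-2", "c-3"])
cluster.move_to(Status.CANCELLED)  # reviewer decides it is not a real duplicate
cluster.move_to(Status.PENDING)    # data changed, so the match is raised again
cluster.move_to(Status.LOCKED)     # reviewer excludes the pairing permanently
```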

Agents

By default, DeDuplica processes data via its cloud service. For organisations with data residency or compliance requirements, an Enterprise Agent can be deployed inside your own infrastructure. The agent handles all data access locally. DeDuplica’s cloud only orchestrates the workflow and never sees the actual record data.

All data sent from a Local Agent is encrypted in transit. Enterprise customers can additionally configure a CLIENT_ENCRYPTION_KEY so that even DeDuplica’s backend cannot read matched record field values — only the customer’s own receiving infrastructure can decrypt the merge output.

See Local Agent setup for environment variable configuration, key requirements, and the self-serve key retrieval UI.

Webhooks

For databases where DeDuplica cannot write directly, webhooks deliver the duplicate cluster and the pre-built merge JSON to your endpoint. Your system then applies the merge. This enables automation for any database or downstream platform.
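
As a rough illustration of the receiving side, the sketch below applies a merge to an in-memory record store. The payload shape is an assumption made for this example: the keys base_id, subordinate_ids and merged are hypothetical and should be replaced with the fields your webhook actually delivers. A real receiver would also authenticate the request and apply the change inside a database transaction.

```python
import json


def apply_merge(records: dict[str, dict], raw_body: str) -> str:
    """Apply one merge payload to `records` and return the surviving base id.

    Assumed (hypothetical) payload shape:
        {"base_id": "...", "subordinate_ids": ["..."], "merged": {field: value}}
    """
    payload = json.loads(raw_body)
    base_id = payload["base_id"]
    if base_id not in records:
        raise KeyError(f"base record {base_id!r} not found")

    # Overlay the pre-built merge output on the base record.
    records[base_id] = {**records[base_id], **payload["merged"]}

    # Remove subordinates; pop() with a default keeps repeat deliveries harmless.
    for sub_id in payload["subordinate_ids"]:
        records.pop(sub_id, None)
    return base_id


store = {
    "c-1": {"name": "Jane Smith", "phone": ""},
    "c-2": {"name": "J. Smith", "phone": "555-0100"},
}
body = json.dumps(
    {
        "base_id": "c-1",
        "subordinate_ids": ["c-2"],
        "merged": {"name": "Jane Smith", "phone": "555-0100"},
    }
)
apply_merge(store, body)
print(store)  # {'c-1': {'name': 'Jane Smith', 'phone': '555-0100'}}
```

Because missing subordinates are ignored, delivering the same payload twice leaves the store unchanged, which is a useful property if webhook deliveries are retried.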

Supported Databases

All product names and logos are trademarks of their respective owners and are used for identification purposes only.

Next Steps