On-Premises Deduplication: Keeping Sensitive Data Inside Your Network

January 20, 2026

For most SaaS tools, the tradeoff is implicit: you get the convenience of a managed service, but your data passes through the vendor’s infrastructure. For a CRM with 100,000 marketing contacts, that may be perfectly acceptable. For a healthcare organisation processing patient records, or a financial institution managing account data subject to strict regulatory controls, it often isn’t.

DeDuplica is designed with this constraint in mind.

The Architecture of the Local Agent

When you deploy a DeDuplica local agent, data processing happens in your environment. The agent connects directly to your data source (SQL Server, PostgreSQL, MySQL, MariaDB, Oracle, or Dynamics 365) using a connection you configure locally. It retrieves records, applies your matching rules, identifies duplicates and, if configured to do so, processes them.
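The following is a minimal Python sketch of the retrieve, match and identify steps. It is not DeDuplica's implementation. The `contacts` table, the `match_key` and `find_duplicates` helpers and the single matching rule (exact match on a normalised e-mail address) are all hypothetical. An in-memory SQLite database stands in for your own. The point it illustrates is that rows are read into the agent's own process and compared there.

```python
import sqlite3
from collections import defaultdict


def match_key(email: str) -> str:
    """Hypothetical matching rule: contacts match when normalised e-mails are equal."""
    return email.strip().lower()


def find_duplicates(conn: sqlite3.Connection) -> list[list[int]]:
    """Group row ids that share a match key; return only groups of two or more."""
    groups: dict[str, list[int]] = defaultdict(list)
    # Rows are read into the agent process and compared in memory on this machine.
    for row_id, email in conn.execute("SELECT id, email FROM contacts ORDER BY id"):
        groups[match_key(email)].append(row_id)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # In-memory SQLite stands in for the customer's database (illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO contacts (id, email) VALUES (?, ?)",
        [(1, "ana@example.com"), (2, " Ana@Example.com "), (3, "bo@example.com")],
    )
    print(find_duplicates(conn))
```

Production matching rules are usually richer than one exact key. Whatever the rule, this work happens inside the agent.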

The DeDuplica cloud service handles:

  • Scheduling job runs
  • Storing job configuration (not your data)
  • Storing run results and statistics (record counts, duplicate counts, run durations)
  • Serving the management interface
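One way to picture the "configuration, not data" split is a job definition that only describes how to match. The `JobConfig` fields below are hypothetical and are not DeDuplica's schema. In this sketch, the job refers to a connection by a label that the agent resolves locally. As a result, neither database credentials nor record values appear in what the cloud side would store.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class JobConfig:
    """Hypothetical job definition as the cloud service might store it (illustrative fields)."""
    name: str
    schedule_cron: str    # when the agent should run the job
    connection_name: str  # a label the agent resolves to a locally configured connection
    table: str            # which table to scan
    match_columns: list[str] = field(default_factory=list)  # columns the rules compare


def to_cloud_json(job: JobConfig) -> str:
    """Serialise the definition: it says how to match, never what the records contain."""
    return json.dumps(asdict(job), sort_keys=True)


if __name__ == "__main__":
    job = JobConfig("eu-contacts", "0 2 * * *", "eu-crm-prod", "contacts", ["email", "last_name"])
    print(to_cloud_json(job))
```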

Your actual database records — customer names, contact details, financial account data — never leave your network. The agent communicates with the cloud service via outbound HTTPS for coordination; your data travels only between the agent and your database, both of which are inside your perimeter.

Regulatory Drivers

GDPR and UK GDPR — If personal data is subject to restrictions on transfer outside the EU/EEA or the UK (or under a country-specific equivalent), processing it with a cloud vendor whose servers sit outside those jurisdictions may require transfer mechanisms or contractual safeguards. A local agent sidesteps that question for the records themselves, because they are never transferred.

HIPAA — Healthcare organisations processing protected health information (PHI) need safeguards over where that PHI is processed. Local processing keeps PHI out of third-party infrastructure.

Financial services regulations — PCI DSS and similar frameworks restrict where cardholder data may flow. Many financial services regulations require explicit data residency controls. Local agent processing keeps regulated data on-premises.

Contractual data handling clauses — Enterprise contracts with large customers often include data handling restrictions that prohibit sharing customer data with sub-processors. A local agent provides an architectural answer to these clauses.

Multi-Agent Deployments

Large organisations often have multiple environments that need deduplication coverage — production, staging, regional instances, or data in separate network segments (perhaps for data sovereignty reasons across country offices).

DeDuplica supports multiple agents on a single account. Each agent registers independently, connects to the databases it has access to, and can be assigned to different jobs. A European data centre runs an agent against the EU customer database; a North American data centre runs a separate agent against the US customer database. Both agents are managed from the same DeDuplica account, with the same job configuration interface, but no data crosses regional boundaries.
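The regional split can be sketched as a dispatch rule. It pins each job to an agent in the same region and refuses anything else. The `Agent` and `Job` types, the region labels and the `assign` function are hypothetical illustrations of the idea, not DeDuplica's scheduler. For simplicity, the sketch assumes one agent per region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    agent_id: str
    region: str  # where the agent is deployed


@dataclass(frozen=True)
class Job:
    job_id: str
    region: str  # where the job's source database lives


def assign(jobs: list[Job], agents: list[Agent]) -> dict[str, str]:
    """Map each job to the agent in its own region; never dispatch across regions."""
    # Assumes one agent per region; a later agent in the same region would replace an earlier one.
    agent_by_region = {agent.region: agent.agent_id for agent in agents}
    plan: dict[str, str] = {}
    for job in jobs:
        if job.region not in agent_by_region:
            raise ValueError(f"no agent in region {job.region!r} for job {job.job_id!r}")
        plan[job.job_id] = agent_by_region[job.region]
    return plan


if __name__ == "__main__":
    agents = [Agent("agent-eu-1", "eu"), Agent("agent-na-1", "na")]
    jobs = [Job("dedupe-eu-customers", "eu"), Job("dedupe-us-customers", "na")]
    print(assign(jobs, agents))
```

The sketch raises an error rather than falling back to another region's agent. That way, a misconfiguration cannot silently route a job's data across a regional boundary.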

The number of agents available depends on plan; see the plans page for current limits.

What “Inside Your Network” Actually Means

It is worth being precise. When we say data stays inside your network:

  • The agent binary runs on a server you provide (Docker host on Linux)
  • The agent connects to your database using credentials you control
  • Query results are held in agent memory during processing and written to your local database for deduplication; they are not transmitted to DeDuplica’s cloud
  • The cloud service receives only job execution metadata (start time, end time, row counts, duplicate counts and status codes), plus outputs encrypted with keys you control
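As a hedged illustration of that boundary, the sketch below reduces a run to counts and timestamps before anything is prepared for the cloud. It also rejects any field outside a metadata allowlist. The field names, the `ALLOWED_FIELDS` set and the duplicate-count definition (records beyond the first in each duplicate group) are assumptions. Output encryption is deliberately left out, because the key-management details are not described here.

```python
import json
from datetime import datetime, timezone

# Hypothetical allowlist: the only fields a run report may carry off-site.
ALLOWED_FIELDS = {"started_at", "finished_at", "row_count", "duplicate_count", "status"}


def summarise_run(started: datetime, finished: datetime, rows: list[dict],
                  duplicate_groups: list[list[int]], status: str) -> dict:
    """Reduce a run to aggregates; the record dicts themselves are not included."""
    return {
        "started_at": started.isoformat(),
        "finished_at": finished.isoformat(),
        "row_count": len(rows),
        # Assumed definition: every record after the first in a group counts as a duplicate.
        "duplicate_count": sum(len(group) - 1 for group in duplicate_groups),
        "status": status,
    }


def to_outbound_json(summary: dict) -> str:
    """Serialise a summary for the cloud, refusing any field outside the allowlist."""
    unexpected = set(summary) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"refusing to send fields: {sorted(unexpected)}")
    return json.dumps(summary, sort_keys=True)


if __name__ == "__main__":
    rows = [{"id": 1, "email": "ana@example.com"}, {"id": 2, "email": "ana@example.com"},
            {"id": 3, "email": "bo@example.com"}]
    summary = summarise_run(datetime(2026, 1, 20, 2, 0, tzinfo=timezone.utc),
                            datetime(2026, 1, 20, 2, 5, tzinfo=timezone.utc),
                            rows, [[1, 2]], "succeeded")
    print(to_outbound_json(summary))  # timestamps, counts and status only
```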

The agent communicates outbound with DeDuplica through an Azure storage account that you control. This outbound connection is the only DeDuplica-related traffic that leaves your network.
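Outbound-only communication generally implies a pull model, in which the agent initiates every exchange. It fetches work and pushes results rather than listening for inbound connections. The sketch below assumes that model. A `queue.Queue` and a plain list stand in for the Azure storage account. `run_agent` and its parameters are hypothetical, not part of the agent.

```python
import queue
from typing import Callable


def run_agent(inbox: "queue.Queue[str]", handle: Callable[[str], str],
              outbox: list, max_polls: int, poll_timeout: float = 1.0) -> int:
    """Poll for job ids, run each locally, and record metadata-only results.

    `inbox` and `outbox` are hypothetical stand-ins for the customer-controlled
    storage account. The agent only reads from and writes to them, and never
    opens a listening socket.
    """
    processed = 0
    for _ in range(max_polls):
        try:
            job_id = inbox.get(timeout=poll_timeout)  # agent-initiated "pull" for work
        except queue.Empty:
            continue  # nothing scheduled yet; poll again
        status = handle(job_id)  # deduplication runs locally inside handle()
        outbox.append({"job_id": job_id, "status": status})  # push metadata only
        processed += 1
    return processed


if __name__ == "__main__":
    inbox: "queue.Queue[str]" = queue.Queue()
    inbox.put("dedupe-eu-customers")
    results: list = []
    run_agent(inbox, lambda job_id: "succeeded", results, max_polls=2, poll_timeout=0.1)
    print(results)
```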

Getting Started With Local Agent Deployment

Agent deployment is documented in the system settings guide. The agent is distributed as a standalone Docker image, so apart from Docker on the host server, no additional runtime or cloud credentials are required. Basic deployment on a single server takes under an hour.

For air-gapped environments or environments with strict egress controls, contact the DeDuplica team to discuss deployment options.


The local agent is available on Plus and Enterprise plans.