Local connection (Agent)
A Local Agent lets DeDuplica process your data entirely inside your own infrastructure. Raw records never leave your network — only job instructions come in and encrypted result metadata goes out. This is the standard deployment model for organisations with data residency requirements.
Local Agent connectivity is available on the Enterprise plan and supports all connection types offered by DeDuplica.
How It Works
DeDuplica uses a split-responsibility architecture:
- DeDuplica Cloud handles job configuration, scheduling, result storage, and the user interface.
- The Local Agent runs inside your network, picks up job work from a message queue, accesses your data sources directly, performs all record comparison and deduplication locally, and reports back only encrypted results.
Your actual record data is never uploaded unencrypted. The agent reads from your databases, processes duplicates in memory within your environment, and only sends back structured results (duplicate identifiers, match probabilities, merge output JSON), always encrypted before leaving your network.
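To make the split concrete, the sketch below models the kind of structured result message the agent reports back. Every field name here is a hypothetical illustration (only MergeOutputJson is a documented field); the real wire format is internal to DeDuplica:

```python
import json

# Hypothetical result message: note it carries only identifiers, scores and an
# encrypted merge blob -- no raw record fields ever appear in the payload.
result_message = {
    "duplicate_ids": ["rec-001", "rec-002"],  # identifiers only, illustrative
    "match_probability": 0.97,                # illustrative field name
    "MergeOutputJson": "<encrypted blob>",    # encrypted before leaving the network
}

payload = json.dumps(result_message)
```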
Prerequisites
Before deploying a Local Agent you need:
- An Azure Storage account — used as the message bus between DeDuplica Cloud and your agent. The agent polls a queue in this storage account to receive job instructions. You provide the connection string during agent setup.
- Docker image and configuration — issued by DeDuplica. Contact our support team for the details you need to deploy the Docker container.
- Docker (recommended) running on an Ubuntu/Debian Linux host.
- Outbound internet access from the agent host to Azure Storage endpoint. No inbound ports are required.
- Subscription Encryption Key — retrieved from the connection configuration page (see below). This key is required for the agent to encrypt data in transit.
Environment Variables
The Local Agent is configured entirely via environment variables. Set these in your Docker deployment (e.g. in docker-compose.yml or as -e flags).
| Variable | Required | Description |
|---|---|---|
| AZURE_STORAGE_CONNECTION_STRING | Required | Connection string to your dedicated Azure Storage account, used as the message bus between DeDuplica Cloud and the agent. You can use a storage account in your own Azure tenant, or we can provide one for you. |
| SUBSCRIPTION_ENCRYPTION_KEY | Required | Transit encryption key for all data leaving the agent. Retrieve this from the connection configuration page — see Retrieving Your Subscription Encryption Key below. |
| CLIENT_ENCRYPTION_KEY | Optional | Your own AES encryption key. When set, record data receives an additional inner encryption layer that DeDuplica cannot read, which keeps you in full control of your data. The trade-off is that users will not be able to see record data in the DeDuplica portal; the data can only be decrypted once the webhook reaches your endpoint. See Client-Controlled Encryption below. |
Retrieving Your Subscription Encryption Key
The SUBSCRIPTION_ENCRYPTION_KEY is required for the Local Agent to function. You retrieve it self-serve from the DeDuplica connection configuration page — no support request needed.
Steps:
- Navigate to System → Connections and open your local agent connection.
- Ensure Local Agent is enabled on the connection.
- Scroll to the Azure Storage Connection String field. The Show Subscription Encryption Key button appears directly below it (visible only when Local Agent mode is enabled and you are on the Enterprise plan).
- Click Show Subscription Encryption Key. The key appears in an alert with a reminder to add it to your agent environment.
- Copy the key and set it as the SUBSCRIPTION_ENCRYPTION_KEY environment variable in your agent configuration.
Keep this key confidential. It protects all data in transit between your agent and the DeDuplica processing queue. Do not commit it to version control.
Record Encryption
All data processed by the Local Agent is encrypted before it leaves your network. DeDuplica uses a two-layer encryption model:
Transit Encryption (SUBSCRIPTION_ENCRYPTION_KEY)
Every message placed on the Azure Storage queue by the agent is encrypted with the SUBSCRIPTION_ENCRYPTION_KEY. This layer:
- Is always applied — it is mandatory, not optional.
- Is stripped by DeDuplica’s backend when the message is received from the queue.
- Ensures data is never visible to the Azure Storage queue infrastructure itself.
Client-Controlled Encryption (CLIENT_ENCRYPTION_KEY)
For organisations where even DeDuplica must not be able to read matched record data, you can configure a CLIENT_ENCRYPTION_KEY. This applies an inner encryption layer before the transit layer:
- Record field data is encrypted with your CLIENT_ENCRYPTION_KEY (inner layer).
- The already-encrypted data is then encrypted again with the SUBSCRIPTION_ENCRYPTION_KEY (outer transit layer).
- DeDuplica’s backend strips only the outer transit layer — the inner layer remains intact.
- The still-encrypted merge output is stored in MergeOutputJson and delivered through to webhooks.
- Only your receiving infrastructure (which holds CLIENT_ENCRYPTION_KEY) can decrypt and read the actual field values.
This means DeDuplica itself never sees plaintext record data when CLIENT_ENCRYPTION_KEY is configured.
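A minimal sketch of this layering, assuming AES-GCM and the third-party `cryptography` package. The actual cipher mode and message framing DeDuplica uses are not specified in this guide, so treat this purely as an illustration of the two-layer idea, not the real protocol:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Stand-ins for the two keys (in reality these come from your environment).
client_key = AESGCM.generate_key(bit_length=256)        # CLIENT_ENCRYPTION_KEY
subscription_key = AESGCM.generate_key(bit_length=256)  # SUBSCRIPTION_ENCRYPTION_KEY

record = b'{"email": "jane@example.com"}'

# Inner layer: encrypted with the client key, which DeDuplica never holds.
n1 = os.urandom(12)
inner = n1 + AESGCM(client_key).encrypt(n1, record, None)

# Outer transit layer: encrypted with the subscription key before queueing.
n2 = os.urandom(12)
outer = n2 + AESGCM(subscription_key).encrypt(n2, inner, None)

# DeDuplica's backend strips only the outer layer...
stripped = AESGCM(subscription_key).decrypt(outer[:12], outer[12:], None)

# ...and is left with ciphertext it cannot read. Only your infrastructure,
# holding client_key, can recover the plaintext record.
plaintext = AESGCM(client_key).decrypt(stripped[:12], stripped[12:], None)
```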
CLIENT_ENCRYPTION_KEY Format Requirements
CLIENT_ENCRYPTION_KEY must be a valid AES key:
- Must be a valid base64-encoded string.
- The decoded byte length must be exactly 16, 24, or 32 bytes (AES-128, AES-192, or AES-256 respectively).
- Any other decoded length is rejected at agent startup with a CRITICAL log entry.
Short strings such as "test" are invalid even if they appear to be base64. The value "test" decodes to only 3 bytes — too short for AES.
Generating a valid 32-byte (AES-256) key:

```shell
openssl rand -base64 32
```

Set the printed value as CLIENT_ENCRYPTION_KEY in your agent environment.
Encryption Behaviour Summary
| CLIENT_ENCRYPTION_KEY | What DeDuplica cloud can read | What your webhook receives |
|---|---|---|
| Not set | Plaintext merge output (after stripping transit layer) | Decrypted MergeOutputJson — ready to parse as JSON |
| Set (valid AES key) | Cannot read — inner layer is never stripped | Encrypted MergeOutputJson — must decrypt with your key before parsing |
Important — do not change your encryption keys after duplicates exist. Existing duplicate records store data encrypted with the key that was active when they were created. Changing or removing CLIENT_ENCRYPTION_KEY (or SUBSCRIPTION_ENCRYPTION_KEY) will make those records permanently unreadable in webhooks and the DeDuplica UI. If you need to rotate a key, process and resolve all pending duplicates first, then rotate.
Docker Configuration
The Local Agent runs as a Docker container. Configure it using Docker Compose and set the required environment variables in your environment or an .env file. Replace placeholder values before deploying.
```yaml
version: '3.8'
services:
  local-agent:
    image: ghcr.io/deduplica/deduplica_agent:latest
    environment:
      AzureWebJobsStorage: ${AGENT_AZURE_STORAGE_CONNECTION_STRING}
      AGENT_DEDUPLICATION_REQUEST_QUEUE_NAME: "agentqueue"
      AGENT_DYNAMICS_ACTION_JOB_REPORT_QUEUE_NAME: "agentdynamicsactionjobreports"
      AGENT_WEBHOOK_ACTION_JOB_REPORT_QUEUE_NAME: "agentwebhookactionjobreports"
      AGENT_DUPLICATE_PAIR_REPORT_QUEUE_NAME: "agentduplicatepairreports"
      AGENT_DUPLICATE_RECORD_DATA_QUEUE_NAME: "agentduplicaterecordsdata"
      AGENT_JOB_EXEC_STATUS_UPDATE_QUEUE_NAME: "agentjobexecstatusupdates"
      AGENT_ACTIONDUPLICATE_DYNAMICS_QUEUE_NAME: "agentqueueactiondynamics"
      AGENT_ACTIONDUPLICATE_WEBHOOK_QUEUE_NAME: "agentqueueactionwebhook"
      AGENT_DATABASE_QUERY_QUEUE_NAME: "agentqueuedatabasequery"
      AGENT_DATABASE_QUERY_COMPLETED_QUEUE_NAME: "databasequerycompletedreports"
      AGENT_LOGGING_QUEUE_NAME: "agentlogs"
      AzureFunctionsJobHost__extensions__queues__batchSize: 1
      AzureFunctionsJobHost__extensions__queues__newBatchThreshold: 0
      FUNCTIONS_WORKER_PROCESS_COUNT: 1
      DATAVERSE_URL: ""
      DATAVERSE_CLIENT_ID: ""
      DATAVERSE_CLIENT_SECRET: ""
      DATAVERSE_TENANT_ID: ""
      DATAVERSE_AUTHORITY: ""
      JDBC_CONNECTIONSTRING: "jdbc:postgresql://postgreshost:5432/db" # sample JDBC connection string
      JDBC_USERNAME: "testuser"
      JDBC_PASSWORD: "testpassword"
      SUBSCRIPTION_ENCRYPTION_KEY: "${AGENT_ENCRYPTION_KEY}"
      CLIENT_ENCRYPTION_KEY: "" # optional: 32-byte base64 key; set to encrypt record data so DeDuplica cannot read it
    restart: unless-stopped
```

Dynamics / Dataverse (TDS) JDBC note
For Microsoft Dynamics 365 / Dataverse connections the JDBC connection string must target the TDS (SQL) endpoint. Example:
```
jdbc:sqlserver://orgname.crm4.dynamics.com:5558;databaseName=orgName;encrypt=false;trustServerCertificate=false;authentication=ActiveDirectoryServicePrincipal
```

When connecting to Dynamics/Dataverse this way, set JDBC_USERNAME to your Azure AD application (client) ID and JDBC_PASSWORD to the application secret (the app registration’s client secret).
Deployment
DeDuplica support will guide you through the details of the Docker container setup. Because DeDuplica needs to relay results from your container, setup also requires configuration on DeDuplica’s side to accept report messages from your Azure Storage queue. Allow a few days for both teams to coordinate the initial setup.
Assigning Connections to the Agent
When adding a database connection in DeDuplica, select the Local Agent as the connection setting. The agent must have network access to the database host and port — the connection is made from inside your network, just like any other internal application.
Supported connection types for local agent execution include all database connectors: PostgreSQL, MySQL, MariaDB, Microsoft SQL Server, Oracle Database, and Microsoft Dynamics 365 / Dataverse.
Restarting the Agent
DeDuplica cloud agents can be managed (restarted, etc.) directly in the panel. When you use a local agent, DeDuplica does not control your containers — restart operations must be handled by your local infrastructure administrator. From the panel you can only restart the bridge that relays messages between DeDuplica Cloud and your connected Azure Storage.
Webhooks With the Local Agent
Webhooks work normally with Local Agent deployments. DeDuplica fires a webhook for each duplicate cluster completed, regardless of where processing occurred. The payload structure is identical to cloud-processed duplicates.
If CLIENT_ENCRYPTION_KEY is configured, the MergeOutputJson field in the webhook payload will arrive encrypted. Your receiving system must decrypt it using the same CLIENT_ENCRYPTION_KEY before parsing the field values. See Webhooks Guide for the full payload reference.
Security Considerations
- The agent requires outbound-only connectivity. No inbound ports need to be opened.
- The Azure Storage account used for queue messaging should be dedicated to DeDuplica and access-controlled.
- All record data sent by the agent is encrypted in transit. With CLIENT_ENCRYPTION_KEY, not even DeDuplica’s cloud can read your record data.
- The agent runs as a non-privileged container.
Troubleshooting
Agent logs CRITICAL error about CLIENT_ENCRYPTION_KEY
The CLIENT_ENCRYPTION_KEY you have set is either not valid base64 or its decoded length is not 16, 24, or 32 bytes. Generate a new valid key:

```shell
openssl rand -base64 32
```

Set the output as CLIENT_ENCRYPTION_KEY and restart the agent.
Jobs not starting or connection errors
Check the DeDuplica Logs panel first. For persistent issues, check the agent container logs:
```shell
docker logs deduplica-agent
```

Contact DeDuplica support if the issue persists.
Webhooks receive encrypted MergeOutputJson
If CLIENT_ENCRYPTION_KEY is configured on the agent, MergeOutputJson arrives encrypted in your webhook payload. Decrypt it using AES with your CLIENT_ENCRYPTION_KEY before parsing the JSON. The key format requirements are the same as described in Client-Controlled Encryption above.