How to start
This guide walks you through getting DeDuplica running from a fresh account to your first completed deduplication run.
Step 1 — Register and Choose a Plan
Go to app.deduplica.net and create an account. The Free plan is available immediately with no credit card required. It lets you connect a data source, create jobs, and run deduplication manually — a good way to evaluate before committing to a paid tier.
See Plans and Limitations for the full feature matrix.
Step 2 — Create a Connection
Navigate to System → Connections and click Add Connection.
Select your data source type and provide the required credentials:
| Source | What you need |
|---|---|
| Dynamics 365 / Dataverse | Tenant ID, Client ID, Client Secret, Environment URL |
| SQL Server | Server, Database, Username, Password (or Azure AD) |
| PostgreSQL | Host, Port, Database, Username, Password |
| MySQL / MariaDB | Host, Port, Database, Username, Password |
| Oracle | Host, Port, Service Name, Username, Password |
| Local Agent | Agent ID and licence key (Enterprise plan) |
See the Connections section for source-specific setup guides.
Once you save the connection, DeDuplica tests it. A green status means the connection is ready to use.
Step 3 — Create a Job
Navigate to Jobs and click New Job.
Source Definition
Select the connection you just created. DeDuplica will fetch the available tables (or entities) from your source. Choose the table you want to deduplicate and select the fields to include. Add any filters to limit the dataset if needed.
Use the Validate Connection button (below the job description field) to send a quick test request and confirm the connection is reachable before proceeding. No data is modified, and the test does not count against your plan limits.
Important: for very large datasets, consider splitting deduplication into separate jobs to avoid timeouts and excessive load. For example, if your list of countries is a closed set and you know duplicates cannot occur across countries, create a separate job per country, each with its own country filter. This keeps each job's dataset to a reasonable size.
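To get a rough sense of why splitting helps, the sketch below assumes a naive approach in which every record is compared with every other record, which takes n*(n-1)/2 comparisons for n records. That is an assumption for illustration only; this guide does not describe DeDuplica's internal matching strategy. The helper names and sample records are hypothetical.

```python
from collections import defaultdict


def partition_by_field(records, field):
    """Group records into buckets keyed by the value of `field`."""
    buckets = defaultdict(list)
    for record in records:
        # Records without the field land in a None bucket instead of being dropped.
        buckets[record.get(field)].append(record)
    return dict(buckets)


def pairwise_comparisons(n):
    """Number of checks a naive all-pairs comparison of n records needs."""
    return n * (n - 1) // 2


records = [
    {"name": "Acme Ltd", "country": "UK"},
    {"name": "ACME Limited", "country": "UK"},
    {"name": "Acme GmbH", "country": "DE"},
    {"name": "Beta AG", "country": "DE"},
    {"name": "Gamma SA", "country": "FR"},
]

buckets = partition_by_field(records, "country")
total_single_job = pairwise_comparisons(len(records))
# One job per country: records in different countries are never compared.
total_split = sum(pairwise_comparisons(len(b)) for b in buckets.values())
print(total_single_job, total_split)  # 10 2
```

Five records in one job need 10 comparisons under this assumption; split by country, they need only 2. The gap widens rapidly as the dataset grows, because the all-pairs count grows with the square of the record count. Splitting is only safe when, as described above, duplicates genuinely cannot span the groups.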
Find Duplicates
Add the fields you want to compare for duplicate detection. For each field:
- Choose a matching algorithm depending on your data type
- Set a rank from 1–10 indicating how strongly this field should influence the result
Set the overall Strictness (1–10). Start with 6–7 for a balanced result; increase to 8–10 for high-confidence automatic merges.
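To build intuition for how rank and strictness interact, the sketch below treats each field's rank as a weight in a weighted average of per-field similarity scores, and maps strictness to a match threshold of strictness / 10. This is an assumption for illustration, not DeDuplica's actual scoring formula. The functions `exact`, `fuzzy`, `weighted_score` and `is_duplicate` are hypothetical, and `fuzzy` uses Python's `difflib.SequenceMatcher` as a stand-in for whichever fuzzy algorithm you pick.

```python
from difflib import SequenceMatcher


def exact(a, b):
    """1.0 if the values are identical, else 0.0."""
    return 1.0 if a == b else 0.0


def fuzzy(a, b):
    """Case-insensitive character similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_score(rec_a, rec_b, rules):
    """rules: list of (field, algorithm, rank), rank 1..10 used as a weight."""
    total_weight = sum(rank for _, _, rank in rules)
    score = sum(rank * algo(rec_a[field], rec_b[field]) for field, algo, rank in rules)
    return score / total_weight


def is_duplicate(rec_a, rec_b, rules, strictness):
    # Hypothetical mapping: strictness 1..10 becomes a threshold of 0.1..1.0.
    return weighted_score(rec_a, rec_b, rules) >= strictness / 10


rules = [("name", fuzzy, 8), ("email", exact, 10), ("city", exact, 3)]
a = {"name": "Jon Smith", "email": "jon@example.com", "city": "Leeds"}
b = {"name": "John Smith", "email": "jon@example.com", "city": "London"}

print(round(weighted_score(a, b, rules), 2))  # 0.84
print(is_duplicate(a, b, rules, 7))  # True
print(is_duplicate(a, b, rules, 9))  # False
```

With these sample records the pair scores about 0.84, so it counts as a duplicate at strictness 7 but not at 9. In this model, a high-rank field that disagrees (here, a different email) would pull the score down far more than the low-rank city field does.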
See Find Duplicates for a full guide to algorithms, rank, and strictness.
Process Duplicates
Define the merge output: which record is the base (master) and which is the subordinate, and what value should be kept for each field. DeDuplica pre-builds the merged JSON from these rules. You can use this output in your webhook listener later.
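As an illustration of what per-field merge rules do, the sketch below combines a master and a subordinate record into one merged record and serialises it as JSON. The function `build_merged` and the rule names `master`, `subordinate`, `non_empty` and `longest` are hypothetical and may not match the options DeDuplica offers.

```python
import json


def build_merged(master, subordinate, field_rules, default="master"):
    """Merge two records field by field using hypothetical rules:

    "master"      keep the master's value
    "subordinate" keep the subordinate's value
    "non_empty"   keep the master's value unless it is empty, else the subordinate's
    "longest"     keep the longer string (master wins ties)
    """
    merged = {}
    for field in master.keys() | subordinate.keys():
        m, s = master.get(field), subordinate.get(field)
        rule = field_rules.get(field, default)
        if rule == "master":
            merged[field] = m
        elif rule == "subordinate":
            merged[field] = s
        elif rule == "non_empty":
            merged[field] = m if m not in (None, "") else s
        elif rule == "longest":
            merged[field] = m if len(m or "") >= len(s or "") else s
        else:
            raise ValueError(f"unknown rule {rule!r} for field {field!r}")
    return merged


master = {"id": "A1", "name": "Acme", "phone": "", "notes": "VIP"}
subordinate = {"id": "B7", "name": "Acme Ltd", "phone": "0123", "notes": "Customer since 2019"}
rules = {"name": "longest", "phone": "non_empty", "notes": "subordinate"}

merged = build_merged(master, subordinate, rules)
print(json.dumps(merged, sort_keys=True))
# {"id": "A1", "name": "Acme Ltd", "notes": "Customer since 2019", "phone": "0123"}
```

Fields without an explicit rule fall back to the master's value, which is why `id` stays `A1`.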
See Process Duplicates for details.
Action Duplicates
Choose what happens when duplicates are found:
- Auto-merge (Dynamics / Dataverse only) — DeDuplica applies the merge directly
- Webhook — the duplicate pair and merge JSON are POSTed to your endpoint
- Pending queue — duplicates wait for manual review
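If you choose the Webhook action, you need an endpoint that accepts the POSTed duplicate pair and merge JSON. The sketch below is a minimal listener built on Python's standard-library `http.server`. The payload keys `master`, `subordinate` and `merged` are hypothetical placeholders, because this guide does not document the payload schema; replace them with the field names in the requests you actually receive.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_payload(body: bytes) -> dict:
    """Parse a webhook body and pull out the parts a listener typically needs."""
    payload = json.loads(body)  # raises a ValueError subclass on malformed input
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    # Hypothetical key names: replace with those in the real DeDuplica payload.
    return {
        "master": payload.get("master"),
        "subordinate": payload.get("subordinate"),
        "merged": payload.get("merged"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            result = handle_payload(body)
        except ValueError:
            # Reject malformed bodies with 400 Bad Request.
            self.send_response(400)
            self.end_headers()
            return
        # Apply result["merged"] to your own system here (not shown).
        print("received pair:", result["master"], result["subordinate"])
        self.send_response(200)
        self.end_headers()


def run(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Serve until interrupted (Ctrl+C)."""
    HTTPServer((host, port), WebhookHandler).serve_forever()
```

Call `run()` to start listening on port 8080 (an arbitrary choice). The Python documentation notes that `http.server` is not recommended for production, so use it only for local experiments and move to a production web framework, with authentication, for a deployed endpoint.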
Test Before Production
Before running a job on your full dataset, use the Test tab to run against a small subset. Review the candidate pairs and the generated merge output to confirm your rules behave as expected. Adjust rank, strictness, or algorithms as needed.
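When reviewing test results, it can help to keep a short list of record pairs you already know are duplicates and compare it with what the test found. The sketch below assumes you have noted the found pairs as tuples of record IDs; `compare_pairs` and the IDs are hypothetical.

```python
def _normalise(pairs):
    """Treat pairs as unordered, so (a, b) and (b, a) count as the same pair."""
    return {frozenset(p) for p in pairs}


def compare_pairs(found, expected):
    """Split found pairs into confirmed, unexpected and missed relative to expectations."""
    found_set, expected_set = _normalise(found), _normalise(expected)
    return {
        "confirmed": found_set & expected_set,
        "unexpected": found_set - expected_set,  # possible false positives
        "missed": expected_set - found_set,      # possible false negatives
    }


found = [("A1", "B7"), ("C3", "D4")]
expected = [("B7", "A1"), ("E5", "F6")]

result = compare_pairs(found, expected)
print({k: sorted(sorted(p) for p in v) for k, v in result.items()})
# {'confirmed': [['A1', 'B7']], 'unexpected': [['C3', 'D4']], 'missed': [['E5', 'F6']]}
```

Unexpected pairs suggest the rules are too loose (consider raising strictness or the rank of distinguishing fields); missed pairs suggest they are too tight (consider lowering strictness or choosing a more tolerant algorithm).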
See Testing a Job.
Step 4 — Run and Monitor
Run the job manually from the job detail page, or configure a schedule under the Scheduling tab.
After a run, navigate to Job Executions to view the run status, timing, and logs. Any duplicates found appear in the Duplicates section.
See Job Scheduling and Duplicates for details on managing ongoing operations.