Jobs
A Job is the primary unit of work in DeDuplica. Each job targets a single table in a connected data source and defines how duplicate records are identified, how they are merged, and what action is taken when a duplicate pair is found.
Job Sections
Each job is configured across the following sections:
Source Definition
Defines where data comes from: which table, which fields to include, and any filters to limit the dataset per run. Source entities and fields are fetched automatically from your connection.
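As a rough illustration, a source definition can be thought of as a projection plus a filter over the source table. The entity name, field names, and filter syntax below are assumptions for the sketch, not DeDuplica's actual schema:

```python
# Illustrative only: names and filter syntax are assumptions, not DeDuplica's schema.
source_definition = {
    "entity": "contact",                                    # target table in the connected source
    "fields": ["firstname", "lastname", "emailaddress1", "modifiedon"],
    "filter": "statecode eq 0",                             # limits the dataset per run
}

def select_fields(record, definition):
    """Project a raw record down to the configured fields."""
    return {f: record.get(f) for f in definition["fields"]}
```

Fields not present on a record simply come back empty, which is why the comparison step needs to tolerate missing values.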
Find Duplicates
Configures how records are compared to identify duplicates. You select fields to compare, choose a matching algorithm per field, set a rank (1–10) indicating field importance, and set an overall Strictness level (1–10). DeDuplica uses a probability-based engine to score each record pair.
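Conceptually, rank-weighted per-field comparisons can be combined into a single pair score, with strictness acting as a threshold. This is a minimal sketch of that idea, not DeDuplica's actual engine; the function names and the strictness-to-threshold mapping are assumptions:

```python
def pair_score(record_a, record_b, rules):
    """Combine per-field similarities into a weighted score in [0, 1].
    `rules` maps field -> (compare_fn, rank); rank 1-10 weights the field."""
    total_weight = sum(rank for _, rank in rules.values())
    score = 0.0
    for field, (compare, rank) in rules.items():
        score += rank * compare(record_a.get(field, ""), record_b.get(field, ""))
    return score / total_weight

def exact(a, b):
    """One possible matching algorithm: case-insensitive exact match."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0

def is_duplicate(score, strictness):
    """Assumed mapping: strictness 1-10 becomes a score threshold of strictness/10."""
    return score >= strictness / 10
```

A high-rank field (say, email at rank 10) dominates the score, so a mismatch there pulls the pair below a strict threshold even when lower-rank fields agree.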
Process Duplicates
Defines the merge output. You specify which record is the base (master) and which is the subordinate, and set per-field merge rules (keep the base value, keep the subordinate value, keep the most recent value, and so on). DeDuplica produces structured JSON containing the merged result.
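The per-field rules amount to choosing, for each field, which record's value wins. A minimal sketch, assuming rule names ("base", "subordinate", "most_recent") and a `modifiedon` timestamp field that are illustrative rather than DeDuplica's actual vocabulary:

```python
import json

def merge_pair(base, subordinate, rules, default="base"):
    """Apply per-field merge rules and return the merged record as JSON.
    Rule names here are illustrative, not DeDuplica's actual options."""
    merged = {}
    for field in set(base) | set(subordinate):
        rule = rules.get(field, default)
        if rule == "subordinate":
            merged[field] = subordinate.get(field, base.get(field))
        elif rule == "most_recent":
            # assumes each record carries a 'modifiedon' timestamp
            newer = (base if base.get("modifiedon", "") >= subordinate.get("modifiedon", "")
                     else subordinate)
            merged[field] = newer.get(field)
        else:  # "base"
            merged[field] = base.get(field, subordinate.get(field))
    return json.dumps(merged, sort_keys=True)
```

Defaulting unlisted fields to the base record keeps the master authoritative while still letting individual fields be filled from the subordinate.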
Action Duplicates
Determines what happens when a pair is confirmed as a duplicate:
- Auto-merge — DeDuplica writes the merged record directly (Dynamics / Dataverse)
- Webhook — the duplicate pair and merge JSON are POSTed to your endpoint for external processing
- Pending — duplicates queue for manual review
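For the Webhook action, your endpoint receives the duplicate pair together with the merge JSON. The exact body shape is an assumption here; adapt it to whatever your endpoint expects:

```python
import json

def build_webhook_payload(base, subordinate, merged, score):
    """Assumed payload shape: the pair, the merged result, and the match score."""
    return json.dumps({
        "base": base,
        "subordinate": subordinate,
        "merged": merged,
        "score": score,
    })

# Delivery is a plain HTTP POST; with the standard library it might look like:
# import urllib.request
# req = urllib.request.Request(
#     "https://example.com/dedupe-hook",          # hypothetical endpoint
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```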
Testing a Job
Before enabling full production runs, use the Test tab to run the job against a small subset of the data. Inspect the candidate pairs and the generated merge output to verify that your rules behave as intended.
Job Lifecycle
- Configure the job (all sections above)
- Test against a limited dataset
- Enable scheduling or trigger a manual run
- Review results in Job Executions and the Duplicates section
- Adjust rules and repeat as needed
See Job Scheduling for information on scheduling, agents, and execution history.