Jobs

A Job is the primary unit of work in DeDuplica. Each job targets a single table in a connected data source and defines how duplicate records are identified, how they are merged, and what action to take when a duplicate pair is found.

Job Sections

Each job is configured across the following sections:

Source Definition

Defines where data comes from: which table, which fields to include, and any filters to limit the dataset per run. Source entities and fields are fetched automatically from your connection.

Find Duplicates

Configures how records are compared to identify duplicates. You select fields to compare, choose a matching algorithm per field, set a rank (1–10) indicating field importance, and set an overall Strictness level (1–10). DeDuplica uses a probability-based engine to score each record pair.
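
The scoring engine itself is internal to DeDuplica, but the documented inputs (a per-field rank of 1–10 and an overall Strictness of 1–10) can be illustrated with a minimal sketch. The similarity function, the rank-weighted average, and the mapping from Strictness to a threshold below are all assumptions for illustration, not the product's actual algorithm:

```python
# Illustrative sketch only -- models rank-weighted pair scoring with a
# strictness-derived threshold; not DeDuplica's actual engine.
from difflib import SequenceMatcher

def field_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1]; stands in for the per-field algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def pair_score(record_a: dict, record_b: dict, ranks: dict) -> float:
    """Rank-weighted average similarity across the compared fields."""
    total_weight = sum(ranks.values())
    weighted = sum(
        rank * field_similarity(str(record_a[f]), str(record_b[f]))
        for f, rank in ranks.items()
    )
    return weighted / total_weight

def is_duplicate(score: float, strictness: int) -> bool:
    """Map Strictness 1-10 to a match threshold (hypothetical mapping)."""
    threshold = 0.5 + strictness * 0.045   # strictness 1 -> 0.545, 10 -> 0.95
    return score >= threshold

a = {"name": "Acme Corp", "email": "info@acme.com"}
b = {"name": "ACME Corporation", "email": "info@acme.com"}
score = pair_score(a, b, ranks={"name": 7, "email": 10})
print(round(score, 3), is_duplicate(score, strictness=5))
```

Note how the higher-ranked email field pulls the score up even though the names differ; raising Strictness toward 10 would demand near-exact agreement before the pair is flagged.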

Process Duplicates

Defines the merge output. You specify which record is the base (master) and which is the subordinate, then set per-field merge rules (keep the base value, keep the subordinate value, keep the more recent value, and so on). DeDuplica produces a structured JSON document containing the merged result.
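
As a rough sketch of how per-field rules combine a base and a subordinate record into one JSON result, the rule names, field names, and timestamp convention below are hypothetical:

```python
# Illustrative sketch only -- applies hypothetical per-field merge rules to a
# base/subordinate pair and emits the merged result as JSON.
import json
from datetime import date

def merge(base: dict, sub: dict, rules: dict) -> dict:
    merged = {}
    for field, rule in rules.items():
        if rule == "keep_base":
            merged[field] = base[field]
        elif rule == "keep_subordinate":
            merged[field] = sub[field]
        elif rule == "keep_most_recent":
            # assumes each value is stored as a (value, modified_on) pair
            merged[field] = max(base[field], sub[field], key=lambda v: v[1])[0]
    return merged

base = {"phone": "555-0100", "email": ("old@x.com", date(2023, 1, 1))}
sub  = {"phone": "555-0199", "email": ("new@x.com", date(2024, 6, 1))}
result = merge(base, sub, {"phone": "keep_base", "email": "keep_most_recent"})
print(json.dumps(result))
```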

Action Duplicates

Determines what happens when a pair is confirmed as a duplicate:

  • Auto-merge — DeDuplica writes the merged record directly (Dynamics / Dataverse)
  • Webhook — the duplicate pair and merge JSON are POSTed to your endpoint for external processing
  • Pending — duplicates queue for manual review
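
For the Webhook action, your endpoint receives the duplicate pair and the merge JSON via POST. The exact payload schema is not specified here, so the field names below (base_record, subordinate_record, merged) are assumptions; a minimal handler might look like:

```python
# Illustrative receiver sketch -- the payload field names are assumed, not
# DeDuplica's documented schema; adapt to the real payload your endpoint sees.
import json

def handle_duplicate_webhook(body: bytes) -> str:
    payload = json.loads(body)
    base = payload["base_record"]
    sub = payload["subordinate_record"]
    merged = payload["merged"]
    # External processing goes here, e.g. writing `merged` back to your own
    # system and archiving `sub`. This sketch just acknowledges the pair.
    return f"merged {sub['id']} into {base['id']} ({len(merged)} fields)"

example = json.dumps({
    "base_record": {"id": "A-1"},
    "subordinate_record": {"id": "A-2"},
    "merged": {"name": "Acme Corp", "email": "info@acme.com"},
}).encode()
print(handle_duplicate_webhook(example))
```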

Testing a Job

Before enabling full production runs, use the Test tab to run the job against a small subset of records. Inspect the candidate pairs and the generated merge output to verify that your rules behave as intended.

Job Lifecycle

  1. Configure the job (all sections above)
  2. Test against a limited dataset
  3. Enable scheduling or trigger a manual run
  4. Review results in Job Executions and the Duplicates section
  5. Adjust rules and repeat as needed

See Job Scheduling for information on scheduling, agents, and execution history.