Source Definition
The Source Definition tab is the first section of a job configuration. It tells DeDuplica where to read data from, which records to include, and how to page through large datasets.
Validating the Connection
Below the job description field on the Get Data tab, there is a Validate Connection button. Click it to send a test request using the current connection settings — the result (success or error detail) is shown inline without saving the form.
Use this any time you want to confirm a connection is reachable before running a job, especially after:
- Creating a new connection
- Rotating credentials or a connection string
- Changing the Local Agent configuration
- Troubleshooting a failed job that may be a connectivity issue
The test does not modify any data and does not count against your plan’s processing limits.
Selecting a Table
After choosing the connection for the job, DeDuplica queries your data source and presents the available tables or entities. Select the one you want to deduplicate.
If a table is missing, check that the credentials in your connection have sufficient read permissions on that table.
Table Filter
Filters narrow the dataset before the job processes it. Use filters to:
- Limit deduplication to active or relevant records (e.g., Status = Active)
- Exclude records created before a certain date
- Partition large tables by region, category, or owner to run focused jobs
Filters are applied at query time, reducing the number of records loaded.
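To illustrate what "applied at query time" means in general, here is a minimal sketch using Python's built-in sqlite3 module. It is not DeDuplica's filter syntax or implementation: the contacts table, its columns, and the load_records and make_demo_db helpers are hypothetical names chosen for this example. The point is only that the condition runs inside the query, so excluded rows are never loaded.

```python
import sqlite3


def make_demo_db():
    """Create a small in-memory table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO contacts (name, status) VALUES (?, ?)",
        [("Ann Lee", "Active"), ("Ann Lee", "Active"), ("Bob Roy", "Inactive")],
    )
    return conn


def load_records(conn, status=None):
    """Load rows, applying the optional status filter inside the query itself."""
    sql = "SELECT id, name, status FROM contacts"
    params = ()
    if status is not None:
        # Parameterized condition: filtered-out rows never leave the database.
        sql += " WHERE status = ?"
        params = (status,)
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()


conn = make_demo_db()
all_rows = load_records(conn)
active_rows = load_records(conn, status="Active")
print(len(all_rows), len(active_rows))  # prints "3 2"
```

In this sketch the filtered call returns two rows instead of three, so any later processing step has less data to handle.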
Data Grouping
For very large datasets, consider splitting deduplication into separate jobs to avoid timeouts and overload. For example, if your list of countries is a closed set and duplicates cannot occur across countries, create a separate job per country. This keeps the dataset for each job at a reasonable size.
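The effect of splitting by country can be estimated with a rough sketch. It assumes, purely for illustration, that the work grows with the number of record pairs that could be compared; this page does not describe how DeDuplica actually compares records, and the field name country and the helpers below are hypothetical.

```python
from collections import defaultdict


def pair_count(n):
    """Number of distinct record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def group_by(records, field):
    """Group records into lists keyed by the value of the given field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


# Hypothetical sample data: six records spread evenly over three countries.
records = [
    {"id": 1, "country": "DE"}, {"id": 2, "country": "DE"},
    {"id": 3, "country": "FR"}, {"id": 4, "country": "FR"},
    {"id": 5, "country": "PL"}, {"id": 6, "country": "PL"},
]

groups = group_by(records, "country")
single_job_pairs = pair_count(len(records))
per_country_pairs = sum(pair_count(len(g)) for g in groups.values())
print(single_job_pairs, per_country_pairs)  # prints "15 3"
```

Under that assumption, one job over all six records has 15 candidate pairs, while three per-country jobs have 3 in total. The gap widens quickly as tables grow, which is why per-group jobs stay manageable.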
Primary Key
DeDuplica requires a unique identifier field for each record. If the primary key is not detected automatically, specify it manually. DeDuplica uses this identifier to track duplicate pairs and avoid re-raising the same pair in subsequent runs.
Important: For Dataverse/Dynamics jobs, this ID must be the table's GUID entity ID. Otherwise, automatic merge will not work for the job.
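Why a stable unique identifier matters for pair tracking can be shown with a small sketch. This is a generic illustration, not DeDuplica's internal logic: pair_key and new_pairs are hypothetical helpers, and the in-memory set stands in for whatever storage a real system would use between runs.

```python
def pair_key(id_a, id_b):
    """Return an order-independent key so (A, B) and (B, A) are the same pair."""
    a, b = str(id_a), str(id_b)
    return (a, b) if a <= b else (b, a)


def new_pairs(candidate_pairs, already_raised):
    """Return only pairs not raised before, and remember them for later runs."""
    fresh = []
    for id_a, id_b in candidate_pairs:
        key = pair_key(id_a, id_b)
        if key not in already_raised:
            already_raised.add(key)
            fresh.append(key)
    return fresh


seen = set()  # stands in for state persisted between job runs
first_run = new_pairs([("a1", "b2"), ("c3", "d4")], seen)
second_run = new_pairs([("b2", "a1"), ("e5", "a1")], seen)
print(first_run)   # prints [('a1', 'b2'), ('c3', 'd4')]
print(second_run)  # prints [('a1', 'e5')]
```

In the second run the pair ("b2", "a1") is skipped because it matches a pair already raised, regardless of order. This only works if each record keeps the same identifier from run to run, which is what the primary key provides.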
Notes
- The Source Definition is read-only relative to the connection — DeDuplica never modifies source data during the read phase.
- If the schema of your source changes (new fields, renamed columns), refresh the Source Definition to pick up the changes.
- For Dynamics / Dataverse, entities and attributes are fetched via the metadata API and reflect your organisation’s current schema.