Find Duplicates

DeDuplica is an advanced yet easy-to-use duplicate detection platform designed to help you identify and manage duplicate records in large datasets. Behind a simple interface, DeDuplica applies probabilistic matching techniques that balance precision and recall without requiring technical expertise.

This document explains how field rank, overall strictness, and supporting configuration options work together, and provides best practices for achieving high‑quality results.

Core Concepts

Fields Used for Comparison

Each deduplication job compares records using selected fields (for example: name, email, address, phone).
Each field contributes evidence toward deciding whether two records represent the same real‑world entity.

Not all fields are equal — some are highly distinctive, others are only weak indicators.

Field Rank (1–10)

Field Rank controls how important a specific field is during comparison.

Rank    Meaning
1–3     Weak supporting signal
4–6     Medium importance
7–8     Strong distinguishing signal
9–10    Critical identifier
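One way to picture field rank is as a weight on each field's match evidence: high-rank fields dominate the combined score, low-rank fields barely move it. The sketch below is a minimal illustration of that idea; the linear weighting, the field names, and the 0.0–1.0 match scores are assumptions for this example, not DeDuplica's actual internals.

```python
def weighted_evidence(field_matches, field_ranks):
    """Combine per-field match scores (0.0-1.0) using rank as weight.

    field_matches: {field: similarity score between the two records}
    field_ranks:   {field: rank 1-10}
    Returns a normalized evidence score between 0.0 and 1.0.
    """
    total_weight = sum(field_ranks.values())
    score = sum(field_matches[f] * field_ranks[f] for f in field_ranks)
    return score / total_weight

# Two records agree on email (rank 9) but differ on job title (rank 2):
matches = {"email": 1.0, "job_title": 0.0}
ranks = {"email": 9, "job_title": 2}
print(round(weighted_evidence(matches, ranks), 3))  # → 0.818
```

Note how the high-rank email match keeps the score high even though the low-rank field disagrees; with the ranks reversed, the same data would score low.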

Examples

Good high‑rank fields (7–10):

  • Email address
  • Phone number
  • National ID
  • Full street address + number

Good medium‑rank fields (4–6):

  • Company name
  • City
  • Date of birth

Low‑rank fields (1–3):

  • Country
  • Gender
  • Job title
  • Account status

⚠️ Non‑distinctive fields should never be ranked highly — they create large matching groups and reduce precision.

Overall Strictness (1–10)

Strictness controls how conservative the entire deduplication job is.

Strictness    Behavior
1–3           Lenient – finds more potential matches
4–6           Balanced – recommended default
7–8           Conservative – fewer false positives
9–10          Ultra‑strict – near‑exact matches only

Strictness affects:

  • How much combined evidence is required
  • How tolerant the system is of partial or fuzzy matches
  • Whether weak fields can compensate for strong ones
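Conceptually, strictness raises or lowers the bar that the combined evidence must clear. The mapping below is a hypothetical linear one, used only to make the idea concrete; DeDuplica's real thresholding is not documented here.

```python
def match_threshold(strictness):
    """Map strictness (1-10) to a minimum evidence score (0.0-1.0).

    The linear mapping from 0.50 (lenient) to 0.95 (ultra-strict)
    is an assumption for illustration only.
    """
    if not 1 <= strictness <= 10:
        raise ValueError("strictness must be between 1 and 10")
    return 0.5 + 0.05 * (strictness - 1)

print(match_threshold(2))   # lenient: → 0.55
print(match_threshold(10))  # ultra-strict: → 0.95
```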

Important Rule

Strictness does not override poor field choices.
If weak fields are ranked too high, increasing strictness alone will not fix precision issues.

How Rank and Strictness Work Together

  • Field Rank controls where evidence comes from
  • Strictness controls how much evidence is enough

Example

Configuration                         Outcome
Address rank 8 + Strictness 9         Only same‑street matches
Address rank 8 + Strictness 3         Similar streets may match
Country rank 8                        ❌ Poor configuration
Country rank 2 + Address rank 8       ✅ Correct configuration
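The poor and correct configurations above can be played out in a small sketch. The weighted-evidence combination and the linear strictness-to-threshold mapping are illustrative assumptions, but they show why a high-rank country field creates false positives that strictness alone cannot fix.

```python
def evidence(field_matches, field_ranks):
    """Rank-weighted evidence score, normalized to 0.0-1.0 (assumed model)."""
    total = sum(field_ranks.values())
    return sum(field_matches[f] * field_ranks[f] for f in field_ranks) / total

def is_duplicate(field_matches, field_ranks, strictness):
    """Flag a pair as duplicate when evidence clears the strictness bar."""
    threshold = 0.5 + 0.05 * (strictness - 1)  # assumed linear mapping
    return evidence(field_matches, field_ranks) >= threshold

# Two unrelated people in the same country, different addresses:
same_country_only = {"country": 1.0, "address": 0.0}

# Poor configuration: country ranked 8, address ranked 2.
print(is_duplicate(same_country_only, {"country": 8, "address": 2}, 3))   # → True (false positive)

# Correct configuration: country rank 2, address rank 8.
print(is_duplicate(same_country_only, {"country": 2, "address": 8}, 3))   # → False
```

With the poor ranking, a mere shared country contributes 80% of the possible evidence, so even moderate strictness flags the pair; swapping the ranks makes the same pair fall well below the bar.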

Dataset Design Best Practices

Limit the Number of Fields

The maximum number of fields depends on your plan.

  • Recommended: 5–10 fields
  • Adding more fields usually reduces precision
  • Too many weak fields dilute strong signals

Avoid Non‑Distinctive Fields

Fields like:

  • Country
  • Gender
  • Job title
  • Marital status

…should be used only as low‑rank supporting fields.

They do not improve uniqueness and may cause excessive false matches.

Scaling for Large Datasets

For large data volumes, splitting deduplication jobs is highly recommended.

Example Strategy

Instead of one large job:

  • Run one job per country
  • Or per region
  • Or per business unit

Benefits

  • Smaller query sizes
  • Faster execution
  • Higher precision

Most duplicates do not exist across countries or regions — splitting improves both performance and quality.
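The splitting strategy above amounts to partitioning records by a field before running each job. A minimal sketch of that partitioning step (the record shape and field names are illustrative):

```python
from collections import defaultdict

def split_by(records, key):
    """Partition records into separate job batches by a field value."""
    jobs = defaultdict(list)
    for record in records:
        jobs[record.get(key, "unknown")].append(record)
    return dict(jobs)

records = [
    {"name": "Ana",  "country": "PT"},
    {"name": "Anna", "country": "PT"},
    {"name": "Hans", "country": "DE"},
]
jobs = split_by(records, "country")
print(sorted(jobs))     # → ['DE', 'PT']
print(len(jobs["PT"]))  # → 2
```

Each resulting batch can then be submitted as its own deduplication job, keeping comparisons within a group where duplicates are actually plausible.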


Common Configuration Mistakes

❌ Ranking weak fields too high
❌ Using too many fields
❌ Expecting strictness to fix poor field choice
❌ Running one massive job for global data
❌ Setting ultra‑strict mode with fuzzy fields

Recommended Starter Setup

Setting         Recommendation
Fields          5–7
Strong ranks    1–2 fields
Weak ranks      2–3 fields
Strictness      6–8
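Expressed as data, the starter setup above might look like the sketch below. The field names, the dictionary shape, and the keys are all hypothetical and exist only to make the recommendation concrete; they are not DeDuplica's configuration format.

```python
# Hypothetical starter configuration matching the recommendations:
# 6 fields total, 2 strong identifiers, 2 weak supporting fields,
# strictness in the conservative-but-balanced 6-8 band.
starter_config = {
    "fields": {
        "email":   {"rank": 9},  # critical identifier
        "phone":   {"rank": 8},  # strong distinguishing signal
        "name":    {"rank": 5},  # medium importance
        "city":    {"rank": 4},  # medium importance
        "country": {"rank": 2},  # weak supporting field
        "gender":  {"rank": 1},  # weak supporting field
    },
    "strictness": 7,
}

print(len(starter_config["fields"]))  # → 6
```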

Summary

DeDuplica is designed to be:

  • Advanced under the hood
  • Simple and safe to configure
  • Scalable across data sizes

Focus on good field selection, reasonable ranks, and appropriate strictness — and the system will deliver reliable, explainable results.

For large datasets, splitting jobs is the key to both accuracy and performance.


DeDuplica — Advanced duplicate detection, simplified.