Connection Limitations

This page explains how DeDuplica connects to Microsoft Dynamics 365 / Dataverse, the technical limitations of those connections, and what customers should be aware of when working with larger datasets. The goal is to help you choose the right connection method, understand platform constraints, and prepare your environment for successful and scalable deduplication.

This documentation is written for business users and administrators with limited technical background, while still including the necessary details for IT or platform administrators.

How DeDuplica Connects to Dataverse (Detailed Overview)

DeDuplica is designed to analyze large volumes of data efficiently while minimizing impact on your live Dataverse environment. To achieve this, DeDuplica uses SQL-based read access to Dataverse rather than relying solely on standard Dataverse APIs.

Behind the scenes, DeDuplica connects to Dataverse using the Tabular Data Stream (TDS) endpoint. TDS is a Microsoft-supported feature that exposes Dataverse tables as read-only SQL tables.

This approach allows DeDuplica to:

Read data efficiently using SQL queries
Analyze multiple attributes across many records
Perform comparisons required for deduplication logic
Reduce the number of API calls against Dataverse

Important: TDS access is strictly read-only. DeDuplica does not modify your data through TDS.
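For readers who want to see what a TDS connection looks like in practice, the following is a minimal sketch that assembles an ODBC connection string for a Dataverse environment. It is not DeDuplica's internal implementation. The host name `contoso.crm.dynamics.com`, the port `5558`, the driver name, and the interactive sign-in mode are all assumptions chosen for illustration; your environment and driver version may differ.

```python
def build_tds_connection_string(org_host: str, port: int = 5558) -> str:
    """Assemble an ODBC connection string for a Dataverse TDS endpoint.

    Assumptions (not taken from the DeDuplica documentation):
    - the endpoint listens on port 5558,
    - "ODBC Driver 18 for SQL Server" is installed locally,
    - sign-in happens interactively with a Microsoft Entra ID account.
    """
    # Expect a bare host name, not a URL with scheme or path.
    if not org_host or "/" in org_host or ":" in org_host:
        raise ValueError("pass a bare host name such as 'contoso.crm.dynamics.com'")
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server={org_host},{port}",
        "Authentication=ActiveDirectoryInteractive",
        "Encrypt=yes",
    ]
    return ";".join(parts) + ";"


if __name__ == "__main__":
    print(build_tds_connection_string("contoso.crm.dynamics.com"))
```

The resulting string could then be handed to an ODBC client such as `pyodbc.connect(...)`, provided the driver is installed and TDS is enabled in the environment.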

Mandatory Requirement: TDS Must Be Enabled

For DeDuplica to function, TDS must be enabled in your Dataverse environment.

Some Dataverse environments, especially older or locked-down ones, have TDS disabled by default.

If TDS is not enabled:

DeDuplica cannot read Dataverse tables
Deduplication jobs will not start
You may see connection or permission-related errors

Why Microsoft Requires Explicit Enablement

TDS exposes Dataverse data in a SQL-like format. Because this allows broader data access, Microsoft requires administrators to explicitly enable it.

This ensures:

Only approved environments expose SQL access
Security roles are respected
Access remains auditable

How to Enable TDS

Enabling TDS is a one-time administrative action and, when properly configured, does not negatively affect users or applications.

📄 Step-by-step instructions:

Enable TDS for Dataverse

📘 Microsoft reference documentation:

https://learn.microsoft.com/en-us/power-apps/developer/data-platform/dataverse-sql-query

Security and Permissions Model

DeDuplica connects to Dataverse using a Microsoft Entra ID (formerly Azure Active Directory) application user or service principal.

What This Means for Customers

DeDuplica only sees data that the application user is allowed to read
Dataverse security roles are fully respected
Field-level and table-level security apply

We recommend assigning:

Read access to the required tables
Read access to only the columns needed for deduplication

This ensures both security and optimal performance.

Why Standard Dataverse APIs Are Limited for Deduplication

Dataverse provides multiple APIs (Web API, SDK, service endpoints) that are ideal for transactional operations such as:

Creating or updating records
Integrating line-of-business applications
Real-time user-driven processes

However, deduplication is a data-intensive analytical workload, which exposes several limitations when APIs are used.

Key API Constraints

1. Query Size and Paging Limits

APIs limit the number of records returned per request
Large datasets require extensive paging
Paging significantly increases execution time
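To make the paging cost concrete, here is a small back-of-the-envelope sketch. It assumes a page size of 5,000 records per request and a hypothetical average of 2 seconds per page; the real per-page time depends on your environment and is not a measured figure.

```python
import math


def pages_needed(record_count: int, page_size: int = 5000) -> int:
    """Number of paged requests required to read record_count rows."""
    if record_count < 0 or page_size <= 0:
        raise ValueError("record_count must be >= 0 and page_size > 0")
    return math.ceil(record_count / page_size)


def estimated_minutes(record_count: int,
                      seconds_per_page: float = 2.0,
                      page_size: int = 5000) -> float:
    """Rough sequential read time; seconds_per_page is a hypothetical value."""
    return pages_needed(record_count, page_size) * seconds_per_page / 60


if __name__ == "__main__":
    for rows in (50_000, 1_000_000):
        print(rows, pages_needed(rows), round(estimated_minutes(rows), 1))
```

Under these assumptions, one million records require 200 sequential requests before any comparison work can begin.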

2. API Throttling and Call Limits

Dataverse enforces per-user and per-application limits
Application users have stricter thresholds
High-volume reads can quickly exhaust daily quotas
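A throttled client typically has to pause and retry, which adds further delay. The sketch below illustrates that pattern under the assumption that a throttled request is reported as HTTP status 429 with an optional wait hint in seconds. The `send` callable and its `(status, retry_after, payload)` return shape are hypothetical stand-ins for a real HTTP client.

```python
import time
from typing import Callable, Optional, Tuple

# (HTTP status, suggested wait in seconds or None, response payload)
Response = Tuple[int, Optional[float], object]


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> object:
    """Call send() until it is no longer throttled or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, retry_after, payload = send()
        if status != 429:
            return payload
        if attempt == max_attempts:
            break
        # Prefer the server's hint; otherwise back off exponentially.
        sleep(retry_after if retry_after is not None else 2 ** attempt)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Every pause of this kind lengthens a job that is already dominated by paging, which is part of why high-volume reads through the APIs scale poorly.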

3. Data Volume Restrictions

APIs are not optimized for scanning entire tables
Performance degrades as record counts grow
Large deduplication jobs may fail mid-process

4. Execution Time Constraints

Long-running API operations may be cancelled
Partial results can lead to inconsistent outcomes

Because of these constraints, API-based access is not recommended for large-scale deduplication.

TDS-Specific Limitations (What Customers Should Know)

While TDS is significantly more efficient than APIs, it is still subject to platform constraints.

5-Minute Query Execution Timeout

Each SQL query executed through TDS has a maximum execution time of approximately 5 minutes
Queries exceeding this limit are automatically terminated
This can occur with:
- Very large tables
- Complex matching logic
- Insufficient filtering

Performance Considerations

Dataverse does not support custom indexing for TDS
Query optimization options are limited
Performance depends heavily on:
- Table size
- Column selection
- Filter conditions
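Because column selection and filtering are the main levers available, it can help to see how a narrow query is put together. The following sketch builds a SELECT statement that names only the needed columns and applies a filter. The helper `build_select` is hypothetical, and the table and column names (`contact`, `contactid`, `emailaddress1`, `statecode`) are used only as examples.

```python
import re
from typing import Optional, Sequence

# Plain identifiers only, so table and column names cannot smuggle in SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_select(table: str,
                 columns: Sequence[str],
                 where: Optional[str] = None,
                 top: Optional[int] = None) -> str:
    """Build a SELECT that reads only the listed columns.

    `where` is inserted as-is, so it must come from a trusted source.
    """
    if not columns:
        raise ValueError("list the columns explicitly instead of using SELECT *")
    for name in [table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    top_clause = f"TOP {int(top)} " if top is not None else ""
    sql = f"SELECT {top_clause}{', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    print(build_select("contact", ["contactid", "emailaddress1"],
                       where="statecode = 0"))
```

Reading two columns of active records moves far less data than reading every column of every record, which matters when no custom indexes are available.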

When TDS Is a Good Fit

TDS works well when:

Data volumes are small to medium
Deduplication rules are well-scoped
Jobs are segmented by business unit or date ranges
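Segmenting by date range can be sketched as follows. The code splits a period into fixed-size windows and produces a filter condition for each one, so that every query stays small. The `createdon` column and the 30-day window are assumptions for illustration; a suitable window size depends on how many records each period contains.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_segments(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield half-open [lo, hi) windows that together cover start..end."""
    if days <= 0:
        raise ValueError("days must be positive")
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=days), end)
        yield lo, hi
        lo = hi


def segment_filter(lo: date, hi: date, column: str = "createdon") -> str:
    """SQL condition selecting rows whose date falls inside [lo, hi)."""
    return f"{column} >= '{lo.isoformat()}' AND {column} < '{hi.isoformat()}'"


if __name__ == "__main__":
    for lo, hi in date_segments(date(2024, 1, 1), date(2024, 3, 1), days=30):
        print(segment_filter(lo, hi))
```

One trade-off to keep in mind: if each segment is compared only against itself, duplicates whose records fall into different segments will not be matched.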

Best Practice for Large or Enterprise Environments: Dataverse Synapse Link

For large datasets or enterprise environments, DeDuplica strongly recommends using Dataverse Synapse Link.

What Is Dataverse Synapse Link?

Dataverse Synapse Link (named Azure Synapse Link for Dataverse in Microsoft's documentation) continuously exports Dataverse data into Azure Data Lake Storage, where it can be queried through a SQL endpoint backed by Azure Synapse.

This architecture is designed specifically for:

Analytics
Reporting
Machine learning
Large-scale data processing

Advantages for DeDuplica

Using Synapse Link enables:

Processing of millions of records
No API throttling or per-call limits
No 5-minute execution timeout
Improved performance and reliability
Minimal impact on production Dataverse workloads

Microsoft Documentation

Export Dataverse data to Azure Data Lake (Synapse Link): https://learn.microsoft.com/en-us/power-apps/maker/data-platform/export-to-data-lake

Note: Setting up Synapse Link typically requires Power Platform and Azure administrator involvement.

Using SQL Server Management Studio (SSMS) to Explore Your Data

DeDuplica encourages customers to explore their data before creating deduplication rules.

Why This Is Helpful

Using SQL Server Management Studio (SSMS) allows you to:

Understand table structures and relationships
Identify which fields contain meaningful data
Detect data quality issues (empty fields, inconsistent formats)
Design better deduplication rules
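The kind of exploration described above can be illustrated with two simple queries: one that counts empty values in a field, and one that lists values appearing more than once. The sketch below runs them against an in-memory SQLite table so that it works without a Dataverse connection. SQLite is only a stand-in here, and the sample rows are invented. The SQL itself uses common constructs that should need little change in SSMS, although it has not been verified against a TDS endpoint.

```python
import sqlite3

# How many rows have no usable e-mail address?
FILL_RATE_SQL = """
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN emailaddress1 IS NULL OR emailaddress1 = ''
                THEN 1 ELSE 0 END) AS missing_email
FROM contact
"""

# Which e-mail addresses occur more than once, ignoring letter case?
DUPLICATE_CANDIDATES_SQL = """
SELECT LOWER(emailaddress1) AS email_key, COUNT(*) AS copies
FROM contact
WHERE emailaddress1 IS NOT NULL AND emailaddress1 <> ''
GROUP BY LOWER(emailaddress1)
HAVING COUNT(*) > 1
ORDER BY copies DESC
"""


def demo_connection() -> sqlite3.Connection:
    """Create a tiny in-memory table with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contact ("
                 "contactid INTEGER PRIMARY KEY, fullname TEXT, emailaddress1 TEXT)")
    conn.executemany(
        "INSERT INTO contact (fullname, emailaddress1) VALUES (?, ?)",
        [("Ann Lee", "ann@example.com"),
         ("Ann  Lee", "ANN@example.com"),
         ("Bob Ray", None),
         ("Cy Dow", ""),
         ("Di Fox", "di@example.com")],
    )
    return conn


def profile(conn: sqlite3.Connection) -> dict:
    """Run both exploration queries and collect the results."""
    total, missing = conn.execute(FILL_RATE_SQL).fetchone()
    dupes = conn.execute(DUPLICATE_CANDIDATES_SQL).fetchall()
    return {"total": total, "missing_email": missing, "duplicate_keys": dupes}


if __name__ == "__main__":
    print(profile(demo_connection()))
```

A field with many missing values is usually a weak basis for a matching rule, while a field with a handful of repeated values is a natural starting point.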

How SSMS Connects to Dataverse

SSMS connects using the same TDS endpoint as DeDuplica
Authentication uses your Microsoft Entra ID (formerly Azure AD) account
Access is read-only

What You Can Do in SSMS

✔ Run SELECT queries
✔ Filter and sort records
✔ Join related tables

✖ Modify data
✖ Create or alter schema objects
✖ Bypass Dataverse security

Microsoft reference:

https://learn.microsoft.com/en-us/power-apps/developer/data-platform/dataverse-sql-query

Choosing the Right Connection Strategy

Scenario	Recommended Approach
Small to medium datasets	TDS (Default)
Large datasets (500k+ records)	Dataverse Synapse Link
Data exploration and analysis	SSMS via TDS
Enterprise-scale deduplication	Synapse SQL Endpoint
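The table above can also be read as a simple decision rule. The sketch below encodes it in a hypothetical helper, using the 500k record threshold from the table. It is a simplification: the table lists enterprise-scale deduplication separately (Synapse SQL Endpoint), and real decisions also depend on table width and matching complexity.

```python
def recommend_connection(record_count: int, exploring_only: bool = False) -> str:
    """Map a dataset size and purpose to the approach named in the table."""
    if record_count < 0:
        raise ValueError("record_count must be >= 0")
    if exploring_only:
        # Ad hoc inspection rather than a deduplication job.
        return "SSMS via TDS"
    if record_count >= 500_000:
        return "Dataverse Synapse Link"
    return "TDS (Default)"


if __name__ == "__main__":
    print(recommend_connection(120_000))
    print(recommend_connection(2_000_000))
```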

Summary

DeDuplica uses SQL-based access to Dataverse for efficient deduplication
TDS must be enabled for DeDuplica to function
Standard Dataverse APIs have strict limits and are not suitable for large deduplication workloads
TDS has a 5-minute query timeout and is best for moderate datasets
Dataverse Synapse Link provides the most scalable and reliable solution

If you are unsure which approach fits your environment, please contact the DeDuplica support team for guidance.
