Connection Limitations
This page explains how DeDuplica connects to Microsoft Dynamics 365 / Dataverse, the technical limitations of those connections, and what customers should be aware of when working with larger datasets. The goal is to help you choose the right connection method, understand platform constraints, and prepare your environment for successful and scalable deduplication.
This documentation is written for business users and administrators with limited technical background, while still including the necessary details for IT or platform administrators.
How DeDuplica Connects to Dataverse (Detailed Overview)
DeDuplica is designed to analyze large volumes of data efficiently while minimizing impact on your live Dataverse environment. To achieve this, DeDuplica uses SQL-based read access to Dataverse rather than relying solely on standard Dataverse APIs.
Behind the scenes, DeDuplica connects to Dataverse using the Tabular Data Stream (TDS) endpoint. TDS is a Microsoft-supported feature that exposes Dataverse tables as read-only SQL tables.
This approach allows DeDuplica to:
- Read data efficiently using SQL queries
- Analyze multiple attributes across many records
- Perform comparisons required for deduplication logic
- Reduce the number of API calls against Dataverse
Important: TDS access is strictly read-only. DeDuplica does not modify your data through TDS.
Mandatory Requirement: TDS Must Be Enabled
For DeDuplica to function, TDS must be enabled in your Dataverse environment.
In some Dataverse environments (especially older or locked-down ones), TDS may be disabled by default.
If TDS is not enabled:
- DeDuplica cannot read Dataverse tables
- Deduplication jobs will not start
- You may see connection or permission-related errors
Why Microsoft Requires Explicit Enablement
TDS exposes Dataverse data in a SQL-like format. Because this allows broader data access, Microsoft requires administrators to explicitly enable it.
This ensures:
- Only approved environments expose SQL access
- Security roles are respected
- Access remains auditable
How to Enable TDS
Enabling TDS is a one-time administrative action. When configured properly, it does not negatively affect users or applications.
📄 Step-by-step instructions:
📘 Microsoft reference documentation:
Security and Permissions Model
DeDuplica connects to Dataverse using a Microsoft Entra ID (formerly Azure Active Directory) application user or service principal.
What This Means for Customers
- DeDuplica only sees data that the application user is allowed to read
- Dataverse security roles are fully respected
- Field-level and table-level security apply
We recommend assigning:
- Read access to the required tables
- Read access to only the columns needed for deduplication
This ensures both security and optimal performance.
Why Standard Dataverse APIs Are Limited for Deduplication
Dataverse provides multiple APIs (Web API, SDK, service endpoints) that are ideal for transactional operations such as:
- Creating or updating records
- Integrating line-of-business applications
- Real-time user-driven processes
However, deduplication is a data-intensive analytical workload, which exposes several limitations when APIs are used.
Key API Constraints
1. Query Size and Paging Limits
- APIs limit the number of records returned per request (the Dataverse Web API returns at most 5,000 rows per page)
- Large datasets require extensive paging
- Paging significantly increases execution time
2. API Throttling and Call Limits
- Dataverse enforces per-user and per-application limits
- Application users have stricter thresholds
- High-volume reads can quickly exhaust daily quotas
3. Data Volume Restrictions
- APIs are not optimized for scanning entire tables
- Performance degrades as record counts grow
- Large deduplication jobs may fail mid-process
4. Execution Time Constraints
- Long-running API operations may be cancelled
- Partial results can lead to inconsistent outcomes
Because of these constraints, API-based access is not recommended for large-scale deduplication.
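To make the paging cost concrete, the sketch below estimates how many sequential requests a full table scan needs. It assumes a page size of 5,000 rows, the default maximum the Dataverse Web API returns per page. The 0.5-second latency per request is purely an illustrative assumption, not a measured figure, and the function names are hypothetical.

```python
import math

PAGE_SIZE = 5000  # assumption: default maximum rows per page for the Dataverse Web API


def pages_needed(record_count: int, page_size: int = PAGE_SIZE) -> int:
    """Return the number of sequential page requests needed to read every record."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    return math.ceil(record_count / page_size)


def estimated_scan_seconds(record_count: int, seconds_per_request: float = 0.5) -> float:
    """Rough scan-time estimate; the per-request latency is an illustrative value only."""
    return pages_needed(record_count) * seconds_per_request
```

For example, `pages_needed(2_000_000)` returns 400, so a two-million-row table needs 400 round trips before any matching logic can run, and each of those requests counts toward the throttling limits described above.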
TDS-Specific Limitations (What Customers Should Know)
While TDS is significantly more efficient than APIs, it is still subject to platform constraints.
5-Minute Query Execution Timeout
- Each SQL query executed through TDS has a maximum execution time of approximately 5 minutes
- Queries exceeding this limit are automatically terminated
- This can occur with:
  - Very large tables
  - Complex matching logic
  - Insufficient filtering
Performance Considerations
- Dataverse does not support custom indexing for TDS
- Query optimization options are limited
- Performance depends heavily on:
  - Table size
  - Column selection
  - Filter conditions
When TDS Is a Good Fit
TDS works well when:
- Data volumes are small to medium
- Deduplication rules are well-scoped
- Jobs are segmented by business unit or date ranges
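Segmenting by date range can be sketched as follows. This is a minimal illustration, not DeDuplica's internal implementation: it splits a period into calendar-month windows and builds one read-only SELECT per window, so each query scans a smaller slice and is less likely to reach the query timeout. The `contact` table and the `contactid`, `emailaddress1`, and `createdon` columns are used only as an example, and the function names are hypothetical.

```python
from datetime import date


def month_windows(start: date, end: date):
    """Yield (window_start, window_end) pairs covering [start, end) in calendar months."""
    current = start
    while current < end:
        # Compute the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        window_end = min(next_month, end)
        yield current, window_end
        current = window_end


def segmented_queries(start: date, end: date):
    """Build one SELECT per window, reading only the columns needed for matching."""
    # The values are formatted from date objects, never from free-text user input.
    template = (
        "SELECT contactid, emailaddress1 FROM contact "
        "WHERE createdon >= '{lo}' AND createdon < '{hi}'"
    )
    return [
        template.format(lo=lo.isoformat(), hi=hi.isoformat())
        for lo, hi in month_windows(start, end)
    ]
```

Calling `segmented_queries(date(2024, 11, 15), date(2025, 2, 1))` produces three queries, one per month window. Selecting only the matching columns, rather than every column, keeps each result set small.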
Best Practice for Large or Enterprise Environments: Dataverse Synapse Link
For large datasets or enterprise environments, DeDuplica strongly recommends using Dataverse Synapse Link.
What Is Dataverse Synapse Link?
Dataverse Synapse Link continuously exports Dataverse data into Azure Data Lake Storage, where it can be queried using a SQL endpoint backed by Azure Synapse.
This architecture is designed specifically for:
- Analytics
- Reporting
- Machine learning
- Large-scale data processing
Advantages for DeDuplica
Using Synapse Link enables:
- Processing of millions of records
- No API throttling or per-call limits
- No 5-minute execution timeout
- Improved performance and reliability
- Minimal impact on production Dataverse workloads
Microsoft Documentation
- Export Dataverse data to Azure Data Lake (Synapse Link): https://learn.microsoft.com/en-us/power-apps/maker/data-platform/export-to-data-lake
Note: Setting up Synapse Link typically requires Power Platform and Azure administrator involvement.
Using SQL Server Management Studio (SSMS) to Explore Your Data
DeDuplica encourages customers to explore their data before creating deduplication rules.
Why This Is Helpful
Using SQL Server Management Studio (SSMS) allows you to:
- Understand table structures and relationships
- Identify which fields contain meaningful data
- Detect data quality issues (empty fields, inconsistent formats)
- Design better deduplication rules
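As a small illustration of the kind of profiling this enables, the sketch below computes the share of non-empty values per column from rows you have already exported (for example, query results saved from SSMS). The sample rows are invented for the example, and the function name is hypothetical.

```python
def fill_rates(rows, columns):
    """Return {column: fraction of rows with a non-empty value}."""
    if not rows:
        return {column: 0.0 for column in columns}
    rates = {}
    for column in columns:
        # A value counts as filled if it is not None and not blank after trimming.
        filled = sum(
            1 for row in rows
            if row.get(column) is not None and str(row[column]).strip() != ""
        )
        rates[column] = filled / len(rows)
    return rates


# Invented sample data for illustration only.
sample = [
    {"fullname": "Ann Lee", "emailaddress1": "ann@example.com", "telephone1": None},
    {"fullname": "Ann Lee", "emailaddress1": "ANN@EXAMPLE.COM ", "telephone1": ""},
    {"fullname": "Bo Kim", "emailaddress1": None, "telephone1": "555-0100"},
    {"fullname": "", "emailaddress1": "bo@example.com", "telephone1": "555-0100"},
]

rates = fill_rates(sample, ["fullname", "emailaddress1", "telephone1"])
```

A column with a low fill rate is usually a weak candidate for a matching rule on its own. The first two sample rows also show an inconsistent format (letter case and a trailing space) that a rule would need to normalize before comparing values.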
How SSMS Connects to Dataverse
- SSMS connects using the same TDS endpoint as DeDuplica
- Authentication uses your Microsoft Entra ID (Azure AD)
- Access is read-only
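The same endpoint can also be reached from a script. The sketch below only builds an ODBC connection string and does not open a connection. It rests on assumptions you should verify for your environment: that the TDS endpoint is reachable on port 5558 of the environment host, that the Microsoft ODBC driver named in the code is installed, and that interactive Microsoft Entra ID sign-in is acceptable. The host `contoso.crm.dynamics.com` is a placeholder and the function name is hypothetical.

```python
def tds_connection_string(org_host: str, driver: str = "ODBC Driver 18 for SQL Server") -> str:
    """Build an ODBC connection string for a Dataverse TDS endpoint.

    Assumptions (verify for your environment): the endpoint listens on port 5558
    of the environment host, and interactive Microsoft Entra ID sign-in is used.
    """
    # Accept either a bare host name or a full environment URL.
    host = org_host.strip().removeprefix("https://").rstrip("/")
    return (
        f"Driver={{{driver}}};"
        f"Server={host},5558;"
        "Authentication=ActiveDirectoryInteractive;"
        "Encrypt=yes;"
    )


# Placeholder environment URL for illustration only.
conn_str = tds_connection_string("https://contoso.crm.dynamics.com/")
```

Passing the resulting string to an ODBC client library would open a session subject to the same read-only access and security roles described in this section. That step is omitted here because it requires a live environment.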
What You Can Do in SSMS
✔ Run SELECT queries
✔ Filter and sort records
✔ Join related tables
✖ Modify data
✖ Create or alter schema objects
✖ Bypass Dataverse security
Microsoft reference:
Choosing the Right Connection Strategy
| Scenario | Recommended Approach |
|---|---|
| Small to medium datasets | TDS (Default) |
| Large datasets (500k+ records) | Dataverse Synapse Link |
| Data exploration and analysis | SSMS via TDS |
| Enterprise-scale deduplication | Synapse SQL Endpoint |
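The table can be read as a simple decision rule. The sketch below encodes it under one assumption: the 500k figure is treated as a hard threshold, whereas in practice table width and rule complexity also matter. The function name is hypothetical.

```python
LARGE_DATASET_THRESHOLD = 500_000  # from the table: "Large datasets (500k+ records)"


def recommend_connection(record_count: int, exploring: bool = False) -> str:
    """Map a scenario to the recommended approach from the table."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if exploring:
        # Ad hoc exploration and analysis, regardless of table size.
        return "SSMS via TDS"
    if record_count >= LARGE_DATASET_THRESHOLD:
        return "Dataverse Synapse Link"
    return "TDS (Default)"
```

For instance, `recommend_connection(120_000)` returns `"TDS (Default)"`, while `recommend_connection(2_000_000)` returns `"Dataverse Synapse Link"`.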
Summary
- DeDuplica uses SQL-based access to Dataverse for efficient deduplication
- TDS must be enabled for DeDuplica to function
- Standard Dataverse APIs have strict limits and are not suitable for large deduplication workloads
- TDS has a 5-minute query timeout and is best for moderate datasets
- Dataverse Synapse Link provides the most scalable and reliable solution
If you are unsure which approach fits your environment, please contact the DeDuplica support team for guidance.